SQL Server 2019 is SQL for big data
Blog|by Mary Branscombe|7 November 2018

The big news with the SQL Server 2017 release was support for running on Linux and in containers, graph queries, and running machine learning where your data is using R and Python. A year later, the CTP 2.0 preview of SQL Server 2019 announced at the Ignite conference also goes beyond the familiar relational database, with a new architecture that combines the SQL Server database engine, Apache Spark and Hadoop distributed file system (HDFS) support as a hybrid platform for big data, so you can connect to relational, NoSQL and big data sources and work with them all in a unified way.
This gives SQL Server a distributed architecture, where you can pick and mix the elements that best suit your data needs. The Spark engine is now part of SQL Server, so you can combine SQL compute nodes with either SQL or HDFS storage nodes depending on whether you need relational tables or a data lake, using Spark for data science, advanced analytics and machine learning tasks, and have SQL Server and Spark running in the same Kubernetes container deployment.
With SQL Server 2016, the Docker container support has sometimes been seen more as a way to speed up deployment initially for development and test; but the Kubernetes support in SQL Server 2019 is much broader, supporting the features needed for production deployments.
These new SQL Server Big Data Clusters create an elastic scale-out data virtualisation platform where you can deploy both SQL and Spark Linux containers on Kubernetes, including deploying SQL Server Availability Groups in Kubernetes. To do that, you first deploy an operator role into Kubernetes that orchestrates the deployment of pods, connects to them and then orchestrates the full deployment of the availability group onto that pod deployment – which allows for rolling upgrades to apply updates with less downtime. If you need to provide a quorum within the availability group, Microsoft is working on an open source Paxos implementation (which will be available on GitHub) that will provide a similar architecture to failover cluster instances. The big data clusters are in a limited public preview; you have to register and request access.
Whether you’re moving to the new big data clusters or sticking to a conventional SQL Server architecture, Polybase still gives you more connectivity in the 2019 release. Polybase still supports Spark, Hortonworks and Cloudera Hadoop, but there are new connectors to query Oracle, Teradata and MongoDB (including Cosmos DB), as well as generic ODBC data sources (like DB2, SAP HANA and even Excel) and even other SQL Server databases directly from SQL Server without needing to move or replicate it, making it much faster to generate reports that need information from external tables.
The integrated security tier in SQL Server covers the Spark and HDFS integration, protecting data at rest and in motion with the Always Encrypted option (which requires secure enclaves on your servers but now allows more complex operations), plus built-in data discovery, classification and auditing (which can now log the sensitivity classification of data returned in a query) across all data stores. SQL Server Configuration Manager now includes certificate management for deploying certificates to failover clusters and Always On Availability Groups, and viewing installed certificates (including a handy view showing certificates that will expire soon).

SQL Server 2019 at a glance. Source: Microsoft
Intelligent Query Processing takes the automated performance tuning of Adaptive Query Processing in SQL Server 2017 further, building on the performance tuning that’s done in SQL Azure. Choose the new 150 database compatibility level to have your query performance automatically tuned either at runtime, or based on analysing past performance. Adjusting the memory used for a query based on past performance has already worked for batch execution with columnstore indexes; now it’s available for all queries, and batch execution works for row stores. Lightweight query profiling is now turned on by default, so you can look back and understand query performance or look at live queries running for troubleshooting without needing to turn on extra data collection for diagnostics.
The core SQL Server engine gets some updates too, including UTF-8 encoding support (which could save a significant amount of storage) and online index build and rebuild when you convert row-store tables into the clustered columnstores useful for analytics (previously, the database had to be paused while the clustered columnstore index was created but now you can carry on working against the database while the conversion happens). Developers will be able to pause and resume the creation of an online index rather than starting from the beginning if it’s interrupted (or the database runs out of space). That can be set as the default for a database, if necessary.
There are enhancements to the graph queries introduced in SQL Server 2017; you can specify graph relationships in a single statement rather than needing separate insert, update and delete statements. You can also insert a new edge or update a merge between two nodes with a single statement using the new match options in a merge statement. By default, edge tables can connect any two nodes in the database; edge constraints allow you to limit the type of nodes an edge table can connect to.
SQL Server for Linux 2019
Now that HDFS is supported natively in SQL Server, developers can bring data from multiple sources for machine learning data model training and operationalise that model in a single system. If T-SQL doesn’t have all the features needed, Java joins Python and R as languages that you can execute in-place inside SQL Server.
The Machine Learning Services component that supports Java, Python and R now runs on SQL Server on Linux, not just Windows Server, giving developers a much wider choice of languages and environments. However, on Windows Server, Machine Learning Services can now be deployed into failover clusters for availability.
SQL Server for Linux catches up on a few other missing features in SQL Server 2019: notably replication (transactional, snapshot or merge) and distributed transactions (with support for the Microsoft Distributed Transaction Coordinator (MSDTC)). New OpenLDAP support allows third-party AD providers to join your domain, simplifying access management.
As persistent (or storage class) memory like Intel Optane starts to arrive in production systems, it’s ideal for in-memory databases. SQL Server on Windows has supported persistent memory since SQL Server 2016 and in SQL Server 2019 on Windows Server 2019 database objects can be stored on persistent memory using standard block-based storage. But in SQL Server for Linux you can now bypass the Linux storage stack and access persistent memory devices directly to get lower-latency IO with SQL database files, transaction logs and in-memory OLTP checkpoint files.
If you’re looking for the latest version of SQL Operations Studio to work with SQL Server across multiple platform and manage all these options, the name is changing to Azure Data Studio, because it’s becoming more modular and supports multiple data sources – including SQL Server 2019 as well as SQL Azure. It also has a notebook experience for running Query books. There’s also a preview of SQL Server Management Studio 18.0 for configuring and administering SQL Server components, so the tools are there to help you try out the previews of SQL Server 2019.
Grey Matter has a team of SQL licensing specialists who can help you with your database questions. They’re available to call: +44 (0)1364 654100 or email: [email protected]
Contact Grey Matter
If you have any questions or want some extra information, complete the form below and one of the team will be in touch ASAP. If you have a specific use case, please let us know and we'll help you find the right solution faster.
By submitting this form you are agreeing to our Privacy Policy and Website Terms of Use.
Author
Mary Branscombe
Independent Writing and Editing Professional at Freelance
Mary is a technology journalist with over three decades of experience, regularly contributing to CIO.com, The New Stack, Computer Weekly, The Stack, AskWoody and more. She specialises in Microsoft, authoring two books on Windows 8 and one O'Reilly book on Azure AI services with writing partner, Simon Bisson.
Related News
Adobe Creative Cloud for Teams is retiring for Education
Adobe Creative Cloud for Teams retires for higher education on 31 March 2026.
This is your chance to modernise creativity, collaboration, and compliance across campus. Explore Creative Cloud Pro Plus (Enterprise) and plan a smooth migration that empowers teaching, learning, and research.
We’re exhibiting at Cloud & AI Infrastructure London 2026
4 - 5 March 2026 9:00 am - 5:00 pm GMT
We’re excited to be heading back to London for this year’s Cloud & AI Infrastructure London 2026. Make sure to grab your ticket and stop by our stand D212. We’d love to hear your updates on what’s new in your organisation – the plans and challenges you’re facing in 2026. ...
Cyber security in 2025: What we learned and how to stay ahead in 2026
Why human factors, AI, and compliance will define the next era of cyber defence 2025 was a turning point for cyber security. From headline-grabbing breaches to the rise of AI-driven attacks, businesses faced unprecedented challenges. So, what can you learn – and how can...
Mapping as a security function
Mapping has become a core part of modern security. Azure Maps and HERE give organisations the privacy, governance and geospatial insight they need to protect data, strengthen threat awareness and secure day-to-day operations.