SQL Server 2019 is SQL for big data
Blog|by Mary Branscombe|7 November 2018
The big news with the SQL Server 2017 release was support for running on Linux and in containers, graph queries, and running machine learning where your data is using R and Python. A year later, the CTP 2.0 preview of SQL Server 2019 announced at the Ignite conference also goes beyond the familiar relational database, with a new architecture that combines the SQL Server database engine, Apache Spark and Hadoop distributed file system (HDFS) support as a hybrid platform for big data, so you can connect to relational, NoSQL and big data sources and work with them all in a unified way.
This gives SQL Server a distributed architecture, where you can pick and mix the elements that best suit your data needs. The Spark engine is now part of SQL Server, so you can combine SQL compute nodes with either SQL or HDFS storage nodes depending on whether you need relational tables or a data lake, using Spark for data science, advanced analytics and machine learning tasks, and have SQL Server and Spark running in the same Kubernetes container deployment.
With SQL Server 2016, the Docker container support has sometimes been seen more as a way to speed up deployment initially for development and test; but the Kubernetes support in SQL Server 2019 is much broader, supporting the features needed for production deployments.
These new SQL Server Big Data Clusters create an elastic scale-out data virtualisation platform where you can deploy both SQL and Spark Linux containers on Kubernetes, including deploying SQL Server Availability Groups in Kubernetes. To do that, you first deploy an operator role into Kubernetes that orchestrates the deployment of pods, connects to them and then orchestrates the full deployment of the availability group onto that pod deployment – which allows for rolling upgrades to apply updates with less downtime. If you need to provide a quorum within the availability group, Microsoft is working on an open source Paxos implementation (which will be available on GitHub) that will provide a similar architecture to failover cluster instances. The big data clusters are in a limited public preview; you have to register and request access.
Whether you’re moving to the new big data clusters or sticking to a conventional SQL Server architecture, Polybase still gives you more connectivity in the 2019 release. Polybase still supports Spark, Hortonworks and Cloudera Hadoop, but there are new connectors to query Oracle, Teradata and MongoDB (including Cosmos DB), as well as generic ODBC data sources (like DB2, SAP HANA and even Excel) and even other SQL Server databases directly from SQL Server without needing to move or replicate it, making it much faster to generate reports that need information from external tables.
The integrated security tier in SQL Server covers the Spark and HDFS integration, protecting data at rest and in motion with the Always Encrypted option (which requires secure enclaves on your servers but now allows more complex operations), plus built-in data discovery, classification and auditing (which can now log the sensitivity classification of data returned in a query) across all data stores. SQL Server Configuration Manager now includes certificate management for deploying certificates to failover clusters and Always On Availability Groups, and viewing installed certificates (including a handy view showing certificates that will expire soon).
Intelligent Query Processing takes the automated performance tuning of Adaptive Query Processing in SQL Server 2017 further, building on the performance tuning that’s done in SQL Azure. Choose the new 150 database compatibility level to have your query performance automatically tuned either at runtime, or based on analysing past performance. Adjusting the memory used for a query based on past performance has already worked for batch execution with columnstore indexes; now it’s available for all queries, and batch execution works for row stores. Lightweight query profiling is now turned on by default, so you can look back and understand query performance or look at live queries running for troubleshooting without needing to turn on extra data collection for diagnostics.
The core SQL Server engine gets some updates too, including UTF-8 encoding support (which could save a significant amount of storage) and online index build and rebuild when you convert row-store tables into the clustered columnstores useful for analytics (previously, the database had to be paused while the clustered columnstore index was created but now you can carry on working against the database while the conversion happens). Developers will be able to pause and resume the creation of an online index rather than starting from the beginning if it’s interrupted (or the database runs out of space). That can be set as the default for a database, if necessary.
There are enhancements to the graph queries introduced in SQL Server 2017; you can specify graph relationships in a single statement rather than needing separate insert, update and delete statements. You can also insert a new edge or update a merge between two nodes with a single statement using the new match options in a merge statement. By default, edge tables can connect any two nodes in the database; edge constraints allow you to limit the type of nodes an edge table can connect to.
SQL Server for Linux 2019
Now that HDFS is supported natively in SQL Server, developers can bring data from multiple sources for machine learning data model training and operationalise that model in a single system. If T-SQL doesn’t have all the features needed, Java joins Python and R as languages that you can execute in-place inside SQL Server.
The Machine Learning Services component that supports Java, Python and R now runs on SQL Server on Linux, not just Windows Server, giving developers a much wider choice of languages and environments. However, on Windows Server, Machine Learning Services can now be deployed into failover clusters for availability.
SQL Server for Linux catches up on a few other missing features in SQL Server 2019: notably replication (transactional, snapshot or merge) and distributed transactions (with support for the Microsoft Distributed Transaction Coordinator (MSDTC)). New OpenLDAP support allows third-party AD providers to join your domain, simplifying access management.
As persistent (or storage class) memory like Intel Optane starts to arrive in production systems, it’s ideal for in-memory databases. SQL Server on Windows has supported persistent memory since SQL Server 2016 and in SQL Server 2019 on Windows Server 2019 database objects can be stored on persistent memory using standard block-based storage. But in SQL Server for Linux you can now bypass the Linux storage stack and access persistent memory devices directly to get lower-latency IO with SQL database files, transaction logs and in-memory OLTP checkpoint files.
If you’re looking for the latest version of SQL Operations Studio to work with SQL Server across multiple platform and manage all these options, the name is changing to Azure Data Studio, because it’s becoming more modular and supports multiple data sources – including SQL Server 2019 as well as SQL Azure. It also has a notebook experience for running Query books. There’s also a preview of SQL Server Management Studio 18.0 for configuring and administering SQL Server components, so the tools are there to help you try out the previews of SQL Server 2019.
Grey Matter has a team of SQL licensing specialists who can help you with your database questions. They’re available to call: +44 (0)1364 654100 or email: email@example.com
Contact Grey Matter
If you have any questions or want some extra information, complete the form below and one of the team will be in touch ASAP. If you have a specific use case, please let us know and we'll help you find the right solution faster.
Mary Branscombe is a freelance tech journalist. Mary has been a technology writer for nearly two decades, covering everything from early versions of Windows and Office to the first smartphones, the arrival of the web and most things in between.
In today’s data-driven world, businesses face the critical challenge of ensuring the safety and accessibility of their ever-expanding volumes of data. As organizations increasingly turn to cloud solutions for data storage and backup, the combination of Veeam and Wasabi emerges...
In the fast-paced and interconnected world of business, data has become the lifeblood of organisations. The ability to securely store, manage, and recover data is crucial for business continuity and success. Acronis, a global leader in cyber protection, offers a...
Elevate your development capabilities with RAD Studio 12 Architect, the most advanced RAD Studio edition with the broadest feature set. RAD Studio 12 Architect can enable enterprise-level data services and integrations with Embarcadero’s homegrown tools and services, like Aqua Data...
Intel upgrade options Intel® Parallel Studio XE users, please consider upgrading to Intel® oneAPI Base & HPC Toolkit to take advantage of the latest features including functional and security updates. Intel® System Studio and Intel® oneAPI Base & IOT Toolkit...