Paul van Oordt SQL Services
Freelance SQL Server specialist and troubleshooter, Utrecht / Antwerp
SQL Server offers various possibilities to compress data. This can come in handy, especially for historical data, to limit the growth of data files and backup times. Below I briefly describe the options. And since some time I again teach longer SQL Server training courses, in collaboration with Eduvision.
If you don't want to receive my newsletter, let me know.
Paul van Oordt
SQL Server includes several options to compress data in order to reduce storage, I/O and memory usage.
Column-store indexes use an advanced form of compression that compresses entire datasets of up to a million rows. However, column-store indexes are intended for a specific workload, namely reporting on large data sets, and are usually not an alternative to regular indexes. A clustered column-store index with archival compression is the most efficient way to archive data that is rarely accessed.
Compression of ordinary b-tree indexes and heaps is set per table or index partition, allowing you to compress, for example, only the historical data in a table. The current data, which is still regularly modified, can then remain uncompressed. Therefore, to compress data, first table partitioning needs to be configured.
Compression of the data in a partition can be done at row level and at page level. How much space is saved with either option depends on the actual data. Row-level compression compresses fixed-length columns based on their content. For example, a 0 in an integer field takes up only 1 bit and the numbers 1 - 255 only take up 1 byte of space, instead of the standard 4 bytes for an int or 8 bytes for a bigint. Fixed length character data (char and nchar), datetime and some other data types may also yield savings. Page-level compression includes row-level compression and additionally uses algorithms to efficiently store multiple byte sequences on a page, even within variable length columns.
The estimated effect of compression can be queried with a stored procedure, as follows:
exec sp_estimate_data_compression_savings 'Sales', 'Invoices', null, null, 'page';
The fourth and fifth arguments of the procedure are index_id and partition_id respectively. For this table in the WideWorldImporters sample database, row compression yields 11% reduction and page compression 29%. The compressed data is therefore 89% and 71% of the uncompressed data.
And column-level compression, is that still available too? It is. T-SQL contains the
decompress functions that can compress character data. Again, the profit depends on the data. The reduction will be greater when the string is longer and simpler. A test on a table with SQL statements and query plans yields significant gains of over 80% for either.
Select performance doesn't suffer too much from compression. The internal format of the data storage is different, but it can still be read efficiently. Compression has more impact on data modifications. Updating a 0 into a 1 in an integer column, for example, takes an extra byte, and that just needs to be available. Large-scale updates can easily lead to many page splits and therefore take much more time. It is therefore logical to use compression mainly for data that is rarely changed, and is perhaps only queried occasionally.
It is clear that a case to case assessment on for a specific dataset and workload is needed. In many cases, this also means thinking about partitioning. Table partitioning has other important advantages such as removing large amounts of data as a super-fast metadata operation, and rebuilding indexes or updating statistics per partition. Partitioned views, sometimes called 'poor man's partitioning' when table partitioning was still an Enterprise Edition feature, can also be well combined with compression. In a partitioned view, data from different tables with
union all is combined in one view. An advantage is that it can be put into use with minimal downtime.
In addition to the compression of data discussed here, there is also backup compression that can be enabled very simply.
As every teacher knows, a course is not only educational for the students. The preparation and the many questions also give the trainer more and more knowledge and a deeper understanding.
Knowledge transfer has an important place in all my work. After all, I'm only with a customer for a short time. Besides that, I sometimes provide an improvised course of one or two days, usually resulting from an investigation into performance problems. "Maybe we should spend a day with our developers in a room..."
For the first time since I worked as a freelance trainer for Compu'Train (1999 - 2009), I done a more comprehensive training on almost all aspects of SQL Server: programming, management, performance, availability, security and internals. Both the students and I enjoyed it.
I don't see myself in the position to organize such courses, but luckily there are companies that are very good at that. I collaborate with Eduvision in Apeldoorn. I do not provide Microsoft Official Courses, and I am not a Microsoft Certified Trainer (anymore), but I can fulfill learning requirements in a very flexibly manner based on a lot of practical experience.
Thanks for reading.