Tuesday, October 18, 2016

Columnstore Indexes in 2 minutes

The standard way of storing relational database tables on physical media is row based, that is, each row occupies a contiguous space. The term that is used for this type of data storage is rowstore.

In simple terms, you can think of the columnstore as a transposed rowstore. Logically nothing changes, you can still think of and see a table as a normal table with rows and columns, but physically the data is stored in a column-wise format.

Why do this and when? 

  • Think of a column like “Country” on say an “Orders” table – you basically have a handful of country IDs repeating millions of times. Just imagine the kind of data compression you can achieve on such column! 
  • Now think of a query like “get sales by country” – instead of scanning the whole table, SQL Server will only need to deal with two compressed columns and will be able to return the results many times faster using significantly less resources. 
  • When? Storing fact tables and large dimension tables in data warehouses as clustered columnstore indexes will significantly improve compression and query performance.
Why not do this?
If columnstore indexes are so great why not store tables as clustered columnstore indexes always? Just imagine what an insert|update|delete looks like in the case of a clustered columnstore index for a table that say has just 20 columns! It is kind of like doing 20 separate inserts|updates|deletes, one for each column. So, on a transactional database columnstore indexes are not a good idea.

Best of both worlds?
SQL Server 2016 lets us create an updatable non-clustered columnstore index on a rowstore table and non-clustered rowstore indexes on clustered columnstore indexes. This mixing comes with a cost in both cases but in certain scenarios the gains achieved make this worthwhile.

Our Schema and Data Compare tools support comparing and synchronizing columnstore indexes.

0 comments:

Post a Comment