Wednesday, June 6, 2012

SQL Merge Replication beware of identity columns

You have set up SQL Server merge replication between your primary site and your DR (disaster recovery) site – this allows you to use the DR site as an overflow site when you have volume spikes on the primary site and the bottleneck is not SQL Server. It is a pretty cool setup indeed with SQL Server seamlessly taking care of keeping both databases in sync with each other. You have further tested the failover to the DR site multiple times and everything worked flawlessly, so you assure the management that you have implemented a “fool-proof” system and as far as SQL Server is concerned any failure on the primary site will be transparent to the end users.

Sooner or later, as it inevitably does, disaster strikes and your primary site is completely out of commission.  As planned and expected the DR site takes over and everything is working fine, at first… a couple of hours into this disaster a mission critical application starts failing. You start digging into your logs and see this “unique constraint violation” error which does not seem to make sense and the first thing that comes to mind is that the developers must have screwed up somewhere and you might need to revert to an older version of that application. In the meantime the operations have come to a complete halt and you don’t know where to hide.

So what caused this? No, it wasn’t the developers that messed up but rather the root cause of this is the way SQL Server Merge Replication handles identity columns. You can read this Microsoft article for a more complete overview http://msdn.microsoft.com/en-us/library/ms152543.aspx but in short to avoid “clashing” of the IDs between primary and secondary servers the primary server is in charge of assigning blocks of IDs to the secondary server that do not overlap with the IDs the primary is using and making sure that the secondary always has IDs available. But, if the primary server is not available for an extended period of time the secondary might exhaust all the available IDs after which any inserts to that table will fail.

There are multiple ways you can deal with this like managing the ID blocs so that sufficiently large blocks are assigned to the secondary; monitoring the usage of those IDs on the secondary and stopping the replication altogether if necessary, avoiding the use of the identity columns as primary keys for tables etc. However, the purpose of this brief article is to make you aware of this potential problem that you might face when using SQL server merge replication for disaster recovery.

Don't forget to check out our SQL Server Data Compare and Synchronization tool: http://www.xsql.com/products/sql_server_data_compare/ - it is one of the most useful sql server tools and it also comes in a free lite edition.