Monday, October 22, 2012

Comparing large databases – xSQL Data Compare

The task of comparing the data in two databases seems fairly simple and straightforward until you start considering the size of the databases involved. To understand why, let's walk through a common scenario and see what it takes to handle this seemingly simple task.
Let's say you have database DB1 on server S1, hosted in a data center somewhere on the west coast, and a version of the same database DB1 on server S2, hosted in a data center somewhere on the east coast. Let's further say that DB1 contains approximately 100 GB of data and you need to:
  • Determine what's different in those two databases
  • Selectively push changes from the west coast database to the east coast one.
Lastly, let's say that you, the database master, reside somewhere in Atlanta and have a nice setup with a powerful machine and a high-bandwidth connection to both servers.
 
In a default scenario, here is what will happen:
  • You read about 100 GB of data from server S1 and another 100 GB of data from server S2
  • You compare the data and record the differences (conceptually, something like the sketch after this list)
  • You need a way to display those differences efficiently so that you can review them
  • You need to generate a change script that, depending on the number of differences, may be very large
  • Finally, you execute that script against the target database
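For a sense of what that default comparison amounts to, here is a minimal T-SQL sketch of a brute-force diff of a single table pair, assuming both servers are reachable as linked servers (the S1_WEST and S2_EAST aliases and the dbo.Orders table are hypothetical names, not anything the tool requires):

    -- Rows on the west coast copy that are missing or different on the east:
    SELECT * FROM S1_WEST.DB1.dbo.Orders
    EXCEPT
    SELECT * FROM S2_EAST.DB1.dbo.Orders;

    -- Rows on the east coast copy that are missing or different on the west:
    SELECT * FROM S2_EAST.DB1.dbo.Orders
    EXCEPT
    SELECT * FROM S1_WEST.DB1.dbo.Orders;

Note that both queries still drag the full contents of both tables across the wire, which is exactly where the trouble starts.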
So what's the problem, you say?
  • Even with a blazing-fast connection, it will take a long time to transfer 200 GB worth of data from those servers to your machine
  • While you probably have sufficient disk space, you can't bring much of that data into memory, so how do you compare two tables with millions of rows each when only a small portion of those rows fits in memory?
  • What do you do with, say, a 10 GB change script? How do you execute it against the target database?
There are many data compare tools, especially SQL Server data compare tools, on the market today but, unfortunately (or maybe fortunately for us), most of them will simply fail at this task: some will fail very quickly, while others may take a while to crash.
Our SQL Server Data Compare and its "twin sister" Oracle Data Compare are the rare exception: as long as you have sufficient disk space, those tools will not "choke" no matter how large the database is. Those tools will:
  • Efficiently process large data sets while tightly controlling memory usage
  • Display the data differences in an easy-to-read interface that seamlessly pages through large datasets
  • Successfully execute the script in configurable "chunks", no matter how large it is (see the sketch after this list)
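To make the "chunks" point concrete, here is a minimal sketch of the general pattern: applying one kind of change in bounded batches so the transaction log and locks stay manageable. The dbo.Orders table, Status column, and batch size are hypothetical; the tools generate and manage the chunking for you.

    -- Remove obsolete rows 50,000 at a time instead of in one huge statement.
    DECLARE @BatchSize int = 50000;   -- the configurable "chunk" size

    WHILE 1 = 1
    BEGIN
        DELETE TOP (@BatchSize)
        FROM dbo.Orders
        WHERE Status = 'Obsolete';

        -- Fewer rows than the batch size means the last chunk was processed.
        IF @@ROWCOUNT < @BatchSize BREAK;
    END;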
While SQL Server Data Compare and Oracle Data Compare will handle a database of any size, the comparison will still take a very long time if the databases are large, so we have introduced two extremely useful features to make the process more efficient:
  • Row filters: if you know, as is often the case, that on a given table any rows with a modifieddate earlier than a given date have not changed and do not need to be compared, then you can add a row filter on that table pair. Instead of comparing, say, 10 million rows you only compare 100K rows: now that is a huge reduction in the workload!
  • Column exclusions: SQL Server Data Compare and Oracle Data Compare allow you to choose which columns on a given table to exclude from the comparison and synchronization. For example, a table that contains a text column might have 5 GB worth of data in it, but if you exclude that one text column the table may only have 100 MB of data: again, a huge reduction in the workload. The sketch after this list shows both features in action.
  • Lastly, you only need to go through this configuration once: SQL Server Data Compare and Oracle Data Compare remember every single comparison configuration detail, so next time you can launch the comparison with a single click.
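Expressed as plain SQL, the combined effect of a row filter and a column exclusion on one table looks something like this sketch (the dbo.SalesOrder table, its columns, and the cutoff date are hypothetical examples):

    -- Only rows modified since the cutoff are compared (row filter),
    -- and the huge Notes text column is left out entirely (column exclusion).
    SELECT SalesOrderID, CustomerID, OrderDate, TotalDue   -- Notes excluded
    FROM dbo.SalesOrder
    WHERE ModifiedDate >= '2012-01-01';                    -- row filter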
Download the free, fully functional trial today and see for yourself.
