Monday, October 22, 2012

Comparing large databases – xSQL Data Compare

The task of comparing the data in two databases seems fairly simple and straight forward until you start considering the size of the databases involved. To understand why let's present a common scenario and see what it takes to handle this seemingly simple task.
Let's say you have database DB1 on server S1 that is hosted on a data center somewhere on the west coast and a version of the same database DB1 on server S2 hosted on a data center somewhere on the east coast. Let's further say that DB1 contains approximately 100 GB of data and you need to:
  • Determine what's different in those two databases
  • Selectively push changes from the west coast database to the east coast one.
Lastly, let's say that you, the database master reside somewhere in Atlanta and have a nice setup with a powerful machine and high bandwidth connection to both servers.
 
On a default scenario here is what will happen:
  • You will read about 100GB of data from server S1 and another 100GB of data from server S2 
  • You compare the data and record the differences
  • You need a way to efficiently display those differences so that you can review them
  • You need to generate a change script that depending on the number of differences may get to be very large
  • Finally you need to execute the script against the target database
So what's the problem you say? 
  • Even with a blazing fast connection it will take a long time to transfer 200GB worth of data from those servers to your machine
  • While you probably have sufficient disk space you can't bring much of that data into memory so how do you compare two tables with millions of rows each when only a small portion of those rows will fit in memory
  • What do you do with let's say a 10 GB change script? How do you execute that against the target database?
There are many data compare tools, especially SQL Server data compare tools in the market today but, unfortunately, (or maybe fortunately for us) most of them will simply fail at this task – some of them will fail very quickly while some other may take a while to crash.
Our SQL Server Data Compare and its "twin sister" Oracle Data Compare are the very rare exception: if you have sufficient disk space those tools will not "choke" no matter how large the database is. Those tools will:
  • efficiently process the large data sets while tightly controlling the memory usage
  • Display the data differences on an easy to ready interface that will seamlessly page through the large datasets
  • Successfully execute the script in configurable “chunks” no matter how large the script may be
While SQL Server Data Compare and Oracle Data Compare will handle any size database it will still take a very long time if the databases are large therefore we have introduced to extremely useful features to make the process more efficient:
  • Row filters: if you know, as is often the case, that let's say on a given table any rows that have a modifieddate earlier than a given date have not changed and do not need to be compared then you add a row filter on that table pair and instead of comparing for example 10 million rows you only compare 100K rows – now that I a huge reduction on the workload! 
  • Column exclusions: SQL Server Data Compare and Oracle Data Compare allow the user to choose which columns on a given table to exclude from the comparison and synchronization. For example, a table that contains a text column might have 5GB worth of data on it but if you exclude that one text column the table only has 100MB of data – again a huge reduction on the workload. 
  • Lastly, you only need to go through this configuration ones – SQL Server Data Compare  and Oracle Data Compare will remember every single comparison configuration detail so next time you can launch the comparison with a single click.
Download the free, fully functional trial today and see for yourself

Friday, October 12, 2012

t-sql list of countries

The script which you can download from the link below creates a "countries" table and populates it with the complete ISO list inlcuding the Numerical Code of the country, the name of the country both in lower case letters and upper case letters, the ISO ALPHA-3 code and the ISO 3166-1-alpha-2 code.

A couple of notes:
  1. the Countries table created by the script does not have a Primary Key defined on it - you will need to define one.
  2. The following territories are listed on the ISO site but they are not listed on the UN site so I did not include them in the script.
UNITED STATES MINOR OUTLYING ISLANDS, UM
TAIWAN, PROVINCE OF CHINA, TW
HEARD ISLAND AND MCDONALD ISLANDS, HM
SOUTH GEORGIA AND THE SOUTH SANDWICH ISLANDS, GS
VIRGIN ISLANDS, BRITISH, VG
FRENCH SOUTHERN TERRITORIES, TF
CHRISTMAS ISLAND, CX
COCOS (KEELING) ISLANDS, CC
BOUVET ISLAND, BV
ANTARCTICA, AQ


You can download the script from here: http://www.xsql.com/download/Scripts/Countries.zip

What are Sequence Numbers in SQL Server 2012

If you have worked with SQL Server for a while you either have built yourself or at least seen somewhere some sort of a GetNextValue(MyTable) function that returns the next available sequential ID for the given table. The way it worked was: you would have a small table with two columns (TableName, NextID) and the above function would read the corresponding ID from this table and increment the NextID column. Why was this necessary when you could use the auto incremented Identity column? A couple of reasons: you needed the ID before you inserted the row in the table; you wanted to be able to possibly re-use those IDs' and, you wanted to use the ID as a foreign key on dependent tables. You made this solution work and it was ok however, you had to handle concurrency (multiple processes requesting new IDs for the same table) and this process often became a bottleneck causing unexpected performance issues.

The uniqueidentifier introduced in SQL Server 2005 was great but often that is not what you want, instead of a 50 character long unique identifier you want a sequential number.

SQL Server 2012 addresses all those problems with a new user-defined schema-bound object called sequence that generates a sequence of numeric values according to a set of specifications. Here is how it works:
  • first, you create a sequence that you intend to use for a certain purpose as
         CREATE SEQUENCE MySchema.MyFirstSequence
            AS int  /* can be any built in integer type or a user defined integer type */
            START WITH 1
            INCREMENT BY 1 ;
  • Whenever we need to grab the next number in the sequence we simply refer to this object as NEXT VALUE FOR MySchema.MyFirstSequence
Here are a couple of examples of using the sequence object we created above:
 
Example 1: INSERT INTO MyTable (ID, SomeValue) VALUES (NEXT VALUE FOR MySchema.MyFirstSequence, 'some value')

Example2: Let's say we need to create a record in MyTable and also a couple of dependent records on MyDependentTable where the MyTable.ID participates as a foreign key. In that case we need to first grab the next ID and then use it in our insert statements, so we would do something like this:
DECLARE @nextID int
SET @nextID = NEXT VALUE FOR MySchema.MyFirstSequence

 INSERT INTO MyTable (ID, SomeValue) VALUES (@nextID, 'some value')
 INSERT INTO MyDependentTable (MyTableID, OtherColum) VALUES (@nextID, 'other value 1')
 INSERT INTO MyDependentTable (MyTableID, OtherColum) VALUES (@nextID, 'other value 1')


Here are a few things to note:
  • you don't have to worry about concurrency, SQL Server takes care of that
  • sql server also takes care of the performance as well - unlike your home build "next sequential value" mechanism the sequence object is unlikely to ever cause any issues for you. 
  • you can set min and max values for the sequence and you can cycle through the values in that range
  • important: the sequence numbers are generated outside the scope of the current transaction and they are "consumed" even if the transaction that requested the next value is rolled back
For a thorough explanation of the sequence object, the syntax for creating, dropping and altering a sequence, as well as examples of using the sequence object you can check out those two msdn articles: http://msdn.microsoft.com/en-us/library/ff878058.aspx  and http://msdn.microsoft.com/en-us/library/ff878091.aspx

Note: the next version of our schema compare tool, xSQL Object, that will be released next month will include support for the sequence object as well.

Thursday, October 11, 2012

What is SQL Server Database Engine Tunning Advisor?

SQL Server Database Engine Tuning Advisor helps you select and create an optimal set of indexes, indexed views, and partitions without requiring an expert understanding of the structure of the database or the internals of SQL Server.
 
SQL Server Database Engine Tuning Advisor analyzes a workload and the physical implementation of one or more SQL Server databases. A workload is a set of Transact-SQL statements that execute against a database or databases that you want to tune. Database Engine Tuning Advisor uses trace files, trace tables, or Transact-SQL scripts as workload input when tuning databases. You can create Transact-SQL script workloads with the Query Editor in SQL Server Management Studio. You can create trace file and trace table workloads by using the Tuning Template in SQL Server Profiler.
 
SQL Server Database Engine Tuning Advisor analyzes the workload and can recommend that you add, remove, or modify physical design structures in your databases. The advisor can also recommend what statistics should be collected to back up physical design structures. The physical design structures include clustered indexes, nonclustered indexes, indexed views, and partitioning. SQL Server Database Engine Tuning Advisor recommends a set of physical design structures that reduces the query optimizers estimated cost of the workload.

SQL Server Database Engine Tunning Advisor Capabilities

 Following is a summary of what SQL Server Tunning Advisor can do for you:
  • Recommend the best mix of indexes for databases by using the query optimizer to analyze queries in a workload.
  • Recommend aligned or non-aligned partitions for databases referenced in a workload.
  • Recommend indexed views for databases referenced in a workload.
  • Analyze the effects of the proposed changes, including index usage, query distribution among tables, and query performance in the workload.
  • Recommend ways to tune the database for a small set of problem queries.
  • Allow you to customize the recommendation by specifying advanced options such as disk space constraints.
  • Provide reports that summarize the effects of implementing the recommendations for a given workload.
  • Consider alternatives in which you supply possible design choices in the form of hypothetical configurations for Database Engine Tuning Advisor to evaluate.
SQL Server Database Engine Tunning Advisor Features

SQL Server Database Engine Tuning Advisor provides new features that enable both novice and experienced SQL Server database administrators to tune databases for better query performance. The following sections list and describe the Database Engine Tuning Advisor new features:

  • Improved Workload Parsing - handles batches that reference transient tables such as temp tables;
  • parses and tunes queries referencing user-defined functions; tunes statements in triggers and a lot more.
  • Enhanced Scalability - uses workload compression, which allows SQL Server Database Engine Tuning Advisor to reduce the amount of time spent tuning while producing high quality tuning recommendation results; uses an algorithm to avoid redundant statistics creation, which reduces the number of I/Os incurred during tuning.
  • Integrated Tuning - SQL Server Database Engine Tuning Advisor can consider the trade offs in performance that are offered by various different types of physical design structures (indexes, indexed views, partitioning). 
  • Multiple Database Tuning - applications often access multiple databases to complete their work, so workloads frequently refer to objects from multiple SQL Server databases. SQL Server Database Engine Tuning Advisor can simultaneously tune multiple databases.
  • Offload of Tuning Overhead to Test Server - tuning a large workload can create significant overhead on the server that is being tuned. This occurs because SQL Server Database Engine Tuning Advisor often needs to make several calls to the query optimizer during the tuning process. Using a test server in addition to your production server eliminates this problem.
  • Command-Prompt Utility and Graphical User Interface Tool - SQL Server Database Tuning Engine Advisor provides a dta command-prompt utility to make it easy to incorporate SQL Server Database Engine Tuning Advisor functionality with scripting and also provides a graphical user interface (GUI) tool, which makes it easy to view tuning sessions and results.
  • Drop-Only Tuning - physical design structures may accumulate over time in a SQL Server database, and database administrators need a way to drop those structures that are no longer useful. The drop-only tuning option is useful for identifying existing physical design structures that are either not used at all or have very little benefit for a given workload.
  • XML Support - all of the tuning operations that you can perform with the SQL Server Database Engine Tuning Advisor GUI and the dta command-prompt utility can be defined in a Database Engine Tuning Advisor XML input file. 
  • User-specified Configuration and "What-if" Analysis Support - SQL Server Database Engine Tuning Advisor enables users to provide a hypothetical configuration of physical design structures (indexes, indexed views, and partitioning strategies) as tuning input. You can then tune a database or databases as if the configuration were already implemented.
  • Analysis Reports - SQL Server Database Engine Tuning Advisor generates several analysis reports in text or in XML after a tuning session completes. These reports provide information such as the cost of queries occurring in the workload, the frequency of events that occur in the workload, or the relationships between queries and the indexes they reference.
  • Tuning Sessions - SQL Server Database Engine Tuning Advisor prompts you to create a unique name or number to identity the tuning session before the analysis can begin. After analyzing a workload or when analysis is stopped, SQL Server Database Engine Tuning Advisor saves the tuning session information and associates it with the identifier you supplied before tuning began.
 Related Tools: If you need to trace multiple SQL Servers on schedule of if you wish to run a profiler trace on SQL Server Express or MSDE you can download the free (free for one SQL Server instance only) xSQL Profiler. xSQL Profiler allows you to configure granular traces and schedule those traces to run on multiple servers at certain time intervals and it automatically aggregates all the trace data from those servers into a central repository database.  Download your free copy from: http://www.xsql.com/download/package.aspx?packageid=12

Wednesday, October 10, 2012

To automate or not to automate

For every routine task we do there is a tool out there that would help reduce the time and effort required to complete the task as well as ensure consistent and more accurate outcome. So why do we so often choose to forgo the opportunity for improvement? Here are a few reasons that don't need much analysis:
  • Job security – the innate fear that automation will render our position as obsolete.
  • Resistance to change – we do things this way because we have always done them this way; or, why try to fix something that is not broken.
  • Being always under pressure and adapting a short term thinking – right now I need to take care of this task, I don’t have time to spend on researching how to do it faster and better.
  • Underestimating the cumulative effect of manual tasks –a task that only takes let's say 15 minutes a day to complete does not “scream” I need automation BUT in one year you will end up spending about 10 full days of work on that task. 
  • Overestimating the cost of automation - it will take too much time and money to acquire and implement a solution. 
None of those reasons constitute a good, rational argument to forgo automation. On the other hand there is the other extreme, automating for the sake of automating. The decision on whether to automate a process / task or not should be based on a rigorous and objective method that is based on numbers and not on feelings.
 
To devise such method we need to first answer the question: what will the decision be based on? While there are various scenarios the most common one is that in which the decision will be based purely on financial considerations, that is, will the automation pay off? Will we get a return on the investment?  Here is the list of factors that you will need to consider in order to answer this question:
  • Estimated value of "labor" -> VT ($/hour)
  • Estimated time to complete task -> TT (hours)
  • Estimated frequency of task -> TF (#times / month)
  • Estimated time required to identify the right tool for the job -> RT (hours)
  • Estimated time required for implementing the automated solution -> AT (hours)
  • Estimated initial cost for acquiring the tool -> TIC ($)
  • Estimated cost for maintaining the tool -> TMC ($ / year)
  • Estimated time required to complete the task after automation ->TTAA
So the question is: how much time will it take to break even? Or, stated differently, how long will it take for this automation to pay for itself?

Let t denote the number of months it will take to reach this "magic" break-even point after which we start making a positive return on our investment. The break-even point will be reached when: 
VT*TT*TF *t = RT*VT + TIC + TMC/12 * t + AT * VT + TTAA*VT*t 
 
So, consequently, the break-even time can be calculated as:
 t = (RT*VT + TIC + AT*VT) / (VT*TT*TF - TMC/12 – TTAA*VT)
 
Let's use this on a real life example. Assume you are a DBA and your weekly routine includes deploying tens of scripts into your large server farm and let’s say that this takes one full hour of your time. Let’s further assume that based on your compensation, benefits, infrastructure etc. your cost to the company is approximately $70 / hour. You estimate that it will take you about 2 hours to identify a solution and another 2 hours to implement the automated solution in your environment. After the automation the one hour task will be reduced down to 6 minutes of monitoring and reviewing the results. As for the estimated costs of the solution let’s use some real numbers – the best solution in the market for this scenario is Script Executor which costs $249 for the license plus $50 / year for the upgrade subscription. So, we have:
  • VT = $70 / hour
  • TT = 1 hour
  • TF = 4 times / month
  • RT = 2 hours
  • AT = 1 hours
  • TIC = $249
  • TMC = $50 / year
  • TTAA = 6 minutes = 0.1 hours
Break even: t = (2 * 70 + 249 + 1 * 70) / (70 * 1 * 4 – 50 / 12 – 0.1*70) => t ~ 1.7 months
 Note that if TMC/12 + TTAA*VT  >=  VT*TT*TF then the investment will never pay off in pure financial terms so, unless there are other important considerations then this will be a no go.
 
So, what is a good t? That depends, but generally, anything under 6 months is usually a no brainer. However, as there are always other potential investments competing for the same budget you will need to compare the parameters of this with those of the other potential investments to make the right decision.
 
You can download a fully functional free trial of Script Executor from:  http://www.xsql.com/download/package.aspx?packageid=40 
 

Monday, October 8, 2012

Bidirectional synchronization using xSQL Data Compare

xSQL Data Compare is designed to compare the data in two SQL Server databases and generate the change script to make database A the same as database B or vice versa - in other words it is designed to go in only one direction at a time. However, while this one directional synchronization is very helpful in various scenarios it does not address the need of the users who are in charge of managing distributed databases and that for one reason or another are not able to implement a SQL replication solution. While not an ideal solution, with a bit of effort those users can use xSQL Data Compare to address this issue.
So, let's say you have a copy of database DB1 on Server S1 and a copy of it on server S2. Both servers/databases are active so they are both being modified simultaneously. Now, let’s say that you are in charge of synchronizing those two copies of the database every night. Before you can synchronize let's say table T1 on S1.DB1 with table T1 on S2.DB1 you will need to make a couple of structural changes:
  1. First let's assume that table T1 has a TimeStamp column that will be used to decide which record overrides which.
  2. Now, add column S2_TimeStamp on table S1.T1 and add column S1_TimeStamp on table S2.T1. You will need to make sure that the new timestamp columns are automatically updates anytime the row is modified. 
  3. Create a view for S1.T1 and a view for S2.T1 naming the timestamp columns in such a way that the name of S1.T1.TimeStamp matches S2.T1.S1_TimeStamp and S2.T1.TimeStamp matches S1.T1.S2_TimeStamp. The view on S1 will contain the following columns: T1.ID, T1.TimeStamp as TimeStamp1, T1.S2_TimeStamp as TimeStamp2 whereas the view on S2 will contain: T1.ID, T1.TimeStamp as TimeStamp2, T1.S1_TimeStamp as TimeStamp1. This will ensure that the columns of S1.T1 are automatically mapped with columns of S2.T1.
You will only need to do the above once for each table you wish to do a bidirectional data synchronization for. At this point everything is ready to set up the comparison and synchronization. Here are the steps:
  1. First, run a compare of the two views you created above and then:
    1. Exclude wT1.TimeStamp2 column and generate the script that makes S2.wT1 the same as S1.wT1. Execute the script – this will update the S2.T1 table with the timestamp from S1 without touching the S2 timestamp. 
    2. Now include wT1.TimeStamp2 and exclude wT1.TimeStamp1 and generate the script that makes S1.wT1 the same as S2.wT1. Execute the script – this will update the S1.T1 table with the timestamp from S1 without touching the S1 timestamp.
  2. Now you are ready for the final step – compare table S1.T1 with S2.T1 adding a WHERE clause for each side to exclude the rows where the timestamps are the same. You will need to repeat this process twice: first, you only include rows where S1.T1.TimeStamp > S1.T1.S2_TimeStamp – you will need to "push" those rows from S1 to S2; then, you include only the rows where S2.T1.TimeStamp > S2.T1.S1_TimeStamp – you will need to push those rows from S2 to S1.
 You can use the SQL Data Compare command line utility to automate all those comparison steps.
 
xSQL Data Compare is free for SQL Express and also for other editions of SQL Server with some limitations (check the details here: http://www.xsql.com/products/sql_server_data_compare/editions.aspx).

Thursday, October 4, 2012

Query Multiple Tables that may not exist using TRY CATCH to avoid errors

I had the priveledge of working on a document management system recently.  Unfortunately, for whatever reason, there were new tables created for each year.  For example, documents imported into the database in 2011 were in table Documents_2011.  There were over 200 databases, and to make matters worse, not all databases contained every year.  If the client started in 2008, there were tables starting from 2008.  If they started in 2011, the first table would be 2011.  After running a few thoughts in my head, I decided to try the new SQL TRY CATCH block.

I first found the earliest year that was used.  In this case it was 2002.  Start by creating a temp table.  Next, use a WHILE loop and start at the first year possible.  Dynamically create an INSERT INTO SELECT statement.  Execute the statement.  Add 1 to the year and continue looping until the current year.

Since this database required scripts to be run at the beginning of each year, this stored procedure would not have to change since it dynamically queries all tables from START year to CURRENT year.

CREATE PROCEDURE SelectDocumentsToBeArchived
 @RetentionPeriod INT
AS

DECLARE @YEAR INT = 2002
DECLARE @SQL VARCHAR(8000) = ''

DECLARE @CUTOFFDATE DATE
SET NOCOUNT ON
SET @CUTOFFDATE = DATEADD(MONTH, -@RetentionPeriod, GETDATE())
CREATE TABLE  #tmp_ToBeArchived (DocID BIGINT,[FileName] VARCHAR(256))

WHILE @YEAR <= YEAR(GETDATE())
BEGIN
 BEGIN TRY

  SET @SQL = 'INSERT INTO #tmp_ToBeArchived SELECT DocID, [FileName]'
  SET @SQL = @SQL + ' FROM DOCUMENTS_' + CAST(@YEAR AS VARCHAR(4)) + ' DOCUMENTS'
  SET @SQL = @SQL + ' WHERE DOCUMENTS.DateImported < ''' + CAST(@CUTOFFDATE AS VARCHAR(10)) + ''''
  SET @SQL = @SQL + ' AND ISNULL(DOCUMENTS.Deleted,0) = 0'

  --PRINT @SQL
  EXEC(@SQL)

 END TRY
 BEGIN CATCH
  PRINT @SQL
  PRINT ERROR_MESSAGE()
 END CATCH

 SET @YEAR = @YEAR + 1
END
;

SET NOCOUNT OFF
SELECT #tmp_ToBeArchived.* FROM #tmp_ToBeArchived
 ORDER BY DocID

DROP TABLE #tmp_ToBeArchived
GO