Thursday, February 2, 2012

The danger of subqueries on a T-SQL delete statement

How can a sub-query wreak havoc on your data? Consider this: you are doing some clean up on your database and need to delete from table [t1] all rows the id of which happens to refer to rows on a table [t2] that match a certain criteria - let's say all rows that were created before a given date. Without thinking twice you go ahead and write:

      delete from [t1]
        where id in (select id from [t2] where "some criteria")


You expect a few records will be deleted from table [t1] - you click on execute... you can't believe your eyes, SQL Server Management Studio is reporting that 200 thousand rows were deleted, oh no, the whole content of table [t1] was wiped out! How could this be? You are panicking... you realize you broke your own rules but first you want to understand how could this innocent query cause this. You examine it closer and you realize that there is no column named id on table [t2] - but shouldn't that have caused the sub-query, and consequently the whole query to fail?!

Well, it didn't, did it? Here is why: if you try running the sub-query by itself you will see that it will fail (assuming that there is no column named id on table [t2]), but when that sub-query is part of the bigger query things behave a bit differently. The reference id in the sub-query will be resolved against table [t1] in which case regardless of what the subquery criteria is, it will always return the id of the row being processed from table [t1] thus all rows from table [t1] will be deleted.

Here are a few simple rules that anyone working with data should follow religiously:
  1. Always qualify the column names - had you done that your query would have failed and you would have been safe;
  2. Never execute a delete statement without wrapping it in a transaction so that you can roll-back if you realize that you screwed up;
  3. Write the query as a SELECT query first - execute it, see how many and which rows will be effected and only when you are certain that the query returns the rows you want then replace the "select * from" with "delete from"
Lastly, whenever you mess around with production data make sure your backups are good and keep SQL Server Data Compare handy as it will enable you to selectively restore only what you need without overwriting everything else.

0 comments:

Post a Comment