Tuesday, February 21, 2012

Instead of a LIKE search

Here is a scenario you might face: let's say you have a table called "customers" that among other things has a column called "AccountNumber" that is defined as a varchar(20) column. Since searching by account number is a very common query, to improve the performance of it and guarantee a faster response time for your users you have decided to create an index on the  AccountNumber column. The situation is complicated a bit by the fact that those account numbers come from multiple sources and in some cases the number may be padded with zeroes in front depending on the source. You know that the max length for the account number including padding zeroes is 20 characters but the number of padding zeroes varies. In the meantime, when your users do a search they may include padding zeroes or may not - either way your query should find the row(s) the user is looking for. Easy enough, you can write something like:
 SELECT * FROM customers
 WHERE udf_trimZeroes(accountnumber) = udf_trimZeroes(@accountnumber)

or something like:
 SELECT * FROM customers
 WHERE accountnumber like ‘%’+ udf_trimZeroes(@accountnumber)

Either one of those queries will produce the results you are seeking, however, in both cases, albeit for different reasons, the optimizer will choose to do a clustered index scan which would be very expensive and result in a slow response time. You have already added an index on AccountNumber but the optimizer is ignoring that! So, what can you do? Here is a simple solution that will significantly improve this query:
First: we take the @accountnumber parameter that is passed to our stored procedure and build a small table variable containing all the possible values of this account number by padding the number with leading zeros up to the max length of 20 characters.
  DECLARE @AccountNumber_Table TABLE
   (
    [AccountNumber] [varchar](20) NOT NULL
   )
    -- strip out the leading zeroes first
   SET @AccountNumber = SUBSTRING(@AccountNumber, PATINDEX(''%[^0]%'',@AccountNumber), len  (@AccountNumber))
   DECLARE @counter int
   SET @counter = len(@AccountNumber)
    WHILE @counter <= 20
     BEGIN
      INSERT INTO @AccountNumber_Table 
        VALUES (REPLICATE(''0'', @counter - len(@AccountNumber)) + @AccountNumber)
      SET @counter = @counter + 1
     END


So, if the accountnumber the user is searching for is let's say 12345678 then the above table variable will contain values 12345678, 012345678, 0012345678 and so on. Now, instead of the where clause we simply join our "customers" table with this table variable we created as:
   SELECT customer.*
   FROM customers INNER JOIN @AccountNumber_Table ON
 
        customers.AccountNumber = @AccountNumber_Table.AccountNumber

the optimizer will now do an index seek on your AccountNumber index and return the results in a fraction of the time while completely eliminating the performance issues that a clustered index scan on a large table would cause.

PS check out our xSQL Profiler - it allows you to schedule, run traces and automatically collect trace data from multiple servers. It is free for one SQL Server instance.

0 comments:

Post a Comment