It’s easy when you think about it !

 

Calculated Columns, persistence and indexes

 

Background:

 

v  The code belongs to a process which runs every 10 minutes, 24 hours a day.

v  I picked this up from my monitoring as it was producing pronounced spikes on my performance graphs.

v  To explain – I have extensive monitoring on my production systems and although the production system is used 24 x 7 activity at weekends is much less, it was by my normal checks that I discovered this process.

v  ( See graph extracts below )

v  The reality was that I identified a process running 6 times an hour and asked who’d instigated a process on that schedule – I was then able to examine the procedure ( see below )

v  Note 1 -  The code has been sanitised.

v  Note 2 -  Some indexing was changed but made no significant improvement.

v  The rowcounts in the underlying tables are smallish, 10’s of thousands at most

v  In this instance we’re trying to eliminate any Regus emails because that will be an internal user, and we’re checking to see if there was an error when last processed.

 

There were two distinct problems with this query, it took too long to run and it used excessive io.

 

What’s the problem to resolve ?   well here’s the sanitised query ( to protect the innocent !! )

 

( problem areas highlighted )

 

The column ErrorDescription is a ntext column.

 

NOTE:  There’s actually several other issues within this query but I’m looking at the highlighted SARGS which can be resolved.

 

 

select

    con.PersonID                

    ,con.Mail                   

    ,eer.RoleListCsv 

    ,eer.AppListCsv  

    ,@ImportDate   

    ,ct.communicationMethodId   

    ,ct.communicationTypeId      

    ,1                           

    ,NULL                       

    ,NULL    

from

    dbo.Priority eep

    inner join dbo.Rulez eer  on eep.PriorityId = eer.PriorityId

    inner join dbo.Extraction cfee  on cfee.CountryID = eer.CountryID

    inner join dbo.Companycv cov   on cov.CompanyID = cfee.CompanyID

    inner join dbo.Company co  on co.CompanyID = cov.CompanyID and cov.CompanyVersion = co.CompanyVersion and co.CtypeID = eep.CtypeID and co.ClassID = eep.ClassID

    inner join dbo.Personcv conv  on conv.PersonID = cfee.PersonID

    inner join dbo.Person con  on con.PersonID = conv.PersonID and con.PersonVersion = conv.PersonVersion

    inner join dbo.CommType ct  on ct.CommTypeID = eer.CommTypeID

where

    eep.Priority = @RulePriority

       and eep.ActiveFromDate <= getDate()

       and eer.ActiveFromDate <= getDate()

       and co.IsAccountBlocked =

       and co.IsActive = 1

       and con.IsActive = 1

       and not exists (select * from dbo.PersonLogon cl  where cl.PersonID = con.PersonID)

       and not exists (select * from dbo.PersonLogonImport cli  where cli.PersonID = con.PersonID and Imported = 0 and ErrorDescription is not null)

       and len(con.Mail) > 6

       and con.Mail not like '%@regus.com';

 

Now let’s work through an example you can run yourself.

 

1.            A calculated column on a text column based around null

 

LOB reads can be expensive, generally you will be doubling up ( at least ) the io for each row, this is even more galling if you don’t actually want the data. You could also be bringing more data then you need into cache which in turn could be flushing data you actually need.

In the production system the column ErrorDescription is of type NText

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Create table

 

CREATE TABLE [dbo].[TestTable1](

       [NumKey] [int] IDENTITY(1,1) NOT FOR REPLICATION NOT NULL,

       [cGuid] [uniqueidentifier] NULL,

       [cSguid] [uniqueidentifier] NOT NULL,

       [cBig] [bigint] NULL,

       [cInt] [int] NULL,

       [cDate1] [datetime] NOT NULL,

       [cDate2] [datetime] NULL,

       [cVchar1] [varchar](50) NULL,

       [cChar1] [char](20) NULL,

       [cVchar2] [varchar](500) NULL,

       [cChar2] [char](100) NULL,

       [cText] [text] NULL,

 CONSTRAINT [PK_TestTable1] PRIMARY KEY CLUSTERED

(

       [NumKey] ASC

)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]

) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY];

GO

ALTER TABLE [dbo].[TestTable1] ADD  CONSTRAINT [DF_TestTable1_cGuid]  DEFAULT (newid()) FOR [cGuid];

GO

ALTER TABLE [dbo].[TestTable1] ADD  CONSTRAINT [DF_TestTable1_cSguid]  DEFAULT (newid()) FOR [cSguid];

 

 

Populate the table with 500,000 rows.

 

declare @num bigint,@count int = 1,@count2 int = 12, @count3 int =1;

--

while @count<500001

begin

IF (@count%2)=0

       begin

              insert into dbo.TestTable1(cBig,cInt,cDate1,cDate2,cVchar1,cChar1,cVchar2,cChar2,cText)

              values( 36854775807+(36854775807/@count),@count2*3,dateadd(dd,@count3,'25 MAY 2007'),dateadd(mi,@count,getdate()),

              reverse(convert(varchar(50),dateadd(dd,@count3,'25 MAY 2007'))),'0000'+convert(varchar(7),@count)+convert(varchar(4),@count3),

              replicate(convert(char(36),newid()),10),convert(varchar(20),datename(dw,getdate()+@count)+datename(month,getdate()+@count))+'blahblahblahblahblahblahblahblahblahblahblahblah'

,convert(char(6000),getdate()))

       end

else

       begin

              insert into dbo.TestTable1(cBig,cInt, cDate1,cDate2,cVchar1,cChar1,cVchar2,cChar2,cText)

              values( 854775807-(36854775807/@count),@count2*2,dateadd(dd,@count3,'5 dec 2006'),dateadd(mi,@count,getdate()),

              reverse(convert(varchar(50),dateadd(dd,@count3,'25 MAY 2007'))),'0000'+convert(varchar(7),@count)+convert(varchar(4),@count3),

              replicate(convert(char(36),newid()),10),convert(varchar(20),datename(dw,getdate())+datename(month,getdate()))+'blahblahblahblahblahblahblahblahblahblahblahblah'

,null)

       end

set @count = @count+1;

set @count2 = @count2+2;

IF @count2>50

       set @count2 = 12;

IF (@count%3000)=0

       set @count3 = @count3+1;

 

end;

go

 

 

v  The table will have 500,000 rows with half of them having null for column cText

 

v  If we simulate the ErrorDescription is not null part of the query

 

 

 

select top 1000 NumKey from dbo.TestTable1 where cText is not null

order by NumKey asc;

 

 

(1000 row(s) affected)

Table 'TestTable1'. Scan count 1, logical reads 164, physical reads 0, read-ahead reads 0, lob logical reads 1000, lob physical reads 0, lob read-ahead reads 0

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

The actual table reads are minimal but the LOB reads  are significant, the 1,000  integers returned in the result set  actually require 609MB in buffer cache,  1,000 ints technically only require 4k of storage. Hmm 4kb  vs  609mb, not good.

 

Now if we add a calculated column  to hold a 1 when cText is null this should be able to optimise the query.

 

--

-- this is wrong – but do you know why?

--

alter table dbo.TestTable1 add ErrorIsNull  as ISNULL(cText,'1') persisted;

--

 

You cannot directly specify the data type for a calculated column, here you can see that I actually end up with another text column bacause the calculated column takes the underlying data type from the query which is still a Text column.

Further more only the nulls are changed to ‘1’ the results in the non null columns are exactly as in cText column.

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

What I actually want is a bit column set to 1 or 0, here’s what’s required

 

 

alter Table dbo.TestTable1 add ErrorIsNull as convert(bit,case when cText IS NULL  then '1' else '0' end) persisted;

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

v  Because I want to index the column, even if 50% of the results are null, the column must be persisted.

 

v  I want to create an index on the NumKey and persisted column because that will avoid using the primary key

 

 

create index idx_TestTable1_ErrorIsNull on dbo.TestTable1(ErrorIsNull,Numkey);

 

 

select top 1000 NumKey from dbo.TestTable1 where cText is not null

order by NumKey asc;

 

 

(1000 row(s) affected)

Table 'TestTable1'. Scan count 1, logical reads 7, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

 

(1 row(s) affected)

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

What this shows is that when you’re looking at a query, the page reads are not always the whole story.

The original query required 609 mb in cache, the calculated column 3 mb, even on a server with 64GB of ram this difference could be significant.

If your database doesn’t all sit in cache then queries like this can cause increased physical io and degrade performance

For this simple test there’s no real difference in execution time, in the production query  there was a measurable decrease  in execution time when implementing the persisted columns.

 

Here’s how to handle the email address sarg

 

 

alter table dbo.TestTable1 add [IsRegusEmail]  AS (CONVERT([bit],case when [email] like '%@regus.com' then '1' else '0' end,0)) PERSISTED;

 

 

Query to return top 20 indexes in buffer cache

--

-- top 20 indexes in buffer cache for the given database.

-- this reports for the database you run this against

--

SELECT top 20 obj.[name]as "Table Name" ,obj.index_id ,si.name,convert(numeric(10,2),(count(*)*8)/1024.0) AS "cached size (mb)"

FROM sys.dm_os_buffer_descriptors AS bd

    INNER JOIN

    (

        SELECT object_name(object_id) AS name

            ,index_id ,allocation_unit_id

        FROM sys.allocation_units AS au

            INNER JOIN sys.partitions AS p

                ON au.container_id = p.hobt_id

                    AND (au.type = 1 OR au.type = 3)

        UNION ALL

        SELECT object_name(object_id) AS name  

            ,index_id, allocation_unit_id

        FROM sys.allocation_units AS au

            INNER JOIN sys.partitions AS p

                ON au.container_id = p.hobt_id

                    AND au.type = 2

    ) AS obj

        ON bd.allocation_unit_id = obj.allocation_unit_id

join sys.indexes si on si.index_id = obj.index_id    and si.[object_id] = object_id(obj.name)

WHERE bd.database_id = db_id()

GROUP BY obj.name, obj.index_id,si.name

ORDER BY "cached size (mb)" DESC;

 

 

© www.grumpyolddba.co.uk  September 2010