Creating a baseline
This document introduces the subject of trending and monitoring for performance ( as against alerts ).
If your server performance degrades you need to be able to compare.
Ø If you cannot do this then you are on the back foot straight away and you may well hear such comments as “ So why do we employ a DBA then?” or worse still “Maybe we should look at replacing SQL Server with Oracle”
Ø This document forms the basis of the statistic gathering I introduced in SQL 2000, it’s not particularly rocket science or even very exciting, it even manifests itself as management reporting, some examples below, however if you’re supporting critical business systems and you get in the firing line when things go bad, then you might find this type of process rather essential.
|
Month |
Average Transactions / Second |
||||
|
Total |
Sales |
WorkFlow |
Orders |
Tempdb |
|
|
October 2006 |
238 |
106 |
3 |
22 |
97 |
|
November 2006 |
201 |
80 |
2 |
23 |
86 |
|
December 2006 |
134 |
53 |
2 |
14 |
57 |
|
January 2007 |
269 |
165 |
4 |
22 |
62 |
|
February 2007 |
179 |
85 |
3 |
16 |
36 |
|
March 2007 |
139 |
75 |
3 |
16 |
30 |
|
April 2007 |
156 |
76 |
4 |
20 |
39 |
|
May 2007 |
180 |
83 |
4 |
21 |
53 |
|
June 2007 |
175 |
84 |
4 |
20 |
47 |
|
July 2007 |
159 |
74 |
4 |
20 |
43 |
|
August 2007 |
148 |
66 |
3 |
20 |
41 |
|
September 2007 |
153 |
65 |
3 |
25 |
41 |
|
October 2007 |
207 |
101 |
4 |
29 |
53 |
|
Date ( 1st of ) |
Total Data Size |
Data Growth for the month |
Projected 12 month size |
|
Jun 2006 |
34,721.09 |
|
|
|
Jul 2006 |
38,869.29 |
4,148 |
168,000 Mb |
|
Aug 2006 |
50,199.05 |
11,329.76 |
212,000 Mb |
|
Sep 2006 |
59,027.62 |
8,828.57 |
225,600 Mb |
|
Oct 2006 |
74,100.98 |
15,073.36 |
285,000 Mb |
|
Nov 2006 |
82,460.33 |
8,359.00 |
200,000 Mb |
|
Dec 2006 |
91,546.58 |
9,087.00 |
212,000 Mb |
|
Jan 2007 |
91,353.23 |
-193.35 |
|
|
Feb 2007 |
96,771.10 |
5,417.87 |
|
|
Mar 2007 |
122,255.81 |
25,484.71 |
|
|
Apr 2007 |
132,750.58 |
10,494.77 |
|
|
May 2007 |
127,405.57 |
-5,345.01 |
|
|
Jun 2007 |
140,451.88 |
13,046.31 |
|
|
Jul 2007 |
151,522.62 |
11,070.74 |
300,000 Mb |
|
Aug 2007 |
158,673.16 |
|
|
|
Sep 2007 |
168,236.72 |
|
|
|
Oct 2007 |
179,723.10 |
11,486.38 |
|
|
1 Nov 2007 |
188,745.10 |
( 9,022 - 14,825 ) |
386,469 Mb |
Server Performance for October
|
Weekday |
Date |
CacheHitRatio ( Avg ) |
Logins ( Avg ) |
Deadlocks ( Total ) |
Latchwaits\sec |
Lockwaits\sec |
|
Monday |
3 |
63.365883 |
508 |
0 |
112 |
0.13553031 |
|
Tuesday |
4 |
63.633709 |
541 |
1 |
101 |
0.06068182 |
|
Wednesday |
5 |
64.069511 |
544 |
0 |
111 |
0.15459596 |
|
Thursday |
6 |
62.808182 |
518 |
0 |
115 |
0.29828283 |
|
Monday |
10 |
72.500183 |
459 |
548 |
119 |
0.12861112 |
|
Tuesday |
11 |
67.146805 |
516 |
3 |
117 |
8.1414141E-2 |
|
Wednesday |
12 |
65.199837 |
553 |
1 |
118 |
0.13881312 |
|
Thursday |
13 |
64.065308 |
578 |
1 |
135 |
0.36633837 |
|
Friday |
14 |
63.502281 |
577 |
1 |
127 |
4.6035353E-2 |
|
Monday |
17 |
64.474243 |
492 |
2 |
176 |
9.6565656E-2 |
|
Tuesday |
18 |
64.344391 |
591 |
1 |
172 |
9.7676769E-2 |
|
Wednesday |
19 |
64.221291 |
572 |
2 |
163 |
0.11300505 |
|
Thursday |
20 |
64.010361 |
567 |
2 |
163 |
8.4797978E-2 |
|
Friday |
21 |
64.742073 |
476 |
2 |
152 |
8.7853536E-2 |
|
Monday |
24 |
72.102028 |
506 |
1 |
161 |
4.8813131E-2 |
|
Tuesday |
25 |
66.991104 |
513 |
3 |
198 |
0.1084596 |
|
Wednesday |
26 |
65.961151 |
514 |
1 |
260 |
0.14002526 |
|
Thursday |
27 |
65.872841 |
504 |
1 |
257 |
0.10290404 |
|
Friday |
28 |
65.586937 |
451 |
0 |
240 |
8.5252523E-2 |
|
Monday |
31 |
69.859261 |
493 |
2 |
400 |
9.0075761E-2 |
These are just a few examples presented as tables, I usually graph some of the results, this is easier with Reporting Services, the original reports didn’t have Reporting Services available.
Why?
Collection of data
Establish what data to collect
Server level
SQL 2000 Basic Counters available are :-
|
Auto-Param Attempts/sec AWE unmap calls/sec Backup/Restore Throughput/sec Bulk Copy Throughput/sec DBCC Logical Scan Bytes/sec Failed Auto-Params/sec Free list requests/sec FreeSpace Scans/sec Latch Waits/sec Lock Timeouts/sec Log Cache Reads/sec Logins/sec Number of Deadlocks/sec Page reads/sec Pages Allocated/sec Readahead pages/sec Shrink Data Movement Bytes/sec SQL Re-Compilations/sec Unsafe Auto-Params/sec |
AWE lookup maps/sec AWE unmap pages/sec Batch Requests/sec Cache Use Counts/sec Extent Deallocations/sec Forwarded Records/sec Free list stalls/sec Full Scans/sec Lazy writes/sec Lock Waits/sec Log Flush Waits/sec Logouts/sec Page Deallocations/sec Page Splits/sec Probe Scans/sec Safe Auto-Params/sec Skipped Ghosted Records/sec Table Lock Escalations/sec Workfiles Created/sec |
AWE stolen maps/sec AWE write maps/sec Bulk Copy Rows/sec Checkpoint pages/sec Extents Allocated/sec Free list empty/sec FreeSpace Page Fetches/sec Index Searches/sec Lock Requests/sec Log Bytes Flushed/sec Log Flushes/sec Mixed page allocations/sec Page lookups/sec Page writes/sec Range Scans/sec Scan Point Revalidations/sec SQL Compilations/sec Transactions/sec Worktables Created/sec |
Full details of sql perfmon counters can be found in the SQL Server 2000 Performance Tuning Technical Reference – Microsoft Press ISBN 0-7356-1270-6
SQL Server 2005 has introduced an extended number of performance counters query the table sys.dm_os_performance_counters or open perfmon and scroll through the SQL counters
|
-- SQL 2000 select counter_name ,cntr_value,cast((cntr_value/1024.0)/1024.0 as numeric(8,2)) as Gb from dbo.sysperfinfo where counter_name like '%server_memory%' -- SQL 2005 select counter_name ,cntr_value,cast((cntr_value/1024.0)/1024.0 as numeric(8,2)) as Gb from sys.dm_os_performance_counters where counter_name like '%server_memory%';
|
|
-- SQL 2000 declare @init numeric(20,2),@final numeric(20,2) declare @start_time datetime,@final_time datetime,@count numeric(20,2) -- select @init=cntr_value,@start_time=getdate() from dbo.sysperfinfo where cntr_type=272696320 and counter_name = 'SQL Re-Compilations/sec' waitfor delay '00:00:10' select @final=cntr_value,@final_time=getdate() from dbo.sysperfinfo where cntr_type=272696320 and counter_name = 'SQL Re-Compilations/sec' set @count=datediff(ss,@start_time,@final_time) select cast ((@final-@init)/@count as numeric(20,2)) -- -- SQL 2005 declare @init numeric(20,2),@final numeric(20,2); declare @start_time datetime,@final_time datetime,@count numeric(20,2); -- select @init=cntr_value,@start_time=getdate() from sys.dm_os_performance_counters where cntr_type=272696320 and counter_name = 'SQL Re-Compilations/sec'; waitfor delay '00:00:10'; select @final=cntr_value,@final_time=getdate() from sys.dm_os_performance_counters where cntr_type=272696320 and counter_name = 'SQL Re-Compilations/sec'; set @count=datediff(ss,@start_time,@final_time); select cast ((@final-@init)/@count as numeric(20,2));
|
Suggested counters might be:-
|
Procedure Cache Hit Ratio |
% age |
This value should always be high , >90%, and is an indication of how often cached plans satisfy calls. For 32bit SQL Server the procedure cache may not exist in extended memory so can be constrained by other memory pressures. A low cache hit ratio is an indication of a lack of server memory or of less than optimal code. A low cache hit ratio can also be produced where the volume of transactions and number of databases is such that the procedure cache cannot support them. |
|
Transactions |
Number / sec Total and for selected databases including tempdb |
Transactions/second measure work done within a database and within the server. Transactions only measure changes. This is a measure of database activity. Tempdb should be included as this database can sometimes be a bottleneck. Temporary table creation, sorts, order by, work tables etc. are created in tempdb. |
|
Compilations |
Number / sec total for the server |
All work on the SQL Server |
|
Logins |
Total |
The number of persisted connection to the server. Each connection typically takes resource of around 64k memory. Connection memory cannot come from extended memory and thus unnecessary persisted connections can decrease the available memory for other processes. |
|
Connections |
Number / sec |
This is the count of users connecting every second, connecting is resource hungry, connection pooling if applicable should minimise this figure, high values for this counter may indicate issues with connection pooling. |
|
Packets received Packets sent |
Number / sec |
Indication of server activity. Can be used to compare against network bandwidth |
|
Reads Writes |
Number / sec |
Count of page reads and writes executed by SQL Server. Indication of server activity. |
|
Packet Errors |
Count |
Should be zero |
|
Read/Write errors |
Count |
Should be zero |
|
Workfiles Worktables |
Number / sec |
Indication of work within tempdb. Workfiles and worktables are created during sorts for example. High levels for these figures may indicate less than optimal sql. |
|
Full Scans |
Number / sec |
This figure should be zero but rarely is. It indicates the number of full table or index scans performed. |
|
IndexSearches |
|
|
|
PageSplits |
Number / sec |
Too many page splits are bad, ideally this value should be very low but typically often isn’t. A page split occurs when a new record is inserted within an existing page and there is no room. Page splits are expensive in performance terms and are also likely to result in data fragmentation in the database. Certain monitoring programs can influence this counter. |
|
LockRequests LockTimeouts LockWaits |
Number / sec Number / sec /average wait time in milliseconds |
Indicators of server performance. Lock requests show activity. Lock timeouts should be minimal. Note that LockTimeouts and LockWaits do not directly relate to database locking activity |
|
LatchWaits
|
/average wait time in milliseconds |
Latches are lightweight locks – this is a useful counter. |
|
Deadlocks |
Count |
Must be zero. Deadlocks are bad news |
Database Growth
Where to put the data?
Other metrics of interest
|
% Idle Time
Disk Idle Time
Data and log drives |
% age |
The disk idle time is considered to be a more accurate measure of disk activity ( actual activity equals 100 – idle time ) This value is valid for a SAN The transaction log drives should never show sustained capacity usage , 0% idle time. |
|
Avg. Disk sec/read Avg. Disk sec/write Avg. Disk sec/transfer
Average i/o completion time
Mainly data drives, variations should never be seen for log drives. |
Average duration ms |
This is the time in milliseconds for disk operations to complete, counters are available for overall i/o, reads and writes. Typically i/o should complete as quickly as possible, the actual values may vary according to configuration, typically expect values of less than 6 ms. This counter is valid for a SAN. This counter is very useful for monitoring system performance, high values for i/o completion times will generally indicate degraded system performance. There should never be high values for the transaction log drives, any slow down in the transaction log writes will seriously degrade performance – transactions have to write successfully before control is returned to the process, in practice the configuration of the drive and controller caching should make this transparent – expect to see no values for this counter for the log drive. |
|
Disk Reads/sec Disk Writes/sec Disk Transfers/sec
Disk i/o
Data drive generally. Total, reads and writes |
number/sec |
Count of actual disk operations A pattern of i/o should emerge, any variation will be an indicator of potential problems. Any disk or array has a maximum supported i/o count, performance will start to degrade at 75% - 85% of the maximum supported i/o Typically a 10k spindle might support around 280 sequential and 105 random i/o s per second. The exact figures can be calculated from the manufacturer’s specification for the disk. |
|
Cpu |
|
There is a direct relationship between cpu, memory and disks. High cpu may be an indicator of inadequate memory or an inefficient disk subsystem. Poorly optimised sql may also increase cpu due to increased i/o. Generally faster processors are better than multiple slower processors. More processor cache improves performance significantly. Hyperthreading is generally beneficial but can sometimes cause problems. Total sustained cpu >80% may be an indicator of performance bottlenecks. |
|
Memory |
W2k can support 8gb W2003 can support 64Gb ( 32 bit )
64bit o/s have a flat memory model.
For absolute performance have physical memory equal to database size. |
Generally SQL Server works best with fixed allocated memory, in this case most memory counters do not work.
Typically adding memory where sql server can use it will reduce cpu and improve performance. Memory is the most effective hardware upgrade. For the 32bit platform configuration of extended memory mostly only benefits data caching – this would reduce physical i/o. Reading and writing to memory is far more efficient than to disk, low values for the Page life expectancy counter may be an indicator of insufficient memory.
|
|
Disk Queue Data and Log drives |
Count |
These values should typically be zero, disk queue counters usually do not give meaningful values for a SAN. |
ü Gathering data as described will allow a baseline of normal activity to be set.
NOTE:
There are a number of excellent Microsoft White Papers which cover all aspects of performance, mainly for SQL 2005 onwards.
Document tidied up prior to publication to www.grumpyolddba.co.uk
© colin leversuch-roberts 2001 - 2007