My 24 hours of wait stats

Below is the excellent feedback (and links) I received from Paul Randal of SQLskills on 24 hours of wait stats data captured from several of the SQL Server instances I help manage, labeled for this post as instance1, instance2, etc.
I will update this as I go through the suggestions.

INSTANCE1
For the LATCH_EX, you’ll need to drill in with my sys.dm_os_latch_stats code at http://www.sqlskills.com/blogs/paul/advanced-performance-troubleshooting-waits-latches-spinlocks/. If you have CXPACKET waits, you may also see ACCESS_METHODS_DATASET_XX latches, which are normal for parallel scans.
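
To drill in before running Paul's full script, this is the minimal sketch I'll start with (just the documented sys.dm_os_latch_stats columns, not his code), to see which latch classes dominate:

SELECT latch_class,
       waiting_requests_count,
       wait_time_ms,
       wait_time_ms / NULLIF (waiting_requests_count, 0) AS avg_wait_ms
FROM sys.dm_os_latch_stats
WHERE latch_class <> N'BUFFER'   -- BUFFER latches surface as PAGELATCH/PAGEIOLATCH waits instead
ORDER BY wait_time_ms DESC;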

For OLEDB, if the average wait times are very low, it’s likely to be something like a monitoring tool constantly running DMVs (which use OLE-DB under the covers), rather than linked-server queries. If they’re more than 1-2ms each, it’s likely not from perf monitoring tools running small DMV queries; it could be from linked-server queries (in which case you’ll need to debug the queries on the remote server), running DBCC CHECK* commands, SSIS using an OLEDB source, or your own DMV queries.
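
As a quick check of my own (not from Paul's feedback), sampling the requests that are sitting in an OLEDB wait right now should show whether it's a monitoring tool, a linked server, or something else, assuming the culprit runs often enough to be caught:

SELECT r.session_id,
       r.wait_time AS current_wait_ms,
       s.program_name,
       s.host_name,
       t.text AS query_text
FROM sys.dm_exec_requests AS r
JOIN sys.dm_exec_sessions AS s ON s.session_id = r.session_id
CROSS APPLY sys.dm_exec_sql_text (r.sql_handle) AS t
WHERE r.wait_type = N'OLEDB';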

For the PAGEIOLATCH_SH/EX/UP, these are pages being read from disk. You want to figure out why SQL Server needs to read pages at all, or, put another way, why the buffer pool doesn’t already contain all the data required. I’d be concerned about the 10-25ms for each read.
Check out Module 5 of the Wait Stats Pluralsight course for more details.
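
To see whether those 10-25ms reads line up with particular databases or files, this is the per-file read-latency sketch I'll use (cumulative averages since startup, so not a clean 24-hour view):

SELECT DB_NAME (vfs.database_id) AS database_name,
       mf.physical_name,
       vfs.num_of_reads,
       vfs.io_stall_read_ms / NULLIF (vfs.num_of_reads, 0) AS avg_read_latency_ms
FROM sys.dm_io_virtual_file_stats (NULL, NULL) AS vfs
JOIN sys.master_files AS mf
  ON mf.database_id = vfs.database_id
 AND mf.file_id = vfs.file_id
ORDER BY avg_read_latency_ms DESC;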

Your BACKUPBUFFER/BACKUPIO waits are the backup process waiting for a buffer to use and for reads to occur. These are for filling and writing the buffers used during a backup. Commonly you’ll see hundreds or thousands per data backup, with the cumulative wait for those occurring per backup being about the length of time the backup takes. Higher wait times per BACKUPIO would be from writing to a network file or maybe a tape file.
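
As a sanity check that the cumulative backup waits roughly match how long the backups actually take, the msdb backup history gives the durations (my own query, not part of Paul's notes):

SELECT TOP (20)
       database_name,
       backup_start_date,
       DATEDIFF (SECOND, backup_start_date, backup_finish_date) AS duration_seconds,
       backup_size / 1048576 AS backup_size_mb
FROM msdb.dbo.backupset
WHERE type = 'D'   -- full database backups only
ORDER BY backup_start_date DESC;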

The ASYNC_IO_COMPLETION waits are a bit skewed and any long duration is from data backups. There are three waits per data backup that occurs, and one of these three is for the entire duration of the data-copy portion of the backup, which is what causes the high wait times. Although these look high, they are expected. See http://www.sqlskills.com/blogs/paul/cause-high-duration-async_io_completion-waits/.

The BACKUPTHREAD waits are where you have backup parallelism happening – from multiple data files on different drive letters/mount points or multiple backup files on different drive letters/mount points.

The PREEMPTIVE_OS_FILEOPS are from various operations around creating new files on NTFS.

The MSQL_XP (and any PREEMPTIVE_OS_GETPROCADDRESS) are extended stored procedures executing code in your DLLs. To reduce MSQL_XP, you’ll need to speed up your DLL code, although these waits could be because of third-party software using XPs.

CXPACKET is pretty high, although the wait times aren’t that bad. Check out Module 5 of the Wait Stats course on Pluralsight where I explain CXPACKET and some of the causes and fixes. Definitely make sure you’re expecting parallelism for the query plans that are doing it (using my waiting tasks script at http://www.sqlskills.com/blogs/paul/advanced-performance-troubleshooting-waits-latches-spinlocks/). You do have one of the classic patterns: CXPACKET + PAGEIOLATCH_SH, which could be parallel table scans occurring instead of more desirable index seeks.
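
Paul's waiting tasks script is at the link above; the stripped-down version I'll run to see which statements are actually going parallel (and piling up CXPACKET) looks like this, my simplification rather than his code:

SELECT owt.session_id,
       owt.exec_context_id,
       owt.wait_duration_ms,
       t.text AS query_text
FROM sys.dm_os_waiting_tasks AS owt
JOIN sys.dm_exec_requests AS r ON r.session_id = owt.session_id
CROSS APPLY sys.dm_exec_sql_text (r.sql_handle) AS t
WHERE owt.wait_type = N'CXPACKET'
ORDER BY owt.wait_duration_ms DESC;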

Not sure why you’re seeing so many PREEMPTIVE_OS_CREATEFILE here. If you’re interested, take a look at the method at http://www.sqlskills.com/blogs/paul/determine-causes-particular-wait-type/, set it up for that wait type and send me some of the results.
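
My rough take on the method in that post is an Extended Events session on the wait_info event, filtered to the wait type in question. The numeric value for the filter is version-specific, so it has to be looked up first:

-- find the map_key for the wait type on this instance
SELECT map_key, map_value
FROM sys.dm_xe_map_values
WHERE name = N'wait_types'
  AND map_value = N'PREEMPTIVE_OS_CREATEFILE';

-- then create the session (406 below is a placeholder; use the map_key returned above)
CREATE EVENT SESSION [InvestigateWaits] ON SERVER
ADD EVENT sqlos.wait_info
    (ACTION (sqlserver.session_id, sqlserver.sql_text)
     WHERE wait_type = 406)
ADD TARGET package0.ring_buffer
WITH (MAX_DISPATCH_LATENCY = 5 SECONDS);
-- ALTER EVENT SESSION [InvestigateWaits] ON SERVER STATE = START;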

The WRITELOG waits look fine at only 1.2ms each. You might be able to reduce these by removing any unused or duplicate nonclustered indexes, and other things that cause extra log – if they exist. These could be just normal, as WRITELOG is usually present in the top waits.
See Module 5 in the Wait Statistics Pluralsight course for more info, plus http://www.sqlperformance.com/2012/12/io-subsystem/trimming-t-log-fat and http://www.sqlperformance.com/2013/01/io-subsystem/trimming-more-transaction-log-fat.
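
For the unused or duplicate nonclustered index angle, this is the starting point I'll use in each user database (not Paul's script; sys.dm_db_index_usage_stats resets on restart, so it only means something after a decent stretch of uptime):

SELECT OBJECT_NAME (i.object_id) AS table_name,
       i.name AS index_name,
       ISNULL (us.user_updates, 0) AS user_updates
FROM sys.indexes AS i
LEFT JOIN sys.dm_db_index_usage_stats AS us
  ON us.object_id = i.object_id
 AND us.index_id = i.index_id
 AND us.database_id = DB_ID ()
WHERE i.type_desc = N'NONCLUSTERED'
  AND OBJECTPROPERTY (i.object_id, 'IsUserTable') = 1
  AND ISNULL (us.user_seeks, 0) + ISNULL (us.user_scans, 0) + ISNULL (us.user_lookups, 0) = 0
ORDER BY ISNULL (us.user_updates, 0) DESC;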

INSTANCE2
The ASYNC_NETWORK_IO are <1ms each, so I'm not concerned. These are usually from poor application programming (doing RBAR - row by agonizing row - processing rather than caching the data client side and letting SQL Server progress) or possibly network delays. Check out Module 5 of the Wait Stats course on Pluralsight.

INSTANCE3
The PREEMPTIVE_OS_WAITFORSINGLEOBJECT goes hand in hand with the ASYNC_NETWORK_IO.
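
To tie these two waits back to a specific application or host, a quick sample of what's waiting on them right now (my own check, nothing Paul asked for):

SELECT s.host_name,
       s.program_name,
       r.session_id,
       r.wait_type,
       r.wait_time AS current_wait_ms
FROM sys.dm_exec_requests AS r
JOIN sys.dm_exec_sessions AS s ON s.session_id = r.session_id
WHERE r.wait_type IN (N'ASYNC_NETWORK_IO', N'PREEMPTIVE_OS_WAITFORSINGLEOBJECT');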

The PREEMPTIVE_OS_FILEOPS are from various operations around creating new files on NTFS.

Not sure why you’re seeing so many PREEMPTIVE_OS_REPORTEVENT here. If you’re interested, take a look at the method at http://www.sqlskills.com/blogs/paul/determine-causes-particular-wait-type/, set it up for that wait type and send me some of the results.

INSTANCE4 – SHAREPOINT1
The PREEMPTIVE_OS_AUTHENTICATIONOPS (and any similar security calls) are calls out to Windows (and maybe through to an AD domain controller) to get security info. These look fine.

The LCK_M_U are update locks being acquired to (for example) allow a multi-row update to not block concurrent reads until all rows are ready to be updated. For long waits, you could dig in with the blocked process report to figure this out – see Michael Swart’s blog at http://michaeljswart.com/2011/05/when-to-use-blocked-processes-reports/.

The LCK_M_S are likely being blocked by something that has (or is waiting for) a Sch-M lock – an index rebuild, for instance – or an X/IX lock (maybe lock escalation). It could also be an application using the serializable isolation level (or DTC, which forces serializable, or .NET TransactionScope, which defaults to serializable). You can use the blocked process report to figure this out – see Michael Swart’s blog at http://michaeljswart.com/2011/05/when-to-use-blocked-processes-reports/.
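
The blocked process report Michael Swart describes is off by default; my understanding of the standard setup is a threshold in seconds plus something to capture the blocked_process_report event (the values here are just the ones I plan to try):

EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'blocked process threshold (s)', 10;   -- report anything blocked for 10+ seconds
RECONFIGURE;
-- then capture the blocked_process_report event with a server-side trace
-- or an Extended Events session.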

INSTANCE5 – SHAREPOINT2
IO_COMPLETION/WRITE_COMPLETION waits are I/Os that are not table/index pages, or not reading them for query purposes. WRITE_COMPLETION can be from writing to a database snapshot (e.g. while running DBCC CHECK*). I’ll be doing a blog post about these in the next couple of weeks. These don’t look problematic on your server.

For the LCK_M_X, I’d want to know why these are occurring – most likely some sort of lock escalation if they’re table locks. It could also be an application using the serializable isolation level (or DTC, which forces serializable, or .NET TransactionScope, which defaults to serializable). You can use the blocked process report to figure this out – see Michael Swart’s blog at http://michaeljswart.com/2011/05/when-to-use-blocked-processes-reports/.
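
For the serializable angle (DTC or TransactionScope), sys.dm_exec_sessions shows each session's isolation level, so this quick check should confirm or rule it out (4 means serializable):

SELECT session_id,
       program_name,
       host_name,
       transaction_isolation_level   -- 1=ReadUncommitted, 2=ReadCommitted, 3=RepeatableRead, 4=Serializable, 5=Snapshot
FROM sys.dm_exec_sessions
WHERE transaction_isolation_level = 4
  AND is_user_process = 1;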

SQLTRACE_XXX can be ignored, as it is in the latest version of my code.
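
Going forward I'll filter these out of my own capture; the relevant bit is just an exclusion list along these lines (a fragment of the idea, not Paul's actual script):

SELECT wait_type, waiting_tasks_count, wait_time_ms
FROM sys.dm_os_wait_stats
WHERE wait_type NOT LIKE N'SQLTRACE%'
  AND wait_type NOT IN (N'LAZYWRITER_SLEEP', N'SLEEP_TASK', N'CHECKPOINT_QUEUE',
                        N'BROKER_TO_FLUSH', N'CLR_SEMAPHORE')   -- plus the other benign waits
ORDER BY wait_time_ms DESC;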

Nothing to add for the other servers.

Cheers


Paul S. Randal
CEO and Owner, SQLskills.com, SQL MVP, MS Regional Director

=================================================