CPU打高的常见原因
1、扫描量大的慢SQL【最常见】
2、业务高并发,QPS高【偶尔可见】
3、物理机等其他原因【极其少见】
排查思路:
1、先查看CPU打满的时间范围
2、根据CPU打满的时间查看慢日志,关注扫描量大的【一般超过10W的,几万的也要关注下】
3、如果没有扫描量大的慢请求,查看等待类型,然后根据占比高的等待类型分析瓶颈在哪【常见的主要是CXPACKET】
4、如果2和3都没有异常,看下QPS监控,与CPU监控是否一致,一致的话,考虑升级配置
慢SQL优化:
1、获取到扫描量大的慢SQL
SELECT TOP 50
[Avg. MultiCore/CPU time(sec)] = qs.total_worker_time / 1000000 / qs.execution_count,
[Total MultiCore/CPU time(sec)] = qs.total_worker_time / 1000000,
[Avg. Elapsed Time(sec)] = qs.total_elapsed_time / 1000000 / qs.execution_count,
[Total Elapsed Time(sec)] = qs.total_elapsed_time / 1000000,
qs.execution_count,
[Avg. I/O] = (total_logical_reads + total_logical_writes) / qs.execution_count,
[Total I/O] = total_logical_reads + total_logical_writes,
Query = SUBSTRING(qt.[text], (qs.statement_start_offset / 2) + 1,
(
(
CASE qs.statement_end_offset
WHEN -1 THEN DATALENGTH(qt.[text])
ELSE qs.statement_end_offset
END - qs.statement_start_offset
) / 2
) + 1
),
Batch = qt.[text],
[DB] = DB_NAME(qt.[dbid]),
qs.last_execution_time,
qp.query_plan
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.[sql_handle]) AS qt
CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) AS qp
where qs.execution_count > 5 --more than 5 occurences
ORDER BY [Total MultiCore/CPU time(sec)] DESC
慢SQL优化思路
1、先查看SQL执行计划,主要关注是否有table scan,index scan,书签查找,要尽量避免scan操作
2、查看索引信息
聚集索引很珍贵【非聚集索引会存聚集索引键,否则存RID】,不要浪费
非聚集索引:一个SQL中的一个表只能用一个索引,排序字段【order by,group by,distinct,join】最好跟where条件筛选字段一致,可以避免排序操作
索引设计:https://docs.microsoft.com/zh-cn/sql/2014-toc/sql-server-index-design-guide?view=sql-server-2014
3、查看统计信息,统计信息陈旧,需要更新
SQL Server默认更新统计信息的策略,表数据行数发生变化时更新
a,行数从0变为1
b,行数少于500行时,增加到500行+
c,行数多于500行时,数据的变化量大于500+20%*表中数据行数
统计信息问题一般会出现c的情况,数据量变化很大,但是统计信息陈旧,不适用了,需要手动更新下,或者可以用agent建job定时更新
4、索引碎片:
https://yq.aliyun.com/articles/696250?spm=a2c4e.11155435.0.0.3d413312axuFQA
等待类型优化:
WITH [Waits] AS
(SELECT [wait_type],
[wait_time_ms] / 1000.0 AS [WaitS],
([wait_time_ms] - [signal_wait_time_ms]) / 1000.0 AS [ResourceS],
[signal_wait_time_ms] / 1000.0 AS [SignalS],
[waiting_tasks_count] AS [WaitCount],
100.0 * [wait_time_ms] / SUM ([wait_time_ms]) OVER() AS [Percentage],
ROW_NUMBER() OVER(ORDER BY [wait_time_ms] DESC) AS [RowNum]
FROM sys.dm_os_wait_stats
WHERE [wait_type] NOT IN (
N'BROKER_EVENTHANDLER', N'BROKER_RECEIVE_WAITFOR',
N'BROKER_TASK_STOP', N'BROKER_TO_FLUSH',
N'BROKER_TRANSMITTER', N'CHECKPOINT_QUEUE',
N'CHKPT', N'CLR_AUTO_EVENT',
N'CLR_MANUAL_EVENT', N'CLR_SEMAPHORE',
-- Maybe uncomment these four if you have mirroring issues
N'DBMIRROR_DBM_EVENT', N'DBMIRROR_EVENTS_QUEUE',
N'DBMIRROR_WORKER_QUEUE', N'DBMIRRORING_CMD',
N'DIRTY_PAGE_POLL', N'DISPATCHER_QUEUE_SEMAPHORE',
N'EXECSYNC', N'FSAGENT',
N'FT_IFTS_SCHEDULER_IDLE_WAIT', N'FT_IFTSHC_MUTEX',
-- Maybe uncomment these six if you have AG issues
N'HADR_CLUSAPI_CALL', N'HADR_FILESTREAM_IOMGR_IOCOMPLETION',
N'HADR_LOGCAPTURE_WAIT', N'HADR_NOTIFICATION_DEQUEUE',
N'HADR_TIMER_TASK', N'HADR_WORK_QUEUE',
N'KSOURCE_WAKEUP', N'LAZYWRITER_SLEEP',
N'LOGMGR_QUEUE', N'MEMORY_ALLOCATION_EXT',
N'ONDEMAND_TASK_QUEUE',
N'PREEMPTIVE_XE_GETTARGETSTATE',
N'PWAIT_ALL_COMPONENTS_INITIALIZED',
N'PWAIT_DIRECTLOGCONSUMER_GETNEXT',
N'QDS_PERSIST_TASK_MAIN_LOOP_SLEEP', N'QDS_ASYNC_QUEUE',
N'QDS_CLEANUP_STALE_QUERIES_TASK_MAIN_LOOP_SLEEP',
N'QDS_SHUTDOWN_QUEUE', N'REDO_THREAD_PENDING_WORK',
N'REQUEST_FOR_DEADLOCK_SEARCH', N'RESOURCE_QUEUE',
N'SERVER_IDLE_CHECK', N'SLEEP_BPOOL_FLUSH',
N'SLEEP_DBSTARTUP', N'SLEEP_DCOMSTARTUP',
N'SLEEP_MASTERDBREADY', N'SLEEP_MASTERMDREADY',
N'SLEEP_MASTERUPGRADED', N'SLEEP_MSDBSTARTUP',
N'SLEEP_SYSTEMTASK', N'SLEEP_TASK',
N'SLEEP_TEMPDBSTARTUP', N'SNI_HTTP_ACCEPT',
N'SP_SERVER_DIAGNOSTICS_SLEEP', N'SQLTRACE_BUFFER_FLUSH',
N'SQLTRACE_INCREMENTAL_FLUSH_SLEEP',
N'SQLTRACE_WAIT_ENTRIES', N'WAIT_FOR_RESULTS',
N'WAITFOR', N'WAITFOR_TASKSHUTDOWN',
N'WAIT_XTP_RECOVERY',
N'WAIT_XTP_HOST_WAIT', N'WAIT_XTP_OFFLINE_CKPT_NEW_LOG',
N'WAIT_XTP_CKPT_CLOSE', N'XE_DISPATCHER_JOIN',
N'XE_DISPATCHER_WAIT', N'XE_TIMER_EVENT')
AND [waiting_tasks_count] > 0
)
SELECT MAX ([W1].[wait_type]) AS [WaitType],
CAST (MAX ([W1].[WaitS]) AS DECIMAL (16,2)) AS [Wait_S],
CAST (MAX ([W1].[ResourceS]) AS DECIMAL (16,2)) AS [Resource_S],
CAST (MAX ([W1].[SignalS]) AS DECIMAL (16,2)) AS [Signal_S],
MAX ([W1].[WaitCount]) AS [WaitCount],
CAST (MAX ([W1].[Percentage]) AS DECIMAL (5,2)) AS [Percentage],
CAST ((MAX ([W1].[WaitS]) / MAX ([W1].[WaitCount])) AS DECIMAL (16,4)) AS [AvgWait_S],
CAST ((MAX ([W1].[ResourceS]) / MAX ([W1].[WaitCount])) AS DECIMAL (16,4)) AS [AvgRes_S],
CAST ((MAX ([W1].[SignalS]) / MAX ([W1].[WaitCount])) AS DECIMAL (16,4)) AS [AvgSig_S]
FROM [Waits] AS [W1]
INNER JOIN [Waits] AS [W2]
ON [W2].[RowNum] <= [W1].[RowNum]
GROUP BY [W1].[RowNum]
HAVING SUM ([W2].[Percentage]) - MAX( [W1].[Percentage] ) < 95; -- percentage threshold
GO
CXPACKET居多,解决方法,将MAXDOP改小
其他的等待类型不常见,如果有出现,可以网上查下对应等待类型产生的原因,根据原因采取对应的解决方案