SQL Server 2008 Performance Troubleshooting (Part 2): CPU

Continued from the previous installment: SQL Server 2008 Performance Troubleshooting (Part 1): Overview.

Translator's note: CSDN's blog editor is quite unfriendly; the layout I prepared in Word fell apart when pasted here, so please focus on the content rather than the formatting. Translating this piece also helped me organize my own thinking, so even though it is partly for my own amusement, I am glad to share it.

CPU Bottlenecks:
CPU瓶頸可能因?yàn)槟硞€(gè)負(fù)載所需的硬件資源不足而引起。但是過(guò)多的CPU使用通常可以通過(guò)查詢優(yōu)化(特別是在一個(gè)服務(wù)器上沒有增加額外負(fù)載或不同查詢時(shí)CPU突然上升的情況)、尋找應(yīng)用程序設(shè)計(jì)方面的問題和優(yōu)化系統(tǒng)配置來(lái)減少。 在你急于買更快、更多的CPU之前,先要找出最耗費(fèi)CPU資源的查詢,并且嘗試去調(diào)整那些查詢或者調(diào)整設(shè)計(jì)/系統(tǒng)配置因素。
Performance Monitor is one of the easiest tools for determining whether the server is CPU bound. Check whether the Processor: % Processor Time counter is high; sustained values above 80% per CPU generally indicate a CPU bottleneck.
From within SQL Server, you can also check for CPU bottlenecks through the DMVs. Requests waiting with the SOS_SCHEDULER_YIELD wait type, or a high number of runnable tasks, indicate that runnable threads are waiting to be scheduled, which is a sign of a CPU bottleneck on the processor. If you have the data collector enabled, the SQL Server Waits chart on the Server Activity report makes it easy to see at any time whether current activity includes a CPU bottleneck. Consumed CPU and SOS_SCHEDULER_YIELD waits are both rolled up into the CPU wait category in that report. When you see high CPU utilization, you can drill through to the queries that consume the most resources.
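As a quick way to check the "high number of runnable tasks" symptom described above from inside SQL Server, you can look at the schedulers directly. This is a minimal sketch, not part of the original paper:

```sql
-- Count current and runnable tasks per scheduler. A consistently high
-- runnable_tasks_count suggests threads are queuing for CPU time.
select scheduler_id,
       current_tasks_count,
       runnable_tasks_count
from sys.dm_os_schedulers
where scheduler_id < 255;  -- visible user schedulers only; higher IDs are internal
```

Sample the query a few times; a momentary spike is normal, but sustained nonzero runnable counts across schedulers point at CPU pressure.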
下面的查詢?yōu)槟闾峁┮粋€(gè)高級(jí)視圖去查找目前緩存批處理或者存儲(chǔ)過(guò)程中使用做多CPU資源的查詢。這個(gè)查詢聚合計(jì)算所有執(zhí)行相同計(jì)劃句柄(Plan handle意味著他們來(lái)自相同批處理或者存儲(chǔ)過(guò)程)CPU消耗。如果計(jì)劃句柄超過(guò)一個(gè)語(yǔ)句,你就必須去深入分析以便找到在整個(gè)CPU中耗費(fèi)最大的那個(gè)特定查詢:
select top 50
sum(qs.total_worker_time) as total_cpu_time,
sum(qs.execution_count) as total_execution_count,
count(*) as number_of_statements,
qs.plan_handle
from
sys.dm_exec_query_stats qs
group by qs.plan_handle
order by sum(qs.total_worker_time) desc
The remainder of this section discusses common CPU-intensive operations that can occur in SQL Server, along with effective methods for detecting and resolving these problems.
Excessive Query Compilation and Optimization:
Query compilation and optimization is a CPU-intensive process. The cost of optimization increases with the complexity of the query and the underlying schema, but even a relatively simple query can take 10 to 20 milliseconds of CPU time to parse and compile.
To mitigate this cost, SQL Server caches and reuses compiled query plans. Each time a query is received from a client, SQL Server first searches the plan cache to see whether a compiled plan already exists that can be reused. If no matching plan is found, SQL Server parses and compiles the query before executing it.
For an OLTP system, the set of queries submitted is relatively small and static. The optimal execution plan usually does not depend on the exact value or values used as predicates, because the lookups are based on keys. Reusing execution plans is very important in this type of system, because the cost of compilation can approach or even exceed the cost of executing the query itself. A data-warehousing workload, by contrast, can benefit greatly from ad hoc SQL and from letting the query optimizer choose a different plan for each set of values, because these queries typically run much longer than they take to compile, and the optimal plan often changes with the query predicates. Using parameterized queries or stored procedures allows an OLTP system to make full reuse of compiled plans, reducing SQL Server's CPU consumption. You can enable parameterization at the database or query level with the PARAMETERIZATION FORCED database option or query hint, respectively. For the important limitations of this feature, such as when you rely on indexes on computed columns or indexed views, see Books Online. The best place to parameterize, however, is inside the application itself, which also reduces the opportunity for SQL injection. For more information, see the following Books Online topics:
- SQL Injection (http://msdn.microsoft.com/en-us/library/ms161953.aspx)
- Using sp_executesql (http://msdn.microsoft.com/en-us/library/ms175170.aspx)
Detection:
During compilation, SQL Server 2008 computes a "signature" of the query and exposes it as the query_hash column in the sys.dm_exec_requests and sys.dm_exec_query_stats DMVs, and as the QueryHash attribute in Showplan/Statistics XML. Entities with the same query_hash value have a high probability of referring to the same query text, had it been written in a parameterized form; queries that differ only in literal values produce the same hash. For example, the first two queries below share the same query hash, while the third has a different query hash, because it performs a different operation:
select * from sys.objects where object_id = 100
select * from sys.objects where object_id = 101
select * from sys.objects where name = 'sysobjects'
The query hash is computed from the tree structure produced during compilation. Whitespace is ignored, as are differences between listing columns explicitly and using an asterisk (*) in the SELECT list. Furthermore, it does not matter whether one query uses a fully qualified name and another uses just the table name, as long as both resolve to the same object. All of the following produce the same query_hash value:
Use AdventureWorks
Go
set showplan_xml on
go
-- Assume this is run by a user whose default schema is Sales
select * from SalesOrderHeader h
select * from Sales.SalesOrderHeader h
select SalesOrderID,
RevisionNumber,
OrderDate,
DueDate,
ShipDate,
Status,
OnlineOrderFlag,
SalesOrderNumber,
PurchaseOrderNumber,
AccountNumber,
CustomerID,
ContactID,
SalesPersonID,
TerritoryID,
BillToAddressID,
ShipToAddressID,
ShipMethodID,
CreditCardID,
CreditCardApprovalCode,
CurrencyRateID,
SubTotal,
TaxAmt,
Freight,
TotalDue,
Comment,
rowguid,
ModifiedDate
from Sales.SalesOrderHeader h
go
set showplan_xml off
go
Note that the database portion of a fully qualified name is ignored when the query_hash value is generated. This allows resource usage to be aggregated across queries in systems that run the same schema and queries against many databases on the same instance.
An easy way to detect applications that submit many ad hoc queries is to group on the sys.dm_exec_query_stats.query_hash column, as follows:
select q.query_hash,
q.number_of_entries,
t.text as sample_query,
p.query_plan as sample_plan
from (select top 20 query_hash,
count(*) as number_of_entries,
min(sql_handle) as sample_sql_handle,
min(plan_handle) as sample_plan_handle
from sys.dm_exec_query_stats
group by query_hash
having count(*) > 1
order by count(*) desc) as q
cross apply sys.dm_exec_sql_text(q.sample_sql_handle) as t
cross apply sys.dm_exec_query_plan(q.sample_plan_handle) as p
go
Queries with a number_of_entries value in the hundreds or thousands are excellent candidates for parameterization. If you take the CompileTime and CompileCPU attributes under the <QueryPlan> tag of the sample XML plan and multiply them by that query's number_of_entries value, you get an estimate of how much compile time and CPU you could eliminate by parameterizing the query (meaning the query is compiled once, then cached and reused for subsequent executions). Eliminating these unnecessary cached plans has intangible benefits as well, such as freeing cache memory for other compiled plans (further reducing compilation overhead) and leaving more memory for the buffer cache.
Resolution:
SQL Server 2008 also produces a query_plan_hash value, a "signature" of the query plan's access path (that is, the join algorithms, join order, index selection, and so on). Some applications rely on the optimizer producing different execution plans depending on the specific parameter values passed to the query; in that case you do not want to parameterize the queries. You can use the query_hash and query_plan_hash values together to determine whether a set of ad hoc queries with the same query_hash value resulted in plans with the same or different query_plan_hash values, that is, the same or different access paths. This requires only a small modification to the earlier query:
select q.query_hash,
q.number_of_entries,
q.distinct_plans,
t.text as sample_query,
p.query_plan as sample_plan
from (select top 20 query_hash,
count(*) as number_of_entries,
count(distinct query_plan_hash) as distinct_plans,
min(sql_handle) as sample_sql_handle,
min(plan_handle) as sample_plan_handle
from sys.dm_exec_query_stats
group by query_hash
having count(*) > 1
order by count(*) desc) as q
cross apply sys.dm_exec_sql_text(q.sample_sql_handle) as t
cross apply sys.dm_exec_query_plan(q.sample_plan_handle) as p
go
Note that this query returns the number of distinct query plans (query_plan_hash values) for a given query_hash value. Rows with a large number_of_entries and a distinct_plans count of 1 are good candidates for parameterization. Even when the number of distinct plans is greater than one, you can use sys.dm_exec_query_plan to retrieve the different plans and examine whether the differences actually matter for achieving optimal performance.
Once you have determined which queries should be parameterized, the best place to parameterize them is the client application. The details vary slightly with the API you use, but the one constant across all APIs is that instead of building the query string with literal predicates, you build a string with a question mark (?) as a parameter marker:
-- Submitting as ad hoc query
select * from Sales.SalesOrderHeader where SalesOrderID = 100
-- Submitting as parameterized
select * from Sales.SalesOrderHeader where SalesOrderID = ?
You should use the appropriate API (ODBC, OLE DB, or SqlClient) to bind a value to the parameter marker. The client driver or provider then submits the query in parameterized form using sp_executesql:
exec sp_executesql N'select * from Sales.SalesOrderHeader where SalesOrderID = @P1', N'@P1 int', 100
Because the query is parameterized, it matches and reuses an existing cached plan. If the entire workload of a database is suitable for parameterization and you cannot control or change the client application, you can also enable forced parameterization for the whole database. Note the caveat mentioned earlier: this can prevent the optimizer from matching indexed views and indexes on computed columns:
ALTER DATABASE AdventureWorks SET PARAMETERIZATION FORCED
If you cannot parameterize the client application or force parameterization for the entire database, you can still create a template plan guide for specific queries using the OPTION (PARAMETERIZATION FORCED) hint. For the steps involved, see Forced Parameterization (http://technet.microsoft.com/en-us/library/ms175037.aspx) in Books Online.
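As a sketch of what such a template plan guide could look like (the sample query and the guide name are hypothetical, not from the original paper): sp_get_query_template derives the parameterized form of the statement, and sp_create_plan_guide attaches the hint to it:

```sql
declare @stmt nvarchar(max), @params nvarchar(max);

-- Derive the parameterized template of a sample ad hoc query (hypothetical query).
exec sp_get_query_template
    N'select * from Sales.SalesOrderHeader where SalesOrderID = 100',
    @stmt output, @params output;

-- Force parameterization for every query that matches this template.
exec sp_create_plan_guide
    @name = N'TemplateGuide_SalesOrderHeader',  -- hypothetical name
    @stmt = @stmt,
    @type = N'TEMPLATE',
    @module_or_batch = NULL,
    @params = @params,
    @hints = N'OPTION (PARAMETERIZATION FORCED)';
```

After the guide is created, ad hoc queries that differ only in the literal value share one cached plan, just as if the application had parameterized them.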
Unnecessary Recompilation:
When a batch or remote procedure call (RPC) is submitted to SQL Server, the server checks the validity and correctness of the query plan before it begins executing. If one of these checks fails, the batch may have to be compiled again to produce a different plan. Such compilations are known as recompilations. They are generally necessary to ensure correctness, and are often performed when the server determines that a more optimal plan may exist because the underlying data has changed. Compilation is CPU-intensive by nature, so excessive recompilation can result in a CPU-bound performance problem on the system.
In SQL Server 2000, when SQL Server recompiles a stored procedure, the entire procedure is recompiled. In SQL Server 2005 and 2008, recompilation of stored procedures is at the statement level: only the statement that caused the recompilation is recompiled, not the whole procedure. This consumes less CPU bandwidth and causes less contention on lock resources such as COMPILE locks. Recompilation generally happens in response to the following conditions:
- Schema changes
- Statistics changes
- Deferred compilation
- SET option changes
- Temporary table changes
- Stored procedures created with the RECOMPILE option
Detection:
You can use Performance Monitor and SQL Server Profiler to detect excessive compilation and recompilation.
Performance Monitor:
The SQL Statistics object provides counters for monitoring compilation and the types of requests sent to the SQL Server instance. Monitor the number of batch compilations and recompilations in relation to the number of batch requests received, to see whether they account for the high CPU usage. Ideally, the ratio of SQL Recompilations/sec to Batch Requests/sec should be very low, unless users are submitting ad hoc queries.
主要計(jì)數(shù)器:
SQL Server:SQL Statistics:Batch Requests/sec
SQL Server:SQL Statistics:SQL Compilations/sec
SQL Server:SQL Statistics:SQL Recompilations/sec
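If you prefer to read these counters from within SQL Server rather than in Performance Monitor, they are also exposed through a DMV. A minimal sketch (the exact counter and object names in the filter are an assumption and can vary slightly between instances):

```sql
-- Read the compilation-related counters via DMV instead of Performance Monitor.
-- These are cumulative values since server start; sample twice and take the
-- difference to derive per-second rates.
select counter_name, cntr_value
from sys.dm_os_performance_counters
where object_name like '%SQL Statistics%'
  and counter_name in ('Batch Requests/sec',
                       'SQL Compilations/sec',
                       'SQL Re-Compilations/sec');
```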
For more information, see SQL Statistics Object in Books Online.
SQL Server Profiler Trace:
If Performance Monitor indicates that the number of recompilations is high, recompilation could be the largest contributor to SQL Server's CPU consumption. You can use Profiler to trace stored procedure recompilations; the trace also reports the reason for each recompilation:
SP:Recompile / SQL:StmtRecompile: the former fires at the stored procedure level, the latter at the statement level. When a stored procedure is compiled, one event is generated for the procedure and one for each statement in it. When a stored procedure recompiles, however, only the statement that caused the recompilation fires an event. The most important data column for the SP:Recompile event class is EventSubClass, which indicates the reason for the recompilation. SP:Recompile fires once per procedure or trigger recompilation and does not fire for ad hoc batches that are not procedures or triggers. In SQL Server 2005 and 2008, the more useful event is SQL:StmtRecompile, because it fires for every type of recompilation. The key event columns are:
- EventClass
- EventSubClass
- ObjectID (ID of the stored procedure that contains the statement)
- SPID
- StartTime
- SqlHandle
- TextData
For more information, see Books Online.
If you already have a trace file saved, you can use the following statement to see all the recompilation events:
select
spid,
StartTime,
Textdata,
EventSubclass,
ObjectID,
DatabaseID,
SQLHandle
from
fn_trace_gettable ( 'e:\recompiletrace.trc' , 1)
where
EventClass in(37,75,166)
Here, EventClass 37 = SP:Recompile, 75 = Cursor Recompile, and 166 = SQL:StmtRecompile.
You can further organize the results by grouping on the SqlHandle and ObjectID columns, or on other columns, to see whether a particular stored procedure recompiles excessively or which reasons (such as SET option changes) account for the recompilations.
Showplan XML For Query Compile:
This event fires whenever SQL Server compiles or recompiles a Transact-SQL statement, and it carries information about the compiled statement, including the query plan and, where applicable, the ObjectID. Capturing this event has significant performance overhead, because it is captured on every compilation and recompilation, so enable it only when the SQL Compilations/sec counter stays high. You can use it to see which statements are recompiled most often, and then change the parameters of those statements to reduce the number of recompilations.
DMVs:
The sys.dm_exec_query_optimizer_info DMV gives good insight into the time SQL Server spends on optimization. If you query this DMV twice, you get a sense of how much time was spent on optimization during the interval between the two snapshots:
select * from sys.dm_exec_query_optimizer_info
counter occurrence value
---------------- -------------------- ---------------------
optimizations 81 1.0
elapsed time 81 6.4547820702944486E-2
Pay particular attention to elapsed time, which, because optimization is very CPU-intensive, is usually close to the CPU time spent on optimization. From it you can estimate how much of the overall CPU consumption is caused by compilation and recompilation.
Another useful DMV is sys.dm_exec_query_stats. The columns to look at are:
- sql_handle
- total_worker_time
- plan_generation_num
- statement_start_offset
Pay particular attention to plan_generation_num, which indicates how many times the query has been recompiled. The following sample query lists the top 25 stored procedures with the most recompilations:
select top 25
sql_text.text,
sql_handle,
plan_generation_num,
execution_count,
dbid,
objectid
from
sys.dm_exec_query_stats a
cross apply sys.dm_exec_sql_text(sql_handle) as sql_text
where
plan_generation_num >1
order by plan_generation_num desc
Resolution:
If you detect excessive compilation or recompilation, consider the following steps:
- If the recompilations are caused by SET option changes, use SQL Server Profiler to determine which SET option changed, and then stop changing it at the statement level. The best practice is to set such options at connection time, so that they remain in effect for the lifetime of the connection.
- Recompilation thresholds for temporary tables are lower than for regular tables. If recompilations are caused by statistics changes on a temporary table, you can change the temporary table to a table variable; a change in the cardinality of a table variable does not cause recompilation. The drawback of this approach is that the query optimizer does not track the cardinality of a table variable, because statistics are not created or maintained on table variables. This can lead to less optimal query plans; test the alternatives and choose the one that works best.
- Another option is to use the KEEP PLAN query hint, which sets the recompilation threshold of temporary tables to be the same as that of permanent tables. The EventSubClass column shows a statistics-changed event for an operation on a temporary table.
- To avoid recompilations caused by statistics changes (for example, when a plan becomes suboptimal because the data statistics changed), specify the KEEPFIXED PLAN query hint. With this option, recompilations occur only to ensure correctness (for example, when the underlying table structure changes, or the table is marked with sp_recompile), not in response to statistics changes.
- Turning off automatic updates of statistics on a table also prevents recompilations caused by statistics changes on that object. Note, however, that turning off the auto-stats feature is usually a poor choice, because the query optimizer is no longer sensitive to data changes in the object and suboptimal plans can result. Use this method only as a last resort, after exhausting the other alternatives.
- Objects should be referenced with qualified names, such as dbo.Table1, to avoid recompilation and to avoid ambiguity between objects.
- To avoid recompilations caused by deferred compilation, do not interleave DML and DDL, and do not create DDL conditionally with constructs such as IF statements.
- Run the Database Engine Tuning Advisor (DTA) to see whether any indexing change can improve compile time and query execution time.
- Check whether stored procedures are executed WITH RECOMPILE, or were created with the WITH RECOMPILE option. In SQL Server 2005 and 2008, you can instead apply the RECOMPILE hint at the statement level where needed; using the hint at that level avoids recompiling the entire procedure.
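To illustrate the last two points, here is a sketch of a procedure that uses the statement-level RECOMPILE hint and the KEEP PLAN hint. The procedure name is hypothetical, and the tables assume the AdventureWorks sample database:

```sql
create procedure dbo.usp_OrdersForCustomer  -- hypothetical procedure name
    @CustomerID int
as
begin
    create table #work (SalesOrderID int);

    insert into #work
    select SalesOrderID
    from Sales.SalesOrderHeader
    where CustomerID = @CustomerID;

    -- KEEP PLAN raises the temp table's recompilation threshold to that of a
    -- permanent table, so row-count changes in #work do not force a recompile.
    select count(*) from #work
    option (keep plan);

    -- RECOMPILE applies to this statement only; the rest of the procedure's
    -- cached plan is unaffected.
    select SalesOrderID, OrderDate, TotalDue
    from Sales.SalesOrderHeader
    where CustomerID = @CustomerID
    option (recompile);
end
```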
Inefficient Query Plans:
When generating an execution plan for a query, the SQL Server optimizer tries to choose the plan with the fastest response time. The fastest response time does not necessarily mean minimizing I/O or minimizing CPU; it is a balance across the various resources.
Certain operations are inherently more CPU-intensive than others. Hash operations and sorts, for example, scan through their respective input data. With read-ahead during such scans, the needed pages are almost always available in the buffer cache, so waits on physical I/O are minimized or eliminated. When these operations are no longer constrained by physical I/O, they show up as high CPU consumption. By contrast, a nested-loop join with many index lookups can quickly become I/O bound if it has to range across large parts of the table to find the qualifying data.
The query optimizer evaluates candidate plans with a cost-based model whose most significant inputs are the cardinality estimates for each operation in the plan (visible as the EstimateRows and EstimateExecutions attributes in the showplan). Without accurate cardinality estimates, the costing is flawed, and the plan that is finally chosen may be far from optimal.
For a description of the statistics used by the SQL Server optimizer, see Statistics Used by the Query Optimizer in Microsoft SQL Server 2005.
Detection:
An inefficient query plan is relatively easy to detect, because it drives CPU consumption up. The following query is an effective way to identify the statements that consume the most CPU:
select
highest_cpu_queries.plan_handle,
highest_cpu_queries.total_worker_time,
q.dbid,
q.objectid,
q.number,
q.encrypted,
q.[text]
from
(select top 50
qs.plan_handle,
qs.total_worker_time
from
sys.dm_exec_query_stats qs
order by qs.total_worker_time desc) as highest_cpu_queries
cross apply sys.dm_exec_sql_text(plan_handle) as q
order by highest_cpu_queries.total_worker_time desc
Alternatively, you can query the plan cache through sys.dm_exec_cached_plans and filter the plan XML for operators that tend to be CPU-intensive, such as '%Hash Match%' or '%Sort%'.
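A sketch of that filter, using a plain text match on the showplan XML; it is crude, but often sufficient to surface candidates:

```sql
-- Find cached plans whose showplan XML contains Sort or Hash Match operators.
-- Casting the XML column to nvarchar(max) allows a simple LIKE filter.
select top 20
    cp.plan_handle,
    cp.usecounts,
    qp.query_plan
from sys.dm_exec_cached_plans cp
cross apply sys.dm_exec_query_plan(cp.plan_handle) qp
where cast(qp.query_plan as nvarchar(max)) like '%Hash Match%'
   or cast(qp.query_plan as nvarchar(max)) like '%Sort%';
```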
Resolution:
If you detect an inefficient query plan, consider the following options:
- Tune the query with the Database Engine Tuning Advisor to see whether it produces any index recommendations.
- Check for bad cardinality estimates.
- Check whether the query's WHERE clause can be written to filter the data more selectively; queries that are not sufficiently restrictive are inherently resource-intensive.
- Run UPDATE STATISTICS on the tables involved and check whether the problem persists.
- Check whether the query uses constructs for which the optimizer cannot accurately estimate cardinality, and consider whether the query can be rewritten to avoid them.
- If it is not possible to modify the schema or the query, you can apply query hints through a plan guide to constrain the execution plan. Plan guides can be created for ad hoc queries as well as for queries inside stored procedures. Hints insulate the plan from further changes by the optimizer.
- SQL Server 2008 also provides a new feature called plan freezing, which lets you freeze a plan exactly as it exists in the plan cache. This option is similar to creating a plan guide with the USE PLAN query hint, but it eliminates the need to issue a long and complex command to create the plan guide, and it minimizes the user errors that come with commands of that length. An example follows:
DECLARE @plan_handle varbinary(64);
-- Extract the query's plan_handle.
SELECT @plan_handle = plan_handle FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(sql_handle)
WHERE text LIKE N'Some query matching criteria%';
EXECUTE sp_create_plan_guide_from_handle
@name = N'Sample_PG1',
@plan_handle = @plan_handle,
@statement_start_offset = NULL;
GO
Intra-Query Parallelism:
When generating an execution plan, the SQL Server optimizer tries to choose the plan with the fastest response time. If the query's estimated cost exceeds the cost threshold for parallelism, the optimizer attempts to generate a plan that can run in parallel. A parallel plan uses multiple threads to process the query. The maximum degree of parallelism can be limited server-wide with the max degree of parallelism option, at the resource workload group level, or per query with the OPTION (MAXDOP) hint. The degree of parallelism (DOP) actually used, that is, the real number of worker threads for the parallel operation, is not decided until execution time. Before executing, SQL Server determines how many schedulers are sufficiently idle and chooses a DOP accordingly; once the DOP is chosen, the query runs with that degree of parallelism until it completes. A parallel plan consumes slightly more total CPU time, but for a shorter duration. As long as there is no other bottleneck, such as waits on physical I/O, a parallel plan can make full use of 100% of the CPUs.
One key factor is that the conditions that lead a query to choose a parallel plan can change after execution starts. For example, if a query starts when the server is idle, it may choose to run in parallel on four processors. Once those threads begin executing, existing connections may submit other queries that also need a lot of CPU. At that point, all the different threads must share short time slices of the available CPUs, which leads to longer overall query durations.
Running with a parallel plan is not a bad thing in itself, because it should deliver the fastest response time for that query. However, the response time of a given query has to be weighed against the needs of the workload as a whole. Parallel queries are best suited to batch-processing and decision-support workloads. SQL Server 2008 scales better on partitioned-table queries than earlier versions, so it may use more CPU during parallel operations than its predecessors did. If that is not what you want, you should limit or even disable parallelism.
Detection:
The impact of intra-query parallelism can be detected with the following methods:
Performance Monitor:
Look at the SQL Server: SQL Statistics - Batch Requests/sec counter (see the SQL Statistics Object topic in Books Online for details). Because a query's estimated cost must exceed the cost threshold for parallelism (which defaults to 5 and is configurable) before a parallel plan is even considered, the more batches a server processes per second, the less likely it is that many of them run in parallel. Servers that run many parallel queries usually show a relatively small Batch Requests/sec value, for example, below 100.
DMVs:
On a running server, you can use the following query to find out whether any active requests are running in parallel:
select
r.session_id,
r.request_id,
max(isnull(exec_context_id, 0)) as number_of_workers,
r.sql_handle,
r.statement_start_offset,
r.statement_end_offset,
r.plan_handle
from
sys.dm_exec_requests r
join sys.dm_os_tasks t on r.session_id = t.session_id
join sys.dm_exec_sessions s on r.session_id = s.session_id
where
s.is_user_process = 0x1
group by
r.session_id, r.request_id,
r.sql_handle, r.plan_handle,
r.statement_start_offset, r.statement_end_offset
having max(isnull(exec_context_id, 0)) > 0
From the results, you can easily retrieve the query text with sys.dm_exec_sql_text and the plan with sys.dm_exec_query_plan.
You can also search for cached plans that are able to run in parallel, by looking for plans that contain a Parallel operator attribute with a nonzero value. Such plans may not be running in parallel at the moment, but they can do so whenever the system is not too busy:
--
-- Find query plans that can run in parallel
--
select
p.*,
q.*,
cp.plan_handle
from
sys.dm_exec_cached_plans cp
cross apply sys.dm_exec_query_plan(cp.plan_handle) p
cross apply sys.dm_exec_sql_text(cp.plan_handle) as q
where
cp.cacheobjtype = 'Compiled Plan' and
p.query_plan.value('declare namespace p="http://schemas.microsoft.com/sqlserver/2004/07/showplan";
max(//p:RelOp/@Parallel)', 'float') > 0
In general, the duration of a query is longer than the CPU time it consumes, because some of the time is spent waiting on resources such as locks or physical I/O. The one scenario in which a query can consume more CPU time than its duration is when it runs with a parallel plan, so that multiple threads consume CPU concurrently. Note that not all parallel queries exhibit this behavior:
select
qs.sql_handle,
qs.statement_start_offset,
qs.statement_end_offset,
q.dbid,
q.objectid,
q.number,
q.encrypted,
q.text
from
sys.dm_exec_query_stats qs
cross apply sys.dm_exec_sql_text(qs.plan_handle) as q
where
qs.total_worker_time > qs.total_elapsed_time
SQL Trace:
In a trace, look for the following signs of parallel queries. They can be either statements or batches whose CPU time is greater than their duration:
select
EventClass,
TextData
from
::fn_trace_gettable('c:\temp\high_cpu_trace.trc', default)
where
EventClass in (10, 12) -- RPC:Completed, SQL:BatchCompleted
and CPU > Duration/1000 -- CPU is in milliseconds, Duration in microseconds
Alternatively, look for showplans (unencoded) that contain Parallelism operators:
select
EventClass,
TextData
from
::fn_trace_gettable('c:\temp\high_cpu_trace.trc', default)
where
TextData LIKE '%Parallelism%'
Resolution:
- Any query that runs with a parallel plan is one that the optimizer estimates to cost more than the cost threshold for parallelism (which defaults to 5). Any query identified through the methods above is therefore automatically a candidate for further tuning.
- Use the Database Engine Tuning Advisor to see whether any indexing change can reduce the cost of the query.
- Check for significant differences between the estimated and the actual number of rows in the execution plan, since the estimates drive the cost-based decisions. If you find large discrepancies, check the following:
o If the auto create statistics database option is disabled, make sure that there are no Columns With No Statistics warnings in the execution plan.
o If the estimates are still off, run UPDATE STATISTICS on the tables involved.
o Check whether the query uses constructs for which the optimizer cannot accurately estimate cardinality, such as multi-statement table-valued functions or CLR functions, table variables, or Transact-SQL function comparisons in predicates (comparisons against a parameter are fine).
o Evaluate whether the query could be written in a more efficient way that achieves the same result.
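If after tuning you still want to cap parallelism, you can limit it server-wide or per query. A sketch of both approaches (the server-wide setting affects every query on the instance, so treat it with care; the sample query assumes the AdventureWorks database):

```sql
-- Server-wide cap: allow at most 4 schedulers per parallel query.
exec sp_configure 'show advanced options', 1;
reconfigure;
exec sp_configure 'max degree of parallelism', 4;
reconfigure;

-- Per-query cap: force this statement to run serially.
select CustomerID, count(*) as order_count
from Sales.SalesOrderHeader
group by CustomerID
option (maxdop 1);
```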
Inefficient Cursor Usage:
Versions of SQL Server earlier than 2005 supported only a single active operation per connection. A query that was executing, or that still had results pending to be sent to the client, was considered active. In some situations, the client application might need to read through the results and, based on a row just read, submit another query to SQL Server, which a default result set does not allow while results are still pending. A common solution was to change the connection properties to use a server-side cursor.
When a server-side cursor is used, the database client software (the ODBC driver or OLE DB provider) transparently wraps the client requests in special extended stored procedures, such as sp_cursoropen and sp_cursorfetch. This is referred to as an API cursor (as opposed to a Transact-SQL cursor). When the user executes the query, its text is sent to the server through sp_cursoropen; requests to read from the result set go through sp_cursorfetch, which instructs the server to return only a certain number of rows. By controlling the number of rows fetched, the ODBC driver or OLE DB provider can cache rows on the client. This prevents the server from having to wait for the client to read all the rows it has sent, so the server is ready to accept new requests on that connection sooner.
Applications that open a cursor and then fetch one row (or a small number of rows) at a time are easily throttled by network latency, especially on a wide area network (WAN). On a fast network with many user connections, the aggregate cost of processing many cursor requests can also become significant, because of the overall volume of requests and the per-cursor overhead the server incurs to position on each result set, buffer rows, and perform similar processing.
Detection:
You can use the following tools to troubleshoot inefficient cursor usage:
Performance Monitor:
Look at the SQL Server: Cursor Manager By Type - Cursor Requests/sec counter to get a general feel for how many cursor requests the system serves. Systems that show high CPU utilization because of small fetch operations typically serve hundreds of cursor requests per second. There is no specific counter that reports the fetch buffer size used by those fetches.
DMVs:
You can use the following DMV query to determine, per connection, which API cursors use a fetch buffer of only one row:
select
cur.*
from
sys.dm_exec_connections con
cross apply sys.dm_exec_cursors(con.session_id) as cur
where
cur.fetch_buffer_size = 1
and cur.properties LIKE 'API%' -- API cursor (Transact-SQL cursors always have a fetch buffer of 1)
SQL Trace:
Use a trace that includes the RPC:Completed event class and search for sp_cursorfetch statements. The value of the fourth parameter is the number of rows returned by the fetch; the maximum number of rows requested is specified in the corresponding RPC:Starting event.
Resolution:
- Determine whether the processing can be done with a set-based operation instead of a cursor; a set-based solution is almost always more efficient.
- Consider enabling Multiple Active Result Sets (MARS) when connecting to SQL Server 2008.
- Consult the documentation of the API you use to determine how to specify a larger fetch buffer size for the cursor:
o ODBC: SQL_ATTR_ROW_ARRAY_SIZE
o OLE DB: IRowset::GetNextRows or IRowsetLocate::GetRowsAt
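As an illustration of the first point, here is a hypothetical row-by-row cursor and its set-based equivalent, assuming the AdventureWorks sample database:

```sql
-- Cursor version: fetches and updates one row at a time (inefficient).
declare @id int;
declare order_cursor cursor for
    select SalesOrderID from Sales.SalesOrderHeader where SubTotal > 1000;
open order_cursor;
fetch next from order_cursor into @id;
while @@fetch_status = 0
begin
    update Sales.SalesOrderHeader
    set Comment = 'large order'
    where SalesOrderID = @id;
    fetch next from order_cursor into @id;
end
close order_cursor;
deallocate order_cursor;

-- Set-based equivalent: one statement, one pass, far less per-row overhead.
update Sales.SalesOrderHeader
set Comment = 'large order'
where SubTotal > 1000;
```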
Next part:
I/O Bottlenecks
原文:
CPU Bottlenecks
A CPU bottleneck can be caused by hardware resources that are insufficient for the load. However, excessive CPU utilization can commonly be reduced by query tuning (especially if there was a sudden increase without additional load or different queries on the server), addressing any application design factors, and optimizing the system configuration. Before you rush out to buy faster and/or more processors, identify the largest consumers of CPU bandwidth and see whether you can tune those queries or adjust the design/configuration factors.
Performance Monitor is generally one of the easiest means to determine whether the server is CPU bound. You should look to see whether the Processor:% Processor Time counter is high; sustained values in excess of 80% of the processor time per CPU are generally deemed to be a bottleneck.
From within SQL Server, you can also check for CPU bottlenecks by checking the DMVs. Requests waiting with the SOS_SCHEDULER_YIELD wait type or a high number of runnable tasks can indicate that runnable threads are waiting to be scheduled and that there might be a CPU bottleneck on the processor. If you have enabled the data collector, the SQL Server Waits chart on the Server Activity report is a very easy way to monitor for CPU bottlenecks over time. Consumed CPU and SOS_SCHEDULER_YIELD waits are rolled up into the CPU Wait Category in this report, and if you do see high CPU utilization, you can drill through to find the queries that are consuming the most resources.
The following query gives you a high-level view of which currently cached batches or procedures are using the most CPU. The query aggregates the CPU consumed by all statements with the same plan_handle (meaning that they are part of the same batch or procedure). If a given plan_handle has more than one statement, you may have to drill in further to find the specific query that is the largest contributor to the overall CPU usage.
select top 50
sum(qs.total_worker_time) as total_cpu_time,
sum(qs.execution_count) as total_execution_count,
count(*) as number_of_statements,
qs.plan_handle
from
sys.dm_exec_query_stats qs
group by qs.plan_handle
order by sum(qs.total_worker_time) desc
The remainder of this section discusses some common CPU-intensive operations that can occur with SQL Server, as well as efficient methods for detecting and resolving these problems.
Excessive Query Compilation and Optimization
Query compilation and optimization is a CPU-intensive process. The cost of optimization increases as the complexity of the query and the underlying schema increases, but even a relatively simply query can take 10-20 milliseconds of CPU time to parse and compile.
To mitigate this cost, SQL Server caches and reuses compiled query plans. Each time a new query is received from the client, SQL Server first searches the plan cache (sometimes referred to as the procedure cache) to see whether there is already a compiled plan that can be reused. If a matching query plan cannot be found, SQL Server parses and compiles the incoming query before running it.
For an OLTP-type workload, the set of queries that are submitted is relatively small and static. Quite commonly the optimal query plan does not depend on the exact value or values used as predicates in the query because the lookups are based on keys. Reusing query plans in this type of workload is very important because the cost of compilation may be as high as or higher than the cost of executing the query itself. However, a data-warehousing workload may benefit greatly from using ad hoc SQL and letting the query optimizer search for the optimal plan for each set of values, because the run time for these queries is typically much longer than the compile time, and the optimal query plan is more likely to change depending on the predicates in the query. Using parameterized queries or stored procedures for OLTP-based applications substantially increases the chance of reusing a cached plan and can result in substantial reductions in SQL Server CPU consumption. You can enable parameterization at the database or query level by using the PARAMETERIZATION FORCED database option or query hint, respectively. For more information about important limitations, especially if you rely on indexes on computed columns or indexed views, see SQL Server 2008 Books Online.
However, the best place to parameterize queries is within the application itself (at design time), which also helps mitigate the risk of SQL injection by avoiding string concatenation using parameter values. For more information, see the following topics in SQL Server 2008 Books Online:
? SQL Injection (http://msdn.microsoft.com/en-us/library/ms161953.aspx)
? Using sp_executesql (http://msdn.microsoft.com/en-us/library/ms175170.aspx)
Detection
During compilation, SQL Server 2008 computes a “signature” of the query and exposes this as the query_hash column in sys.dm_exec_requests and sys.dm_exec_query_stats, and the QueryHash attribute in Showplan/Statistics XML. Entities with the same query_hash value have a high probability of referring to the same query text if it had been written in a query_hash parameterized form. Queries that vary only in literal values should have the same value. For example, the first two queries share the same query hash, while the third query has a different query hash, because it is performing a different operation.
select * from sys.objects where object_id = 100
select * from sys.objects where object_id = 101
select * from sys.objects where name = 'sysobjects'
The query hash is computed from the tree structure produced during compilation. Whitespace is ignored, as are differences in the use of explicit column lists compared to using an asterisk (*) in the SELECT list. Furthermore, it does not matter if one query uses fully qualified name and another uses just the table name as long as they both refer to the same object. All of the following should produce the same query_hash value.
Use AdventureWorks
Go
set showplan_xml on
go
-- Assume this is run by a user whose default schema is Sales
select * from SalesOrderHeader h
select * from Sales.SalesOrderHeader h
select SalesOrderID,
RevisionNumber,
OrderDate,
DueDate,
ShipDate,
Status,
OnlineOrderFlag,
SalesOrderNumber,
PurchaseOrderNumber,
AccountNumber,
CustomerID,
ContactID,
SalesPersonID,
TerritoryID,
BillToAddressID,
ShipToAddressID,
ShipMethodID,
CreditCardID,
CreditCardApprovalCode,
CurrencyRateID,
SubTotal,
TaxAmt,
Freight,
TotalDue,
Comment,
rowguid,
ModifiedDate
from Sales.SalesOrderHeader h
go
set showplan_xml off
go
Note that the database portion of the fully qualified name is ignored when the query_hash value is generated. This allows resource usage to be aggregated across all queries in systems that replicate the same schema and queries against many databases on the same instance.
An easy way to detect applications that submit lots of ad hoc queries is by grouping on the sys.dm_exec_query_stats.query_hash column as follows.
select q.query_hash,
q.number_of_entries,
t.text as sample_query,
p.query_plan as sample_plan
from (select top 20 query_hash,
count(*) as number_of_entries,
min(sql_handle) as sample_sql_handle,
min(plan_handle) as sample_plan_handle
from sys.dm_exec_query_stats
group by query_hash
having count(*) > 1
order by count(*) desc) as q
cross apply sys.dm_exec_sql_text(q.sample_sql_handle) as t
cross apply sys.dm_exec_query_plan(q.sample_plan_handle) as p
go
Queries that have a number_of_entries value in the hundreds or thousands are excellent candidates for parameterization. If you look at the CompileTime and CompileCPU attributes under the <QueryPlan> tag of the sample XML query plan and multiply those values times the number_of_entries value for that query, you can get an estimate of how much compile time and CPU you can eliminate by parameterizing the query (which means that the query is compiled once, and then it is cached and reused for subsequent executions). Eliminating these unnecessary cached plans has other intangible benefits as well, such as freeing memory to cache other compiled plans (thereby further reducing compilation overhead) and leaving more memory for the buffer cache.
Resolution
SQL Server 2008 also produces a query_plan_hash value that represents a “signature” of the query plan’s access path (that is, what join algorithm is used, the join order, index selection, and so forth). Some applications might rely on getting a different query plan based on the optimizer evaluating the specific parameter values passed to that execution of the query. If that is the case, you do not want to parameterize the queries.
You can use the query_hash and query_plan_hash values together to determine whether a set of ad hoc queries with the same query_hash value resulted in query plans with the same or different query_plan_hash values, or access path. This is done via a small modification to the earlier query.
select q.query_hash,
q.number_of_entries,
q.distinct_plans,
t.text as sample_query,
p.query_plan as sample_plan
from (select top 20 query_hash,
count(*) as number_of_entries,
count(distinct query_plan_hash) as distinct_plans,
min(sql_handle) as sample_sql_handle,
min(plan_handle) as sample_plan_handle
from sys.dm_exec_query_stats
group by query_hash
having count(*) > 1
order by count(*) desc) as q
cross apply sys.dm_exec_sql_text(q.sample_sql_handle) as t
cross apply sys.dm_exec_query_plan(q.sample_plan_handle) as p
go
Note that this new query returns a count of the number of distinct query plans (query_plan_hash values) for a given query_hash value. Rows that return a large number for number_of_entries and a distinct_plans count of 1 are good candidates for parameterization. Even if the number of distinct plans is more than one, you can use sys.dm_exec_query_plan to retrieve the different query plans and examine them to see whether the difference is important and necessary for achieving optimal performance.
After you determine which queries should be parameterized, the best place to parameterize them is the client application. The details of how you do this vary slightly depending on which client API you use, but the one consistent thing across all of the APIs is that instead of building the query string with literal predicates, you build a string with a question mark (?) as a parameter marker.
-- Submitting as ad hoc query
select * from Sales.SalesOrderHeader where SalesOrderID = 100
-- Submitting as parameterized
select * from Sales.SalesOrderHeader where SalesOrderID = ?
You should use the appropriate APIs for your technology (ODBC, OLE DB, or SQLClient) to bind a value to the parameter marker. The client driver or provider then submits the query in its parameterized form using sp_executesql.
exec sp_executesql N'select * from Sales.SalesOrderHeader where SalesOrderID = @P1', N'@P1 int', 100
Because the query is parameterized, it matches and reuses an existing cached plan.
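To see why this matters for the plan cache, consider a toy model of cache lookup. The sketch below is pure Python and purely illustrative (SQL Server's real cache keying is more involved than exact text matching): ad hoc texts that differ only in a literal each trigger a compilation, while the parameterized form compiles once and is reused.

```python
# Toy model of plan-cache lookup: the cache is keyed (in part) by the
# exact batch text, so texts differing only in literals get distinct
# entries, while a parameterized text is reused. Illustrative only.

plan_cache = {}
compilations = 0

def execute(query_text):
    global compilations
    if query_text not in plan_cache:
        compilations += 1                     # cache miss: parse + compile
        plan_cache[query_text] = "plan for " + query_text
    return plan_cache[query_text]             # cache hit: reuse

# Ad hoc: a new literal per call -> one compilation per distinct text.
for order_id in (100, 101, 102):
    execute(f"select * from Sales.SalesOrderHeader where SalesOrderID = {order_id}")
adhoc_compiles = compilations

# Parameterized: same text every time -> one compilation total.
compilations, plan_cache = 0, {}
for order_id in (100, 101, 102):
    execute("select * from Sales.SalesOrderHeader where SalesOrderID = @P1")
param_compiles = compilations

print(adhoc_compiles, param_compiles)  # 3 1
```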
If the entire workload for a given database is appropriate for parameterization and you do not have control over (or can’t change) the client application, you can also enable the forced parameterization option for the database. Note the caveats mentioned earlier; this can prevent the optimizer from matching indexed views and indexes on computed columns.
ALTER DATABASE AdventureWorks SET PARAMETERIZATION FORCED
If you can’t parameterize the client application or enable forced parameterization for the entire database, you can still create a template plan guide for specific queries with the OPTION (PARAMETERIZATION FORCED) hint. For more information about the steps required to do this, see Forced Parameterization (http://technet.microsoft.com/en-us/library/ms175037.aspx) in SQL Server 2008 Books Online.
Unnecessary Recompilation
When a batch or remote procedure call (RPC) is submitted to SQL Server, the server checks for the validity and correctness of the query plan before it begins executing. If one of these checks fails, the batch may have to be compiled again to produce a different query plan. Such compilations are known as recompilations. These recompilations are generally necessary to ensure correctness and are often performed when the server determines that there could be a more optimal query plan due to changes in underlying data. Compilations by nature are CPU intensive and hence excessive recompilations could result in a CPU-bound performance problem on the system.
In SQL Server 2000, when SQL Server recompiles a stored procedure, the entire stored procedure is recompiled, not just the statement that triggered the recompilation. In SQL Server 2008 and SQL Server 2005, the behavior is changed to statement-level recompilation of stored procedures. When SQL Server 2008 or SQL Server 2005 recompiles stored procedures, only the statement that caused the recompilation is compiled—not the entire procedure. This uses less CPU bandwidth and results in less contention on lock resources such as COMPILE locks. Recompilation can happen in response to various conditions, such as:
• Schema changes
• Statistics changes
• Deferred compilation
• SET option changes
• Temporary table changes
• Stored procedure creation with the RECOMPILE query hint or the OPTION (RECOMPILE) query hint
Detection
You can use Performance Monitor and SQL Server Profiler to detect excessive compilation and recompilation.
Performance Monitor
The SQL Statistics object provides counters to monitor compilation and the type of requests that are sent to an instance of SQL Server. You must monitor the number of query compilations and recompilations in conjunction with the number of batches received to find out whether the compilations are contributing to high CPU use. Ideally, the ratio of SQL Recompilations/sec to Batch Requests/sec should be very low, unless users are submitting ad hoc queries.
These are the key data counters:
• SQL Server: SQL Statistics: Batch Requests/sec
• SQL Server: SQL Statistics: SQL Compilations/sec
• SQL Server: SQL Statistics: SQL Recompilations/sec
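As a rough aid for interpreting these counters, the ratio check described above can be scripted. The 10% threshold in this sketch is an illustrative assumption, not an official guideline:

```python
# Compare SQL Recompilations/sec to Batch Requests/sec; a high ratio
# suggests recompilations may be contributing to CPU pressure.
# The default threshold below is an assumption for illustration.

def recompile_ratio(recompilations_per_sec, batch_requests_per_sec):
    if batch_requests_per_sec == 0:
        return 0.0
    return recompilations_per_sec / batch_requests_per_sec

def looks_excessive(recompilations_per_sec, batch_requests_per_sec,
                    threshold=0.10):
    return recompile_ratio(recompilations_per_sec,
                           batch_requests_per_sec) > threshold

print(recompile_ratio(5.0, 500.0))   # 0.01 -> low, likely fine
print(looks_excessive(80.0, 500.0))  # True -> investigate with Profiler
```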
For more information, see SQL Statistics Object (http://msdn.microsoft.com/en-us/library/ms190911.aspx) in SQL Server 2008 Books Online.
SQL Server Profiler Trace
If the Performance Monitor counters indicate a high number of recompilations, the recompilations could be contributing to the high CPU consumed by SQL Server. Look at the profiler trace to find the stored procedures that are being recompiled. The SQL Server Profiler trace provides that information along with the reason for the recompilation. You can use the following events to get this information.
SP:Recompile / SQL:StmtRecompile
The SP:Recompile and the SQL:StmtRecompile event classes indicate which stored procedures and statements have been recompiled. When you compile a stored procedure, one event is generated for the stored procedure and one for each statement that is compiled. However, when a stored procedure recompiles, only the statement that caused the recompilation is recompiled. Some of the more important data columns for the SP:Recompile event class are listed here. The EventSubClass data column in particular is important for determining the reason for the recompilation. SP:Recompile is triggered once for the procedure or trigger that is recompiled and is not fired for an ad hoc batch that could likely be recompiled. In SQL Server 2008 and SQL Server 2005, it is more useful to monitor SQL:StmtRecompile, because this event class is fired when any type of batch, ad hoc, stored procedure, or trigger is recompiled.
The key data columns to look at in these events are as follows.
• EventClass
• EventSubClass
• ObjectID (represents the stored procedure that contains this statement)
• SPID
• StartTime
• SqlHandle
• TextData
For more information, see SQL:StmtRecompile Event Class (http://technet.microsoft.com/en-us/library/ms179294.aspx) in SQL Server 2008 Books Online.
If you have a trace file saved, you can use the following query to see all the recompilation events that were captured in the trace.
select
spid,
StartTime,
Textdata,
EventSubclass,
ObjectID,
DatabaseID,
SQLHandle
from
fn_trace_gettable ( 'e:\recompiletrace.trc' , 1)
where
EventClass in (37, 75, 166)
-- EventClass 37 = SP:Recompile, 75 = CursorRecompile, 166 = SQL:StmtRecompile
For more information about trace events, see sp_trace_setevent (http://msdn.microsoft.com/en-us/library/ms186265.aspx) in SQL Server 2008 Books Online.
You could further group the results from this query by the SqlHandle and ObjectID columns, or by various other columns, to see whether most of the recompilations are attributed to one stored procedure or are due to some other reason (such as a SET option that has changed).
Showplan XML For Query Compile
The Showplan XML For Query Compile event class occurs when SQL Server compiles or recompiles a Transact-SQL statement. This event has information about the statement that is being compiled or recompiled. This information includes the query plan and the object ID of the procedure in question. Capturing this event has significant performance overhead, because it is captured for each compilation or recompilation. If you see a high value for the SQL Compilations/sec counter in Performance Monitor, you should monitor this event. With this information, you can see which statements are frequently recompiled. You can use this information to change the parameters of those statements. This should reduce the number of recompilations.
DMVs
When you use the sys.dm_exec_query_optimizer_info DMV, you can get a good idea of the time SQL Server spends optimizing. If you take two snapshots of this DMV, you can get a good feel for the time that is spent optimizing in the given time period.
select * from sys.dm_exec_query_optimizer_info
counter occurrence value
---------------- -------------------- ---------------------
optimizations 81 1.0
elapsed time 81 6.4547820702944486E-2
In particular, look at the elapsed time, which is the time elapsed due to optimizations. Because the elapsed time during optimization is generally close to the CPU time that is used for the optimization (because the optimization process is very CPU bound), you can get a good measure of the extent to which the compilation and recompilation time is contributing to the high CPU use.
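The two-snapshot technique can be sketched as follows. This assumes that the value column for the elapsed time counter is the average seconds per occurrence, so the total is occurrence multiplied by value; the dictionary shape below is a simplified stand-in for the DMV rows:

```python
# Diff two snapshots of sys.dm_exec_query_optimizer_info to get the
# optimization work done in an interval. Snapshots are modeled as
# {counter: (occurrence, value)} dicts; for "elapsed time", value is
# assumed to be average seconds per optimization, so total time is
# occurrence * value.

def optimizations_in_interval(before, after):
    return after["optimizations"][0] - before["optimizations"][0]

def elapsed_in_interval(before, after):
    occ_b, avg_b = before["elapsed time"]
    occ_a, avg_a = after["elapsed time"]
    return occ_a * avg_a - occ_b * avg_b

snap1 = {"optimizations": (100, 1.0), "elapsed time": (100, 0.05)}
snap2 = {"optimizations": (200, 1.0), "elapsed time": (200, 0.06)}

print(optimizations_in_interval(snap1, snap2))  # 100 optimizations
print(elapsed_in_interval(snap1, snap2))        # 7.0 seconds optimizing
```

If the interval's wall-clock length is known, dividing the elapsed optimization time by it gives a rough fraction of server time spent compiling.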
Another DMV that is useful for capturing this information is sys.dm_exec_query_stats.
The data columns to look at are as follows:
• sql_handle
• total_worker_time
• plan_generation_num
• statement_start_offset
For more information, see sys.dm_exec_query_stats (http://msdn.microsoft.com/en-us/library/ms189741.aspx) in SQL Server 2008 Books Online.
In particular, plan_generation_num indicates the number of times the query has recompiled. The following sample query gives you the top 25 stored procedures that have been recompiled.
select top 25
sql_text.text,
sql_handle,
plan_generation_num,
execution_count,
dbid,
objectid
from
sys.dm_exec_query_stats a
cross apply sys.dm_exec_sql_text(sql_handle) as sql_text
where
plan_generation_num >1
order by plan_generation_num desc
For more information, see Batch Compilation, Recompilation, and Plan Caching Issues in SQL Server 2005 (http://www.microsoft.com/technet/prodtechnol/sql/2005/recomp.mspx) on Microsoft TechNet.
Resolution
If you detect excessive compilation and recompilation, consider the following options:
• If the recompilation occurred because a SET option changed, use SQL Server Profiler to determine which SET option changed. Avoid changing SET options within stored procedures. It is better to set them at the connection level. Ensure that SET options are not changed during the lifetime of the connection.
• Recompilation thresholds for temporary tables are lower than for normal tables. If the recompilations on a temporary table are due to statistics changes, you can change the temporary tables to table variables. A change in the cardinality of a table variable does not cause a recompilation. The drawback of this approach is that the query optimizer does not keep track of a table variable’s cardinality because statistics are not created or maintained on table variables. This can result in less optimal query plans. You can test the different options and choose the best one.
• Another option is to use the KEEP PLAN query hint. This sets the threshold of temporary tables to be the same as that of permanent tables. The EventSubclass column displays “Statistics Changed” for an operation on a temporary table.
• To avoid recompilations that are due to changes in statistics (for example, if the plan becomes suboptimal due to change in the data statistics), specify the KEEPFIXED PLAN query hint. With this option in effect, recompilations can only happen to ensure correctness (for example, when the underlying table structure has changed and the plan no longer applies) and not to respond to changes in statistics. For example, a recompilation can occur if the schema of a table that is referenced by a statement changes, or if a table is marked with the sp_recompile stored procedure.
• Turning off the automatic updates of statistics for indexes and statistics that are defined on a table or indexed view prevents recompilations that are due to statistics changes on that object. Note, however, that turning off the auto-stats feature by using this method is not usually a good idea. This is because the query optimizer is no longer sensitive to data changes in those objects and suboptimal query plans might result. Use this method only as a last resort after exhausting all other alternatives.
• Batches should have qualified object names (for example, dbo.Table1) to avoid recompilation and to avoid ambiguity between objects.
• To avoid recompilations that are due to deferred compilations, do not interleave DML and DDL or create the DDL from conditional constructs such as IF statements.
• Run Database Engine Tuning Advisor (DTA) to see whether any indexing changes improve the compile time and the execution time of the query.
• Check to see whether the stored procedure was created with the WITH RECOMPILE option or whether the RECOMPILE query hint was used. If a procedure was created with the WITH RECOMPILE option, in SQL Server 2008 or SQL Server 2005, you may be able to take advantage of a statement-level RECOMPILE hint if a particular statement within that procedure needs to be recompiled. Using this hint at the statement level avoids the necessity of recompiling the whole procedure each time it executes, while at the same time allowing the individual statement to be compiled. For more information about the RECOMPILE hint, see Query Hints (Transact-SQL) (http://msdn.microsoft.com/en-us/library/ms181714.aspx) in SQL Server 2008 Books Online.
Inefficient Query Plan
When generating an execution plan for a query, the SQL Server optimizer attempts to choose a plan that provides the fastest response time for that query. Note that the fastest response time doesn’t necessarily mean minimizing the amount of I/O that is used, nor does it necessarily mean using the least amount of CPU—it is a balance of the various resources.
Certain types of operators are more CPU-intensive than others. By their nature, the Hash operator and Sort operator scan through their respective input data. If read-ahead (prefetch) is used during such a scan, the pages are almost always available in the buffer cache before the page is needed by the operator. Thus, waits for physical I/O are minimized or eliminated. If these types of operations are no longer constrained by physical I/O, they tend to manifest themselves in high CPU consumption. By contrast, nested loop joins have many index lookups and can quickly become I/O bound if the index lookups are traversing to many different parts of the table so that the pages can’t fit into the buffer cache.
The most significant input the optimizer uses in evaluating the cost of various alternative query plans is the cardinality estimates for each operator, which you can see in the Showplan (EstimateRows and EstimateExecutions attributes). Without accurate cardinality estimates, the primary input used in optimization is flawed, and many times so is the final plan.
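A toy calculation shows how a selectivity misestimate scales the whole cost comparison between alternative plans. The independence-style formula below is a standard textbook approximation, not SQL Server's actual costing model:

```python
# Textbook approximation of join output cardinality: |A| * |B| * selectivity.
# An error in the selectivity estimate scales the estimated row count
# (and therefore the cost comparison between, say, a nested loop join
# and a hash join) by the same factor. Numbers are illustrative.

def estimated_join_rows(rows_a, rows_b, selectivity):
    return rows_a * rows_b * selectivity

est = estimated_join_rows(10_000, 1_000, 1e-6)  # optimizer's guess: ~10 rows
act = estimated_join_rows(10_000, 1_000, 1e-3)  # reality: ~10,000 rows

# A plan chosen for 10 rows (e.g., a nested loop join) can be very
# expensive when 10,000 rows actually flow through it.
print(est, act)
```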
For an excellent white paper that describes in detail how the SQL Server optimizer uses statistics, see Statistics Used by the Query Optimizer in Microsoft SQL Server 2005 (http://www.microsoft.com/technet/prodtechnol/sql/2005/qrystats.mspx). The white paper discusses how the optimizer uses statistics, best practices for maintaining up-to-date statistics, and some common query design issues that can prevent accurate estimate cardinality and thus cause inefficient query plans.
Detection
Inefficient query plans are usually detected comparatively. An inefficient query plan can cause increased CPU consumption.
The following query against sys.dm_exec_query_stats is an efficient way to determine which query is using the most cumulative CPU.
select
highest_cpu_queries.plan_handle,
highest_cpu_queries.total_worker_time,
q.dbid,
q.objectid,
q.number,
q.encrypted,
q.[text]
from
(select top 50
qs.plan_handle,
qs.total_worker_time
from
sys.dm_exec_query_stats qs
order by qs.total_worker_time desc) as highest_cpu_queries
cross apply sys.dm_exec_sql_text(plan_handle) as q
order by highest_cpu_queries.total_worker_time desc
Alternatively, you can query against sys.dm_exec_cached_plans by using filters for various operators that may be CPU intensive, such as ‘%Hash Match%’, ‘%Sort%’ to look for suspects.
Resolution
Consider the following options if you detect inefficient query plans:
• Tune the query with the Database Engine Tuning Advisor to see whether it produces any index recommendations.
• Check for issues with bad cardinality estimates.
• Are the queries written so that they use the most restrictive WHERE clause that is applicable? Unrestricted queries are resource intensive by their very nature.
• Run UPDATE STATISTICS on the tables involved in the query and check to see whether the problem persists.
• Does the query use constructs for which the optimizer is unable to accurately estimate cardinality? Consider whether the query can be modified in a way so that the issue can be avoided.
• If it is not possible to modify the schema or the query, you can use the plan guide feature to specify query hints for queries that match certain text criteria. Plan guides can be created for ad hoc queries as well as queries inside a stored procedure. Hints such as OPTION (OPTIMIZE FOR) enable you to impact the cardinality estimates while leaving the optimizer its full array of potential plans. Other hints such as OPTION (FORCE ORDER) or OPTION (USE PLAN) provide you with varying degrees of control over the query plan. SQL Server 2008 offers full DML support for plan guides, which means that they can be created for SELECT, INSERT, UPDATE, DELETE, or MERGE statements.
• SQL Server 2008 also offers a new feature called plan freezing that allows you to freeze a plan exactly as it exists in the plan cache. This option is similar to creating a plan guide with the USE PLAN query hint specified. However, it eliminates the need to execute lengthy commands as required when creating a plan guide. It also minimizes the user errors that go along with those lengthy commands. For example, the simple two-statement batch presented below is all that’s needed to freeze a plan for a query that matches the specified text criteria.
DECLARE @plan_handle varbinary(64);
-- Extract the query's plan_handle.
SELECT @plan_handle = plan_handle FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(sql_handle)
WHERE text LIKE N'Some query matching criteria%';
EXECUTE sp_create_plan_guide_from_handle
@name = N'Sample_PG1',
@plan_handle = @plan_handle,
@statement_start_offset = NULL;
GO
This statement creates a plan guide (Sample_PG1) in the sys.plan_guides table.
Intraquery Parallelism
When generating an execution plan for a query, the SQL Server optimizer attempts to choose the plan that provides the fastest response time for that query. If the query’s cost exceeds the value specified in the cost threshold for parallelism option and parallelism has not been disabled, the optimizer attempts to generate a plan that can be run in parallel. A parallel query plan uses multiple threads to process the query, with each thread distributed across the available CPUs and concurrently utilizing CPU time from each processor. The maximum degree of parallelism can be limited server-wide by using the max degree of parallelism option, on a resource workload group level, or on a per-query level by using the OPTION (MAXDOP) hint.
The decision on the actual degree of parallelism (DOP) used for execution—a measure of how many threads will do a given operation in parallel—is deferred until execution time. Before executing the query, SQL Server determines how many schedulers are underutilized and chooses a DOP for the query that fully utilizes the remaining schedulers. After a DOP is chosen, the query runs with the chosen degree of parallelism until completion. A parallel query typically uses a similar but slightly higher amount of CPU time as compared to the corresponding serial execution plan, but it does so in a shorter amount of time. As long as there are no other bottlenecks, such as waits for physical I/O, parallel plans generally should use 100% of the CPU across all of the processors.
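The runtime DOP decision described above can be modeled roughly as follows. This is a simplified sketch, not SQL Server's actual algorithm (for instance, the special max degree of parallelism value 0, meaning unlimited, is not modeled):

```python
# Simplified model of the runtime DOP choice: cap by the server-wide
# max degree of parallelism, then by any per-query MAXDOP hint, then
# by how many schedulers are currently underutilized. Illustrative only.

def choose_dop(idle_schedulers, max_degree_of_parallelism,
               query_maxdop_hint=None):
    cap = max_degree_of_parallelism
    if query_maxdop_hint is not None:
        cap = min(cap, query_maxdop_hint)
    # At least 1 (a serial plan) even when no scheduler is idle.
    return max(1, min(idle_schedulers, cap))

print(choose_dop(idle_schedulers=6, max_degree_of_parallelism=8))      # 6
print(choose_dop(idle_schedulers=6, max_degree_of_parallelism=8,
                 query_maxdop_hint=4))                                 # 4
print(choose_dop(idle_schedulers=0, max_degree_of_parallelism=8))      # 1
```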
One key factor that led to running a parallel plan (how idle the system is) can change after the query starts executing. For example, if a query comes in during an idle time, the server might choose to run with a parallel plan, use a DOP of four, and spawn threads on four different processors. After those threads start executing, existing connections can submit other queries that also require a lot of CPU. At that point, all the different threads will share short time slices of the available CPU, resulting in higher query duration.
Running with a parallel plan is not inherently bad and should provide the fastest response time for that query. However, the response time for a given query must be weighed against the overall throughput and responsiveness of the rest of the queries on the system. Parallel queries are generally best suited to batch processing and decision support workloads and might not be useful in a transaction processing environment.
SQL Server 2008 implemented significant scalability improvements to fully utilize available hardware with partitioned table queries. Consequently, SQL Server 2008 might use higher amounts of CPU during parallel query execution than older versions. If this is not desired, you should limit or disable parallelism.
Detection
Intraquery parallelism problems can be detected by using the following methods.
Performance Monitor
For more information, see the SQL Server:SQL Statistics – Batch Requests/sec counter and SQL Statistics Object (http://msdn.microsoft.com/en-us/library/ms190911.aspx) in SQL Server 2008 Books Online.
Because a query must have an estimated cost that exceeds the cost threshold for the parallelism configuration setting (which defaults to 5) before it is considered for a parallel plan, the more batches a server is processing per second, the less likely it is that the batches are running with parallel plans. Servers that are running many parallel queries normally have small batch requests per second (for example, values less than 100).
DMVs
From a running server, you can determine whether any active requests are running in parallel for a given session by using the following query.
select
r.session_id,
r.request_id,
max(isnull(exec_context_id, 0)) as number_of_workers,
r.sql_handle,
r.statement_start_offset,
r.statement_end_offset,
r.plan_handle
from
sys.dm_exec_requests r
join sys.dm_os_tasks t on r.session_id = t.session_id
join sys.dm_exec_sessions s on r.session_id = s.session_id
where
s.is_user_process = 0x1
group by
r.session_id, r.request_id,
r.sql_handle, r.plan_handle,
r.statement_start_offset, r.statement_end_offset
having max(isnull(exec_context_id, 0)) > 0
With this information, you can easily retrieve the text of the query by using sys.dm_exec_sql_text, and you can retrieve the plan by using sys.dm_exec_query_plan.
You can also search for plans that are eligible to run in parallel. To do this, search the cached plans to see whether a relational operator has its Parallel attribute as a nonzero value. These plans might not run in parallel, but they can do so if the system is not too busy.
--
-- Find query plans that can run in parallel
--
select
p.*,
q.*,
cp.plan_handle
from
sys.dm_exec_cached_plans cp
cross apply sys.dm_exec_query_plan(cp.plan_handle) p
cross apply sys.dm_exec_sql_text(cp.plan_handle) as q
where
cp.cacheobjtype = 'Compiled Plan' and
p.query_plan.value('declare namespace p="http://schemas.microsoft.com/sqlserver/2004/07/showplan";
max(//p:RelOp/@Parallel)', 'float') > 0
In general, the duration of a query is longer than the amount of CPU time, because some of the time was spent waiting on resources such as a lock or physical I/O. The only scenario where a query can use more CPU time than the elapsed duration is when the query runs with a parallel plan such that multiple threads concurrently use CPU. Note that not all parallel queries demonstrate this behavior (where the CPU time is greater than the duration).
select
qs.sql_handle,
qs.statement_start_offset,
qs.statement_end_offset,
q.dbid,
q.objectid,
q.number,
q.encrypted,
q.text
from
sys.dm_exec_query_stats qs
cross apply sys.dm_exec_sql_text(qs.plan_handle) as q
where
qs.total_worker_time > qs.total_elapsed_time
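The filter in the query above relies on a simple invariant: a serial plan cannot consume more CPU time than wall-clock time, so any statement whose worker time exceeds its elapsed time must have run in parallel. A small helper expressing that rule (the row shape and sample numbers are illustrative):

```python
# Keep rows where total CPU time exceeds total elapsed time; only a
# parallel plan, with several threads consuming CPU concurrently, can
# produce this. Both columns are in microseconds in
# sys.dm_exec_query_stats; sample values below are made up.

def likely_parallel(rows):
    return [r for r in rows
            if r["total_worker_time"] > r["total_elapsed_time"]]

stats = [
    {"query": "big scan + hash join", "total_worker_time": 8_000_000,
     "total_elapsed_time": 2_500_000},   # 8s of CPU in 2.5s -> parallel
    {"query": "keyed lookup", "total_worker_time": 900,
     "total_elapsed_time": 15_000},      # mostly waiting -> serial
]
print([r["query"] for r in likely_parallel(stats)])
```

Note the converse does not hold: as the text says, not all parallel queries show CPU time greater than duration, so this filter finds some, not all, parallel executions.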
SQL Trace
Look for the following signs of parallel queries, which could be either
statements or batches that have CPU time greater than the duration.
select
EventClass,
TextData
from
::fn_trace_gettable('c:\temp\high_cpu_trace.trc', default)
where
EventClass in (10, 12) -- RPC:Completed, SQL:BatchCompleted
and CPU > Duration/1000 -- CPU is in milliseconds, Duration in microseconds
Or they can be Showplans (unencoded) that have Parallelism operators in them:
select
EventClass,
TextData
from
::fn_trace_gettable('c:\temp\high_cpu_trace.trc', default)
where
TextData LIKE '%Parallelism%'
Resolution
• Any query that runs with a parallel plan is one that the optimizer identifies as expensive enough to exceed the cost threshold for parallelism, which defaults to 5 (roughly a 5-second execution time on a reference computer). Any queries identified through the previous methods are candidates for further tuning.
• Use the Database Engine Tuning Advisor to see whether any indexing changes, changes to indexed views, or partitioning changes could reduce the cost of the query.
• Check for significant differences in the actual versus the estimated cardinality, because the cardinality estimates are the primary factor in estimating the cost of the query. If any significant differences are found:
o If the auto create statistics database option is disabled, make sure that there are no MISSING STATS entries in the Warnings column of the Showplan output.
o Try running UPDATE STATISTICS on the tables where the cardinality estimates are off.
o Verify that the query doesn’t use a query construct that the optimizer can’t accurately estimate, such as multistatement table-valued functions or CLR functions, table variables, or comparisons with a Transact-SQL variable (comparisons with a parameter are okay).
o Evaluate whether the query could be written in a more efficient fashion using different Transact-SQL statements or expressions.
Poor Cursor Usage
Versions of SQL Server prior to SQL Server 2005 supported only a single active command per connection. A query that was executing or had results pending to send to the client was considered active. In some situations, the client application might need to read through the results and submit other queries to SQL Server based on the row just read from the result set. This could not be done with a default result set, because it could have other pending results. A common solution was to change the connection properties to use a server-side cursor.
When a server-side cursor is used, the database client software (the OLE DB provider or ODBC driver) transparently encapsulates client requests inside special extended stored procedures, such as sp_cursoropen or sp_cursorfetch. This is referred to as an API cursor (as opposed to a Transact-SQL cursor). When the user executes the query, the query text is sent to the server via sp_cursoropen; requests to read from the result set result in an sp_cursorfetch call instructing the server to send back only a certain number of rows. By controlling the number of rows that are fetched, the ODBC driver or OLE DB provider can cache the row or rows. This prevents a situation where the server is waiting for the client to read all the rows it has sent. Thus, the server is ready to accept a new request on that connection.
Applications that open cursors and fetch one row (or a small number of rows) at a time can easily become bottlenecked by the network latency, especially on a wide area network (WAN). On a fast network with many different user connections, the overhead required to process many cursor requests can become significant. Because of the overhead associated with repositioning the cursor to the appropriate location in the result set, per-request processing overhead, and similar processing, it is more efficient for the server to process a single request that returns 100 rows than to process 100 separate requests that return the same 100 rows one row at a time.
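A back-of-envelope model makes the round-trip arithmetic concrete. The per-request and per-row costs below are invented for illustration; only the shape of the result matters:

```python
# Cost model for fetching N rows through a cursor: each fetch request
# pays network latency plus per-request processing once, regardless of
# how many rows it returns. All costs are made-up values in milliseconds.

def total_fetch_time(total_rows, fetch_buffer_size,
                     round_trip_ms=1.0, per_request_overhead_ms=0.2,
                     per_row_ms=0.01):
    requests = -(-total_rows // fetch_buffer_size)  # ceiling division
    return (requests * (round_trip_ms + per_request_overhead_ms)
            + total_rows * per_row_ms)

one_at_a_time = total_fetch_time(100, fetch_buffer_size=1)    # 100 round trips
batched = total_fetch_time(100, fetch_buffer_size=100)        # 1 round trip
print(round(one_at_a_time, 1))  # 121.0
print(round(batched, 1))        # 2.2
```

With these assumed costs, the single 100-row fetch is more than fifty times cheaper than 100 single-row fetches, and the gap widens with WAN-scale latencies.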
Detection
You can use the following tools to troubleshoot poor cursor usage.
Performance Monitor
By looking at the SQL Server:Cursor Manager By Type – Cursor Requests/Sec counter, you can get a general feel for how many cursors are being used on the system. Systems that have high CPU utilization because of small fetch sizes typically have hundreds of cursor requests per second. There are no specific counters that list the fetch buffer size.
DMVs
You can use following query to determine the connections with API cursors (as opposed to Transact-SQL cursors) that are using a fetch buffer size of one row. It is much more efficient to use a larger fetch buffer, such as 100 rows.
select
cur.*
from
sys.dm_exec_connections con
cross apply sys.dm_exec_cursors(con.session_id) as cur
where
cur.fetch_buffer_size = 1
and cur.properties LIKE 'API%' -- API cursor (Transact-SQL cursors always have a fetch buffer of 1)
SQL Trace
Use a trace that includes the RPC:Completed event class; search for sp_cursorfetch statements. The value of the fourth parameter is the number of rows returned by the fetch. The maximum number of rows that are requested to be returned is specified as an input parameter in the corresponding RPC:Starting event class.
Resolution
• Determine whether cursors are the most appropriate means to accomplish the processing or whether a set-based operation, which is generally more efficient, is possible.
• Consider enabling multiple active result sets (MARS) when connecting to SQL Server 2008.
• Consult the appropriate documentation for your specific API to determine how to specify a larger fetch buffer size for the cursor:
o ODBC - SQL_ATTR_ROW_ARRAY_SIZE
o OLE DB – IRowset::GetNextRows or IRowsetLocate::GetRowsAt
A CPU bottleneck can be caused by hardware resources that are insufficient for the load. However, excessive CPU utilization can commonly be reduced by query tuning (especially if there was a sudden increase without additional load or different queries on the server), addressing any application design factors, and optimizing the system configuration. Before you rush out to buy faster and/or more processors, identify the largest consumers of CPU bandwidth and see whether you can tune those queries or adjust the design/configuration factors.
Performance Monitor is generally one of the easiest means to determine whether the server is CPU bound. You should look to see whether the Processor:% Processor Time counter is high; sustained values in excess of 80% of the processor time per CPU are generally deemed to be a bottleneck.
From within SQL Server, you can also check for CPU bottlenecks by checking the DMVs. Requests waiting with the SOS_SCHEDULER_YIELD wait type or a high number of runnable tasks can indicate that runnable threads are waiting to be scheduled and that there might be a CPU bottleneck on the processor. If you have enabled the data collector, the SQL Server Waits chart on the Server Activity report is a very easy way to monitor for CPU bottlenecks over time. Consumed CPU and SOS_SCHEDULER_YIELD waits are rolled up into the CPU Wait Category in this report, and if you do see high CPU utilization, you can drill through to find the queries that are consuming the most resources.
The following query gives you a high-level view of which currently cached batches or procedures are using the most CPU. The query aggregates the CPU consumed by all statements with the same plan_handle (meaning that they are part of the same batch or procedure). If a given plan_handle has more than one statement, you may have to drill in further to find the specific query that is the largest contributor to the overall CPU usage.
select top 50
sum(qs.total_worker_time) as total_cpu_time,
sum(qs.execution_count) as total_execution_count,
count(*) as number_of_statements,
qs.plan_handle
from
sys.dm_exec_query_stats qs
group by qs.plan_handle
order by sum(qs.total_worker_time) desc
The remainder of this section discusses some common CPU-intensive operations that can occur with SQL Server, as well as efficient methods for detecting and resolving these problems.
Excessive Query Compilation and Optimization
Query compilation and optimization is a CPU-intensive process. The cost of optimization increases as the complexity of the query and the underlying schema increases, but even a relatively simple query can take 10-20 milliseconds of CPU time to parse and compile.
To mitigate this cost, SQL Server caches and reuses compiled query plans. Each time a new query is received from the client, SQL Server first searches the plan cache (sometimes referred to as the procedure cache) to see whether there is already a compiled plan that can be reused. If a matching query plan cannot be found, SQL Server parses and compiles the incoming query before running it.
For an OLTP-type workload, the set of queries that are submitted is relatively small and static. Quite commonly the optimal query plan does not depend on the exact value or values used as predicates in the query because the lookups are based on keys. Reusing query plans in this type of workload is very important because the cost of compilation may be as high as or higher than the cost of executing the query itself. However, a data-warehousing workload may benefit greatly from using ad hoc SQL and letting the query optimizer search for the optimal plan for each set of values, because the run time for these queries is typically much longer than the compile time, and the optimal query plan is more likely to change depending on the predicates in the query. Using parameterized queries or stored procedures for OLTP-based applications substantially increases the chance of reusing a cached plan and can result in substantial reductions in SQL Server CPU consumption. You can enable parameterization at the database or query level by using the PARAMETERIZATION FORCED database option or query hint, respectively. For more information about important limitations, especially if you rely on indexes on computed columns or indexed views, see SQL Server 2008 Books Online.
However, the best place to parameterize queries is within the application itself (at design time), which also helps mitigate the risk of SQL injection by avoiding string concatenation using parameter values. For more information, see the following topics in SQL Server 2008 Books Online:
• SQL Injection (http://msdn.microsoft.com/en-us/library/ms161953.aspx)
• Using sp_executesql (http://msdn.microsoft.com/en-us/library/ms175170.aspx)
Detection
During compilation, SQL Server 2008 computes a “signature” of the query and exposes it as the query_hash column in sys.dm_exec_requests and sys.dm_exec_query_stats, and as the QueryHash attribute in Showplan/Statistics XML. Entities with the same query_hash value have a high probability of referring to the same query text had it been written in a parameterized form. Queries that vary only in their literal values should have the same query_hash value. For example, the first two queries below share the same query hash, while the third has a different query hash because it performs a different operation.
select * from sys.objects where object_id = 100
select * from sys.objects where object_id = 101
select * from sys.objects where name = 'sysobjects'
The query hash is computed from the tree structure produced during compilation. Whitespace is ignored, as are differences in the use of explicit column lists compared to using an asterisk (*) in the SELECT list. Furthermore, it does not matter whether one query uses a fully qualified name and another uses just the table name, as long as they both refer to the same object. All of the following should produce the same query_hash value.
Use AdventureWorks
Go
set showplan_xml on
go
-- Assume this is run by a user whose default schema is Sales
select * from SalesOrderHeader h
select * from Sales.SalesOrderHeader h
select SalesOrderID,
RevisionNumber,
OrderDate,
DueDate,
ShipDate,
Status,
OnlineOrderFlag,
SalesOrderNumber,
PurchaseOrderNumber,
AccountNumber,
CustomerID,
ContactID,
SalesPersonID,
TerritoryID,
BillToAddressID,
ShipToAddressID,
ShipMethodID,
CreditCardID,
CreditCardApprovalCode,
CurrencyRateID,
SubTotal,
TaxAmt,
Freight,
TotalDue,
Comment,
rowguid,
ModifiedDate
from Sales.SalesOrderHeader h
go
set showplan_xml off
go
Note that the database portion of the fully qualified name is ignored when the query_hash value is generated. This allows resource usage to be aggregated across all queries in systems that replicate the same schema and queries against many databases on the same instance.
An easy way to detect applications that submit lots of ad hoc queries is by grouping on the sys.dm_exec_query_stats.query_hash column as follows.
select q.query_hash,
q.number_of_entries,
t.text as sample_query,
p.query_plan as sample_plan
from (select top 20 query_hash,
count(*) as number_of_entries,
min(sql_handle) as sample_sql_handle,
min(plan_handle) as sample_plan_handle
from sys.dm_exec_query_stats
group by query_hash
having count(*) > 1
order by count(*) desc) as q
cross apply sys.dm_exec_sql_text(q.sample_sql_handle) as t
cross apply sys.dm_exec_query_plan(q.sample_plan_handle) as p
go
Queries that have a number_of_entries value in the hundreds or thousands are excellent candidates for parameterization. If you look at the CompileTime and CompileCPU attributes under the <QueryPlan> tag of the sample XML query plan and multiply those values by the number_of_entries value for that query, you can get an estimate of how much compile time and CPU you can eliminate by parameterizing the query (which means that the query is compiled once, and then it is cached and reused for subsequent executions). Eliminating these unnecessary cached plans has other intangible benefits as well, such as freeing memory to cache other compiled plans (thereby further reducing compilation overhead) and leaving more memory for the buffer cache.
Resolution
SQL Server 2008 also produces a query_plan_hash value that represents a “signature” of the query plan’s access path (that is, what join algorithm is used, the join order, index selection, and so forth). Some applications might rely on getting a different query plan based on the optimizer evaluating the specific parameter values passed to that execution of the query. If that is the case, you do not want to parameterize the queries.
You can use the query_hash and query_plan_hash values together to determine whether a set of ad hoc queries with the same query_hash value resulted in query plans with the same or different query_plan_hash values, or access path. This is done via a small modification to the earlier query.
select q.query_hash,
q.number_of_entries,
q.distinct_plans,
t.text as sample_query,
p.query_plan as sample_plan
from (select top 20 query_hash,
count(*) as number_of_entries,
count(distinct query_plan_hash) as distinct_plans,
min(sql_handle) as sample_sql_handle,
min(plan_handle) as sample_plan_handle
from sys.dm_exec_query_stats
group by query_hash
having count(*) > 1
order by count(*) desc) as q
cross apply sys.dm_exec_sql_text(q.sample_sql_handle) as t
cross apply sys.dm_exec_query_plan(q.sample_plan_handle) as p
go
Note that this new query returns a count of the number of distinct query plans (query_plan_hash values) for a given query_hash value. Rows that return a large number for number_of_entries and a distinct_plans count of 1 are good candidates for parameterization. Even if the number of distinct plans is more than one, you can use sys.dm_exec_query_plan to retrieve the different query plans and examine them to see whether the difference is important and necessary for achieving optimal performance.
After you determine which queries should be parameterized, the best place to parameterize them is the client application. The details of how you do this vary slightly depending on which client API you use, but the one consistent thing across all of the APIs is that instead of building the query string with literal predicates, you build a string with a question mark (?) as a parameter marker.
-- Submitting as ad hoc query
select * from Sales.SalesOrderHeader where SalesOrderID = 100
-- Submitting as parameterized
select * from Sales.SalesOrderHeader where SalesOrderID = ?
You should use the appropriate APIs for your technology (ODBC, OLE DB, or SQLClient) to bind a value to the parameter marker. The client driver or provider then submits the query in its parameterized form using sp_executesql.
exec sp_executesql N'select * from Sales.SalesOrderHeader where SalesOrderID = @P1', N'@P1 int', 100
Because the query is parameterized, it matches and reuses an existing cached plan.
If the entire workload for a given database is appropriate for parameterization and you do not have control over (or can’t change) the client application, you can also enable the forced parameterization option for the database. Note the caveats mentioned earlier; this can prevent the optimizer from matching indexed views and indexes on computed columns.
ALTER DATABASE AdventureWorks SET PARAMETERIZATION FORCED
If you can’t parameterize the client application or enable forced parameterization for the entire database, you can still create a template plan guide for specific queries with the OPTION (PARAMETERIZATION FORCED) hint. For more information about the steps required to do this, see Forced Parameterization (http://technet.microsoft.com/en-us/library/ms175037.aspx) in SQL Server 2008 Books Online.
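The template plan guide approach can be sketched as follows, using the documented sp_get_query_template and sp_create_plan_guide procedures. The sample query text and guide name are hypothetical; substitute the ad hoc query shape you identified earlier.

```sql
-- Sketch: force parameterization for one query shape via a TEMPLATE plan guide.
declare @stmt nvarchar(max);
declare @params nvarchar(max);

-- Derive the parameterized form of the ad hoc query.
exec sp_get_query_template
    N'select * from Sales.SalesOrderHeader where SalesOrderID = 100',
    @stmt output,
    @params output;

-- Attach the PARAMETERIZATION FORCED hint to that template.
exec sp_create_plan_guide
    @name = N'TemplateGuide_SalesOrderHeader',   -- hypothetical name
    @stmt = @stmt,
    @type = N'TEMPLATE',
    @module_or_batch = NULL,
    @params = @params,
    @hints = N'OPTION (PARAMETERIZATION FORCED)';
```

After the guide is created, all queries that differ from the sample only in their literal values are parameterized and can share one cached plan.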
Unnecessary Recompilation
When a batch or remote procedure call (RPC) is submitted to SQL Server, the server checks for the validity and correctness of the query plan before it begins executing. If one of these checks fails, the batch may have to be compiled again to produce a different query plan. Such compilations are known as recompilations. These recompilations are generally necessary to ensure correctness and are often performed when the server determines that there could be a more optimal query plan due to changes in underlying data. Compilations by nature are CPU intensive and hence excessive recompilations could result in a CPU-bound performance problem on the system.
In SQL Server 2000, when SQL Server recompiles a stored procedure, the entire stored procedure is recompiled, not just the statement that triggered the recompilation. In SQL Server 2008 and SQL Server 2005, the behavior is changed to statement-level recompilation of stored procedures. When SQL Server 2008 or SQL Server 2005 recompiles stored procedures, only the statement that caused the recompilation is compiled—not the entire procedure. This uses less CPU bandwidth and results in less contention on lock resources such as COMPILE locks. Recompilation can happen in response to various conditions, such as:
• Schema changes
• Statistics changes
• Deferred compilation
• SET option changes
• Temporary table changes
• Stored procedure creation with the RECOMPILE query hint or the OPTION (RECOMPILE) query hint
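The two RECOMPILE variants in the last condition above can be sketched as follows; the procedure names are hypothetical.

```sql
-- WITH RECOMPILE on the procedure forces a fresh compilation of the whole
-- procedure on every execution:
create procedure dbo.usp_GetOrders @CustomerID int
with recompile
as
    select * from Sales.SalesOrderHeader where CustomerID = @CustomerID;
go

-- The statement-level hint recompiles only the one statement that needs it,
-- which is usually the cheaper choice:
create procedure dbo.usp_GetOrders2 @CustomerID int
as
    select * from Sales.SalesOrderHeader where CustomerID = @CustomerID
    option (recompile);
go
```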
Detection
You can use Performance Monitor and SQL Server Profiler to detect excessive compilation and recompilation.
Performance Monitor
The SQL Statistics object provides counters to monitor compilation and the type of requests that are sent to an instance of SQL Server. You must monitor the number of query compilations and recompilations in conjunction with the number of batches received to find out whether the compilations are contributing to high CPU use. Ideally, the ratio of SQL Recompilations/sec to Batch Requests/sec should be very low, unless users are submitting ad hoc queries.
These are the key data counters:
• SQL Server: SQL Statistics: Batch Requests/sec
• SQL Server: SQL Statistics: SQL Compilations/sec
• SQL Server: SQL Statistics: SQL Recompilations/sec
For more information, see SQL Statistics Object (http://msdn.microsoft.com/en-us/library/ms190911.aspx) in SQL Server 2008 Books Online.
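If Performance Monitor is not convenient, the same counters are exposed through the sys.dm_os_performance_counters DMV. Note that the per-second counters are surfaced as cumulative values, so you need two samples taken some interval apart and should compare the deltas; a single sample only shows running totals since server start.

```sql
-- Sketch: read the compilation-related counters directly from the DMV.
select object_name, counter_name, cntr_value
from sys.dm_os_performance_counters
where counter_name in (N'Batch Requests/sec',
                       N'SQL Compilations/sec',
                       N'SQL Re-Compilations/sec');
```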
SQL Server Profiler Trace
If the Performance Monitor counters indicate a high number of recompilations, the recompilations could be contributing to the high CPU consumed by SQL Server. Look at the profiler trace to find the stored procedures that are being recompiled. The SQL Server Profiler trace provides that information along with the reason for the recompilation. You can use the following events to get this information.
SP:Recompile / SQL:StmtRecompile
The SP:Recompile and the SQL:StmtRecompile event classes indicate which stored procedures and statements have been recompiled. When you compile a stored procedure, one event is generated for the stored procedure and one for each statement that is compiled. However, when a stored procedure recompiles, only the statement that caused the recompilation is recompiled. Some of the more important data columns for the SP:Recompile event class are listed here. The EventSubClass data column in particular is important for determining the reason for the recompilation. SP:Recompile is triggered once for the procedure or trigger that is recompiled and is not fired for an ad hoc batch. In SQL Server 2008 and SQL Server 2005, it is more useful to monitor SQL:StmtRecompile, because this event class fires when any type of batch (ad hoc query, stored procedure, or trigger) is recompiled.
The key data columns to look at in these events are as follows.
• EventClass
• EventSubClass
• ObjectID (represents the stored procedure that contains this statement)
• SPID
• StartTime
• SqlHandle
• TextData
For more information, see SQL:StmtRecompile Event Class (http://technet.microsoft.com/en-us/library/ms179294.aspx) in SQL Server 2008 Books Online.
If you have a trace file saved, you can use the following query to see all the recompilation events that were captured in the trace.
select
spid,
StartTime,
Textdata,
EventSubclass,
ObjectID,
DatabaseID,
SQLHandle
from
fn_trace_gettable ( 'e:\recompiletrace.trc' , 1)
where
EventClass in(37,75,166)
-- EventClass 37 = SP:Recompile, 75 = CursorRecompile, 166 = SQL:StmtRecompile
For more information about trace events, see sp_trace_setevent (http://msdn.microsoft.com/en-us/library/ms186265.aspx) in SQL Server 2008 Books Online.
You could further group the results from this query by the SqlHandle and ObjectID columns, or by various other columns, to see whether most of the recompilations are attributed by one stored procedure or are due to some other reason (such as a SET option that has changed).
Showplan XML For Query Compile
The Showplan XML For Query Compile event class occurs when SQL Server compiles or recompiles a Transact-SQL statement. This event has information about the statement that is being compiled or recompiled. This information includes the query plan and the object ID of the procedure in question. Capturing this event has significant performance overhead, because it is captured for each compilation or recompilation. If you see a high value for the SQL Compilations/sec counter in Performance Monitor, you should monitor this event. With this information, you can see which statements are frequently recompiled. You can use this information to change the parameters of those statements. This should reduce the number of recompilations.
DMVs
When you use the sys.dm_exec_query_optimizer_info DMV, you can get a good idea of the time SQL Server spends optimizing. If you take two snapshots of this DMV, you can get a good feel for the time that is spent optimizing in the given time period.
select * from sys.dm_exec_query_optimizer_info
counter occurrence value
---------------- -------------------- ---------------------
optimizations 81 1.0
elapsed time 81 6.4547820702944486E-2
In particular, look at the elapsed time, which is the time elapsed due to optimizations. Because the elapsed time during optimization is generally close to the CPU time that is used for the optimization (because the optimization process is very CPU bound), you can get a good measure of the extent to which the compilation and recompilation time is contributing to the high CPU use.
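The two-snapshot technique mentioned above can be sketched as follows. The one-minute delay is an arbitrary measurement window; also note that for the timing counters in this DMV, value is reported as an average per occurrence, so value multiplied by occurrence approximates the running total.

```sql
-- Sketch: measure optimizer time over an interval by diffing two snapshots.
select counter, occurrence, value
into #optimizer_snapshot
from sys.dm_exec_query_optimizer_info;

waitfor delay '00:01:00';  -- measurement window (adjust as needed)

select s2.counter,
       s2.occurrence - s1.occurrence as occurrence_delta,
       -- value is an average per occurrence; multiply to approximate totals
       s2.value * s2.occurrence - s1.value * s1.occurrence as value_delta
from sys.dm_exec_query_optimizer_info as s2
join #optimizer_snapshot as s1 on s1.counter = s2.counter;

drop table #optimizer_snapshot;
```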
Another DMV that is useful for capturing this information is sys.dm_exec_query_stats.
The data columns to look at are as follows:
• sql_handle
• total_worker_time
• plan_generation_num
• statement_start_offset
For more information, see sys.dm_exec_query_stats (http://msdn.microsoft.com/en-us/library/ms189741.aspx) in SQL Server 2008 Books Online.
In particular, plan_generation_num indicates the number of times the query has recompiled. The following sample query gives you the top 25 stored procedures that have been recompiled.
select top 25
sql_text.text,
sql_handle,
plan_generation_num,
execution_count,
dbid,
objectid
from
sys.dm_exec_query_stats a
cross apply sys.dm_exec_sql_text(sql_handle) as sql_text
where
plan_generation_num >1
order by plan_generation_num desc
For more information, see Batch Compilation, Recompilation, and Plan Caching Issues in SQL Server 2005 (http://www.microsoft.com/technet/prodtechnol/sql/2005/recomp.mspx) on Microsoft TechNet.
Resolution
If you detect excessive compilation and recompilation, consider the following options:
• If the recompilation occurred because a SET option changed, use SQL Server Profiler to determine which SET option changed. Avoid changing SET options within stored procedures. It is better to set them at the connection level. Ensure that SET options are not changed during the lifetime of the connection.
• Recompilation thresholds for temporary tables are lower than for normal tables. If the recompilations on a temporary table are due to statistics changes, you can change the temporary tables to table variables. A change in the cardinality of a table variable does not cause a recompilation. The drawback of this approach is that the query optimizer does not keep track of a table variable's cardinality, because statistics are not created or maintained on table variables. This can result in less optimal query plans. You can test the different options and choose the best one.
• Another option is to use the KEEP PLAN query hint. This sets the recompile threshold of temporary tables to be the same as that of permanent tables. The EventSubclass column displays "Statistics Changed" for an operation on a temporary table.
• To avoid recompilations that are due to changes in statistics (for example, if the plan becomes suboptimal due to a change in the data statistics), specify the KEEPFIXED PLAN query hint. With this option in effect, recompilations can occur only to ensure correctness (for example, when the underlying table structure has changed and the plan no longer applies), not in response to changes in statistics. For example, a recompilation can occur if the schema of a table that is referenced by a statement changes, or if a table is marked with the sp_recompile stored procedure.
• Turning off the automatic updating of statistics for indexes and statistics that are defined on a table or indexed view prevents recompilations that are due to statistics changes on that object. Note, however, that turning off the auto-stats feature by using this method is usually not a good idea, because the query optimizer is no longer sensitive to data changes in those objects and suboptimal query plans might result. Use this method only as a last resort after exhausting all other alternatives.
• Batches should use qualified object names (for example, dbo.Table1) to avoid recompilation and to avoid ambiguity between objects.
• To avoid recompilations that are due to deferred compilations, do not interleave DML and DDL, and do not create the DDL from conditional constructs such as IF statements.
• Run Database Engine Tuning Advisor (DTA) to see whether any indexing changes improve the compile time and the execution time of the query.
• Check whether the stored procedure was created with the WITH RECOMPILE option or whether the RECOMPILE query hint was used. If a procedure was created with the WITH RECOMPILE option, in SQL Server 2008 or SQL Server 2005 you may be able to take advantage of a statement-level RECOMPILE hint if a particular statement within that procedure needs to be recompiled. Using this hint at the statement level avoids recompiling the whole procedure each time it executes, while still allowing the individual statement to be recompiled. For more information about the RECOMPILE hint, see Query Hints (Transact-SQL) (http://msdn.microsoft.com/en-us/library/ms181714.aspx) in SQL Server 2008 Books Online.
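The KEEP PLAN and KEEPFIXED PLAN hints described above can be sketched as follows. The table and query shapes are hypothetical; this is illustrative only, not a recommendation to apply the hints broadly.

```sql
-- KEEP PLAN raises the recompile threshold for the temp table so that
-- modest row-count changes in #work do not trigger a recompile:
select w.col1, count(*) as cnt
from #work as w          -- hypothetical temp table
group by w.col1
option (keep plan);

-- KEEPFIXED PLAN goes further: the statement recompiles only for
-- correctness (schema changes, sp_recompile), never for statistics changes:
select h.CustomerID, sum(h.TotalDue) as total_due
from Sales.SalesOrderHeader as h
group by h.CustomerID
option (keepfixed plan);
```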
Inefficient Query Plan
When generating an execution plan for a query, the SQL Server optimizer attempts to choose a plan that provides the fastest response time for that query. Note that the fastest response time doesn’t necessarily mean minimizing the amount of I/O that is used, nor does it necessarily mean using the least amount of CPU—it is a balance of the various resources.
Certain types of operators are more CPU-intensive than others. By their nature, the Hash operator and Sort operator scan through their respective input data. If read-ahead (prefetch) is used during such a scan, the pages are almost always available in the buffer cache before the page is needed by the operator. Thus, waits for physical I/O are minimized or eliminated. If these types of operations are no longer constrained by physical I/O, they tend to manifest themselves in high CPU consumption. By contrast, nested loop joins have many index lookups and can quickly become I/O bound if the index lookups traverse too many different parts of the table, so that the pages cannot fit into the buffer cache.
The most significant input the optimizer uses in evaluating the cost of various alternative query plans is the cardinality estimates for each operator, which you can see in the Showplan (EstimateRows and EstimateExecutions attributes). Without accurate cardinality estimates, the primary input used in optimization is flawed, and many times so is the final plan.
For an excellent white paper that describes in detail how the SQL Server optimizer uses statistics, see Statistics Used by the Query Optimizer in Microsoft SQL Server 2005 (http://www.microsoft.com/technet/prodtechnol/sql/2005/qrystats.mspx). The white paper discusses how the optimizer uses statistics, best practices for maintaining up-to-date statistics, and some common query design issues that can prevent accurate estimate cardinality and thus cause inefficient query plans.
Detection
Inefficient query plans are usually detected by comparison, for example, against the same query's earlier performance or against other queries on the system. An inefficient query plan can cause increased CPU consumption.
The following query against sys.dm_exec_query_stats is an efficient way to determine which query is using the most cumulative CPU.
select
highest_cpu_queries.plan_handle,
highest_cpu_queries.total_worker_time,
q.dbid,
q.objectid,
q.number,
q.encrypted,
q.[text]
from
(select top 50
qs.plan_handle,
qs.total_worker_time
from
sys.dm_exec_query_stats qs
order by qs.total_worker_time desc) as highest_cpu_queries
cross apply sys.dm_exec_sql_text(plan_handle) as q
order by highest_cpu_queries.total_worker_time desc
Alternatively, you can query against sys.dm_exec_cached_plans, filtering on various operators that may be CPU intensive, such as '%Hash Match%' or '%Sort%', to look for suspects.
Resolution
Consider the following options if you detect inefficient query plans:
• Tune the query with the Database Engine Tuning Advisor to see whether it produces any index recommendations.
• Check for issues with bad cardinality estimates.
• Are the queries written so that they use the most restrictive WHERE clause that is applicable? Unrestricted queries are resource intensive by their very nature.
• Run UPDATE STATISTICS on the tables involved in the query and check whether the problem persists.
• Does the query use constructs for which the optimizer is unable to accurately estimate cardinality? Consider whether the query can be modified so that the issue is avoided.
• If it is not possible to modify the schema or the query, you can use the plan guide feature to specify query hints for queries that match certain text criteria. Plan guides can be created for ad hoc queries as well as for queries inside a stored procedure. Hints such as OPTION (OPTIMIZE FOR) enable you to influence the cardinality estimates while leaving the optimizer its full array of potential plans. Other hints, such as OPTION (FORCE ORDER) or OPTION (USE PLAN), give you varying degrees of control over the query plan. SQL Server 2008 offers full DML support for plan guides, which means that they can be created for SELECT, INSERT, UPDATE, DELETE, or MERGE statements.
• SQL Server 2008 also offers a feature called plan freezing that allows you to freeze a plan exactly as it exists in the plan cache. This option is similar to creating a plan guide with the USE PLAN query hint specified, but it eliminates the need to execute the lengthy commands required when creating a plan guide manually, and it minimizes the user errors that go along with those lengthy commands. For example, the simple two-statement batch below is all that is needed to freeze a plan for a query that matches the specified text criteria.
DECLARE @plan_handle varbinary(64);
-- Extract the query's plan_handle.
SELECT @plan_handle = plan_handle FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(sql_handle)
WHERE text LIKE N'Some query matching criteria%';
EXECUTE sp_create_plan_guide_from_handle
@name = N'Sample_PG1',
@plan_handle = @plan_handle,
@statement_start_offset = NULL;
GO
This statement creates a plan guide (Sample_PG1) in the sys.plan_guides table.
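The lighter-weight OPTIMIZE FOR hint mentioned earlier can be sketched as follows. The variable and the values are hypothetical; the point is that the optimizer costs the plan as if the parameter held the specified representative value, without forcing a particular plan shape.

```sql
-- Sketch: steer the cardinality estimate toward a representative value.
declare @CustomerID int = 12345;   -- hypothetical runtime value

select *
from Sales.SalesOrderHeader
where CustomerID = @CustomerID
option (optimize for (@CustomerID = 500));  -- compile as if @CustomerID = 500
```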
Intraquery Parallelism
When generating an execution plan for a query, the SQL Server optimizer attempts to choose the plan that provides the fastest response time for that query. If the query’s cost exceeds the value specified in the cost threshold for parallelism option and parallelism has not been disabled, the optimizer attempts to generate a plan that can be run in parallel. A parallel query plan uses multiple threads to process the query, with each thread distributed across the available CPUs and concurrently utilizing CPU time from each processor. The maximum degree of parallelism can be limited server-wide by using the max degree of parallelism option, on a resource workload group level, or on a per-query level by using the OPTION (MAXDOP) hint.
The decision on the actual degree of parallelism (DOP) used for execution—a measure of how many threads will do a given operation in parallel—is deferred until execution time. Before executing the query, SQL Server determines how many schedulers are underutilized and chooses a DOP for the query that fully utilizes the remaining schedulers. After a DOP is chosen, the query runs with the chosen degree of parallelism until completion. A parallel query typically uses a similar but slightly higher amount of CPU time as compared to the corresponding serial execution plan, but it does so in a shorter amount of time. As long as there are no other bottlenecks, such as waits for physical I/O, parallel plans generally should use 100% of the CPU across all of the processors.
One key factor that leads to running a parallel plan (how idle the system is) can change after the query starts executing. For example, if a query comes in during an idle time, the server might choose to run with a parallel plan, use a DOP of four, and spawn threads on four different processors. After those threads start executing, existing connections can submit other queries that also require a lot of CPU. At that point, all the different threads share short time slices of the available CPU, resulting in higher query duration.
Running with a parallel plan is not inherently bad and should provide the fastest response time for that query. However, the response time for a given query must be weighed against the overall throughput and responsiveness of the rest of the queries on the system. Parallel queries are generally best suited to batch processing and decision support workloads and might not be useful in a transaction processing environment.
SQL Server 2008 implemented significant scalability improvements to fully utilize available hardware with partitioned table queries. Consequently, SQL Server 2008 might use higher amounts of CPU during parallel query execution than older versions. If this is not desired, you should limit or disable parallelism.
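Limiting parallelism, as suggested above, can be done server-wide with sp_configure or per query with the MAXDOP hint. A minimal sketch (the cap of 4 is an arbitrary example value):

```sql
-- Server-wide cap on the degree of parallelism:
exec sp_configure 'show advanced options', 1;
reconfigure;
exec sp_configure 'max degree of parallelism', 4;
reconfigure;
go

-- Or cap (here: disable) parallelism for a single query:
select CustomerID, sum(TotalDue) as total_due
from Sales.SalesOrderHeader
group by CustomerID
option (maxdop 1);
```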
Detection
Intraquery parallelism problems can be detected by using the following methods.
Performance Monitor
For more information, see the SQL Server:SQL Statistics – Batch Requests/sec counter and SQL Statistics Object (http://msdn.microsoft.com/en-us/library/ms190911.aspx) in SQL Server 2008 Books Online.
Because a query must have an estimated cost that exceeds the cost threshold for the parallelism configuration setting (which defaults to 5) before it is considered for a parallel plan, the more batches a server is processing per second, the less likely it is that the batches are running with parallel plans. Servers that are running many parallel queries normally have small batch requests per second (for example, values less than 100).
DMVs
From a running server, you can determine whether any active requests are running in parallel for a given session by using the following query.
select
r.session_id,
r.request_id,
max(isnull(exec_context_id, 0)) as number_of_workers,
r.sql_handle,
r.statement_start_offset,
r.statement_end_offset,
r.plan_handle
from
sys.dm_exec_requests r
join sys.dm_os_tasks t on r.session_id = t.session_id
join sys.dm_exec_sessions s on r.session_id = s.session_id
where
s.is_user_process = 0x1
group by
r.session_id, r.request_id,
r.sql_handle, r.plan_handle,
r.statement_start_offset, r.statement_end_offset
having max(isnull(exec_context_id, 0)) > 0
With this information, you can easily retrieve the text of the query by using sys.dm_exec_sql_text, and you can retrieve the plan by using sys.dm_exec_query_plan.
You can also search for plans that are eligible to run in parallel. To do this, search the cached plans to see whether a relational operator has its Parallel attribute set to a nonzero value. These plans might not run in parallel, but they can do so if the system is not too busy.
--
-- Find query plans that can run in parallel
--
select
p.*,
q.*,
cp.plan_handle
from
sys.dm_exec_cached_plans cp
cross apply sys.dm_exec_query_plan(cp.plan_handle) p
cross apply sys.dm_exec_sql_text(cp.plan_handle) as q
where
cp.cacheobjtype = 'Compiled Plan' and
p.query_plan.value('declare namespace p="http://schemas.microsoft.com/sqlserver/2004/07/showplan";
max(//p:RelOp/@Parallel)', 'float') > 0
In general, the duration of a query is longer than the amount of CPU time, because some of the time was spent waiting on resources such as a lock or physical I/O. The only scenario where a query can use more CPU time than the elapsed duration is when the query runs with a parallel plan such that multiple threads concurrently use CPU. Note that not all parallel queries demonstrate this behavior (where the CPU time is greater than the duration).
select
qs.sql_handle,
qs.statement_start_offset,
qs.statement_end_offset,
q.dbid,
q.objectid,
q.number,
q.encrypted,
q.text
from
sys.dm_exec_query_stats qs
cross apply sys.dm_exec_sql_text(qs.plan_handle) as q
where
qs.total_worker_time > qs.total_elapsed_time
SQL Trace
Look for the following signs of parallel queries. They can be statements or batches that have CPU time greater than the duration:
select
EventClass,
TextData
from
::fn_trace_gettable('c:\temp\high_cpu_trace.trc', default)
where
EventClass in (10, 12) -- RPC:Completed, SQL:BatchCompleted
and CPU > Duration/1000 -- CPU is in milliseconds, Duration in microseconds
Or the signs can be showplans (unencoded) that have Parallelism operators in them:
select
EventClass,
TextData
from
::fn_trace_gettable('c:\temp\high_cpu_trace.trc', default)
where
TextData LIKE '%Parallelism%'
Resolution
• Any query that runs with a parallel plan is one that the optimizer identifies as expensive enough to exceed the cost threshold for parallelism, which defaults to 5 (roughly a 5-second execution time on a reference computer). Any queries identified through the previous methods are candidates for further tuning.
• Use the Database Engine Tuning Advisor to see whether any indexing changes, changes to indexed views, or partitioning changes could reduce the cost of the query.
• Check for significant differences between the actual and the estimated cardinality, because the cardinality estimates are the primary factor in estimating the cost of the query. If any significant differences are found:
o If the auto create statistics database option is disabled, make sure that there are no MISSING STATS entries in the Warnings column of the Showplan output.
o Try running UPDATE STATISTICS on the tables where the cardinality estimates are off.
o Verify that the query doesn’t use a query construct that the optimizer can’t accurately estimate, such as multistatement table-valued functions or CLR functions, table variables, or comparisons with a Transact-SQL variable (comparisons with a parameter are okay).
o Evaluate whether the query could be written in a more efficient fashion using different Transact-SQL statements or expressions.
Poor Cursor Usage
Versions of SQL Server prior to SQL Server 2005 supported only a single active command per connection. A query that was executing or had results pending to send to the client was considered active. In some situations, the client application might need to read through the results and submit other queries to SQL Server based on the row just read from the result set. This could not be done with a default result set, because it could have other pending results. A common solution was to change the connection properties to use a server-side cursor.
When a server-side cursor is used, the database client software (the OLE DB provider or ODBC driver) transparently encapsulates client requests inside special extended stored procedures, such as sp_cursoropen or sp_cursorfetch. This is referred to as an API cursor (as opposed to a Transact-SQL cursor). When the user executes the query, the query text is sent to the server via sp_cursoropen; requests to read from the result set result in a sp_cursorfetch instructing the server to send back only a certain number of rows. By controlling the number of rows that are fetched, the ODBC driver or OLE DB provider can cache the row or rows. This prevents a situation where the server is waiting for the client to read all the rows it has sent. Thus, the server is ready to accept a new request on that connection.
Applications that open cursors and fetch one row (or a small number of rows) at a time can easily become bottlenecked by the network latency, especially on a wide area network (WAN). On a fast network with many different user connections, the overhead required to process many cursor requests can become significant. Because of the overhead associated with repositioning the cursor to the appropriate location in the result set, per-request processing overhead, and similar processing, it is more efficient for the server to process a single request that returns 100 rows than to process 100 separate requests that return the same 100 rows one row at a time.
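The cost model behind "one 100-row fetch beats 100 one-row fetches" can be sketched with back-of-envelope arithmetic: each fetch request pays a fixed per-round-trip cost (network latency plus per-request server processing) on top of the per-row transfer cost. The numbers below are hypothetical, chosen only to illustrate the shape of the trade-off, not measured SQL Server figures:

```python
# Hypothetical per-request and per-row costs (illustration only).
ROUND_TRIP_MS = 2.0   # assumed latency + per-request overhead for one fetch
PER_ROW_MS = 0.01     # assumed cost to serialize/transfer one row

def total_cost_ms(rows: int, fetch_buffer_size: int) -> float:
    """Total time to read `rows` rows with a given fetch buffer size."""
    round_trips = -(-rows // fetch_buffer_size)  # ceiling division
    return round_trips * ROUND_TRIP_MS + rows * PER_ROW_MS

# 100 single-row fetches pay the round-trip cost 100 times;
# one 100-row fetch pays it once.
print(f"fetch size 1:   {total_cost_ms(100, 1):.1f} ms")
print(f"fetch size 100: {total_cost_ms(100, 100):.1f} ms")
```

With these assumed numbers the single-row pattern is dozens of times slower, and the gap widens as network latency grows, which is why the text singles out WAN links.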
Detection
You can use the following tools to troubleshoot poor cursor usage.
Performance Monitor
By looking at the SQL Server:Cursor Manager By Type – Cursor Requests/Sec counter, you can get a general feel for how many cursors are being used on the system. Systems that have high CPU utilization because of small fetch sizes typically have hundreds of cursor requests per second. There are no specific counters that list the fetch buffer size.
DMVs
You can use following query to determine the connections with API cursors (as opposed to Transact-SQL cursors) that are using a fetch buffer size of one row. It is much more efficient to use a larger fetch buffer, such as 100 rows.
select
    cur.*
from
    sys.dm_exec_connections con
    cross apply sys.dm_exec_cursors(con.session_id) as cur
where
    cur.fetch_buffer_size = 1
    and cur.properties like 'API%'  -- API cursor (Transact-SQL cursors always have a fetch buffer of 1)
SQL Trace
Use a trace that includes the RPC:Completed event class, and search for sp_cursorfetch statements. The value of the fourth parameter is the number of rows returned by the fetch. The maximum number of rows requested is specified as an input parameter in the corresponding RPC:Starting event class.
Resolution
• Determine whether cursors are the most appropriate means to accomplish the processing, or whether a set-based operation, which is generally more efficient, is possible.
• Consider enabling Multiple Active Result Sets (MARS) when connecting to SQL Server 2008.
• Consult the appropriate documentation for your specific API to determine how to specify a larger fetch buffer size for the cursor:
o ODBC - SQL_ATTR_ROW_ARRAY_SIZE
o OLE DB - IRowset::GetNextRows or IRowsetLocate::GetRowsAt
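The same "larger fetch buffer" idea exists in most client APIs. As a minimal sketch, the snippet below uses Python's DB-API `fetchmany()` with `cursor.arraysize`, the DB-API analogue of ODBC's SQL_ATTR_ROW_ARRAY_SIZE; the stdlib sqlite3 module stands in for a real SQL Server connection here purely so the example is self-contained, and the client-side pattern is what matters:

```python
import sqlite3

# Illustration only: sqlite3 stands in for a SQL Server connection.
# The point is the pattern — pull rows in batches via fetchmany()
# instead of one fetchone() round trip per row.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, val TEXT)")
conn.executemany("INSERT INTO t (val) VALUES (?)",
                 [(f"row{i}",) for i in range(250)])

cur = conn.cursor()
cur.arraysize = 100            # default batch size for fetchmany()
cur.execute("SELECT id, val FROM t")

batches = 0
rows_read = 0
while True:
    batch = cur.fetchmany()    # returns up to cur.arraysize rows
    if not batch:
        break
    batches += 1
    rows_read += len(batch)

print(batches, rows_read)      # 250 rows read in 3 batches, not 250
```

Against a remote server, each `fetchmany()` batch corresponds to roughly one round trip, so a fetch buffer of 100 cuts the request count by two orders of magnitude compared with row-at-a-time fetching.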