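The single most important paper:
A Top-Down Method for Performance Analysis and Counters Architecture
This paper proposes the top-down analysis model, which lets you treat the system as a black box and tell whether it is CPU bound or memory bound.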
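To make the model a bit more concrete, here is a minimal Python sketch of the level-1 top-down breakdown. It assumes the formulas and event names the paper gives for a 4-wide Intel core (the event names appear only as comments), and the counter values in the example are made up for illustration, not measured on any real machine.

```python
from dataclasses import dataclass

# Assumption: a 4-wide core, i.e. 4 issue slots per cycle.
PIPELINE_WIDTH = 4


@dataclass
class Counters:
    cycles: int              # CPU_CLK_UNHALTED.THREAD
    uops_not_delivered: int  # IDQ_UOPS_NOT_DELIVERED.CORE
    uops_issued: int         # UOPS_ISSUED.ANY
    uops_retired_slots: int  # UOPS_RETIRED.RETIRE_SLOTS
    recovery_cycles: int     # INT_MISC.RECOVERY_CYCLES


def topdown_level1(c: Counters) -> dict:
    """Split all pipeline slots into the four level-1 categories."""
    slots = PIPELINE_WIDTH * c.cycles
    frontend = c.uops_not_delivered / slots
    bad_spec = (
        c.uops_issued - c.uops_retired_slots
        + PIPELINE_WIDTH * c.recovery_cycles
    ) / slots
    retiring = c.uops_retired_slots / slots
    # Whatever is left over is attributed to the backend.
    backend = 1.0 - (frontend + bad_spec + retiring)
    return {
        "retiring": retiring,
        "bad speculation": bad_spec,
        "frontend bound": frontend,
        "backend bound": backend,
    }


# Hypothetical counter values, chosen only to keep the arithmetic easy.
sample = Counters(
    cycles=1000,
    uops_not_delivered=400,
    uops_issued=1300,
    uops_retired_slots=1200,
    recovery_cycles=25,
)
for name, fraction in topdown_level1(sample).items():
    print(f"{name:16s} {fraction:6.1%}")
```

The four fractions always add up to 1, which is why a high backend-bound share means little retiring work is getting done in those slots.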
Of course, the perf stat command supports this as well:
# perf stat --topdown
^C
 Performance counter stats for 'system wide':

              retiring  bad speculation  frontend bound  backend bound
S0-D0-C0  2      22.9%             2.0%           27.9%          47.2%
S0-D0-C1  2      10.7%             0.0%            0.0%          89.2%
S0-D0-C2  2      10.7%             0.0%            0.0%          89.2%
S0-D0-C3  2      11.6%             3.4%           39.5%          45.5%
S0-D0-C4  2       8.0%             3.1%           32.1%          56.8%
S0-D0-C5  2      12.9%             2.6%           37.8%          46.7%
S0-D0-C6  2      16.1%             2.7%           47.5%          33.6%
S0-D0-C7  2       9.8%             1.3%           32.7%          56.1%
S0-D0-C8  2      10.8%             4.2%           48.0%          37.0%
S0-D0-C9  2       9.1%             1.8%           30.9%          58.2%
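With ten cores on screen it is easy to lose track of which category dominates where. The Python sketch below parses rows shaped like the output above and reports the largest category per core. The regular expression and the helper names (`parse_topdown`, `dominant`) are my own assumptions built around this particular output; the perf output format differs between versions, so treat it as a starting point rather than a general parser.

```python
import re

CATEGORIES = ("retiring", "bad speculation", "frontend bound", "backend bound")

# Assumed row shape: "S0-D0-C0  2  22.9%  2.0%  27.9%  47.2%"
ROW = re.compile(
    r"^\s*(S\d+-D\d+-C\d+)\s+\d+"
    r"\s+([\d.]+)%\s+([\d.]+)%\s+([\d.]+)%\s+([\d.]+)%\s*$"
)


def parse_topdown(text):
    """Return {core: {category: percent}} for every row that matches."""
    rows = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            values = [float(v) for v in m.groups()[1:]]
            rows[m.group(1)] = dict(zip(CATEGORIES, values))
    return rows


def dominant(rows):
    """Pick the category with the largest share for each core."""
    return {core: max(v, key=v.get) for core, v in rows.items()}


# Three rows copied from the output above.
sample = """
S0-D0-C0  2  22.9%  2.0%  27.9%  47.2%
S0-D0-C1  2  10.7%  0.0%   0.0%  89.2%
S0-D0-C6  2  16.1%  2.7%  47.5%  33.6%
"""
for core, category in dominant(parse_topdown(sample)).items():
    print(core, "->", category)
```

On the full table this would flag C1 and C2 as heavily backend bound, which are the same two cores that the toplev run digs into.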
pmu-tools can show even more detail:
# python3 ./toplev.py -l 2
Will measure complete system.
^C
# 4.2-full-perf on Intel(R) Core(TM) i9-10900 CPU @ 2.80GHz [skl/skylake]
C1    BE       Backend_Bound               % Slots  89.2
C1    BE/Mem   Backend_Bound.Memory_Bound  % Slots  49.2  <==
        This metric represents fraction of slots the Memory subsystem within the Backend was a bottleneck...
C1    BE/Core  Backend_Bound.Core_Bound    % Slots  39.9
        This metric represents fraction of slots where Core non-memory issues were of a bottleneck...
C1-T0 MUX                                  %        14.1
        PerfMon Event Multiplexing accuracy indicator
C2    BE       Backend_Bound               % Slots  89.1
C2    BE/Mem   Backend_Bound.Memory_Bound  % Slots  49.1  <==
C2    BE/Core  Backend_Bound.Core_Bound    % Slots  40.0
C2-T0 MUX                                  %        14.1
C1-T1 MUX                                  %        14.2
C2-T1 MUX                                  %        14.1
Run toplev --describe Memory_Bound^ to get more information on bottleneck for cpu
Add --nodes '!+Memory_Bound*/3,+MUX' for breakdown.
Idle CPUs 0,3-10,13-19 may have been hidden. Override with --idle-threshold 100
pmu-tools can be downloaded from:
https://github.com/andikleen/pmu-tools
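Next, two book recommendations:
The first is Systems Performance, by the great Brendan Gregg:
and the second is Performance Analysis and Tuning on Modern CPUs, from easyperf.net: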
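Each of the two books has its own strengths. The first, translated into Chinese as 《性能之巅》, introduces many useful tools and proposes the USE method. It is required reading in the field of performance analysis! The cover of the earlier first edition:
The second describes processor microarchitecture and related topics in finer detail, and its table of contents is sure to make your mouth water: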
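We won't cover how to get hold of these books and papers. Pay where payment is due, and don't go looking for pirated copies :-)
The PDF of the second book is free; just enter an email address here: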
https://book.easyperf.net/perf_book
The top-down paper on IEEE is behind a paywall, but a free set of slides is available here:
https://pdfs.semanticscholar.org/b5e0/1ab1baa6640a39edfa06d556fabd882cdf64.pdf
The first edition of Brendan Gregg's book is available in Chinese: