On Mon, Sep 30, 2019 at 11:19:44PM -0400, Yafang Shao wrote:
> A new perf script page-reclaim is introduced in this patch. This new script
> is used to report the page reclaim details. The possible usage of this
> script is as below,
> - identify latency spike caused by direct reclaim
> - whether the latency spike is relevant with pageout
> - why is page reclaim requested, i.e. whether it is because of memory
>   fragmentation
> - page reclaim efficiency
> etc
> In the future we may also enhance it to analyze the memcg reclaim.
>

Hi,

I ended up not reviewing this patch in detail simply because I would
approach the same class of problem in an entirely different way today.
There is value in accumulating the stats in a report like this:

> $ perf script report page-reclaim
> Direct reclaims: 4924
> Direct latency (ms)        total         max         avg         min
>                       177823.211    6378.977      36.114       0.051
> Direct file reclaimed 22920
> Direct file scanned 28306
> Direct file sync write I/O 0
> Direct file async write I/O 0
> Direct anon reclaimed 212567
> Direct anon scanned 1446854
> Direct anon sync write I/O 0
> Direct anon async write I/O 278325
> Direct order      0     1     3
>                4870    23    31
> Wake kswapd requests 716
> Wake order      0     1
>               715     1
>
> Kswapd reclaims: 9

However, the basic option I would prefer is having the raw latency
information for Direct latency that can be externally parsed by R or any
other statistical method. The reason is that knowing the max latency is
not enough; I'd want to know the spread of latencies and whether they were
clustered at a point in time or spread out over long periods of time. I
would then build the higher-level reports on top if necessary.

Today, I would also have considered getting the latency figures using eBPF
or systemtap instead, although having perf do it may be useful too.
That's not universally popular though, so at minimum I would have:

perf script record page-reclaim
    -- capture all page-reclaim tracepoints
perf script report page-reclaim
    -- For reclaim entry/exit, merge the two tracepoints into one that
       reports latency. Dump the rest out verbatim

For latencies, I would externally post-process them until such time as I
found a common class of bug that needed a high-level report and then build
the perf script support for it.

Please note that I did not spot anything wrong with your script; it's just
that I would not use it myself in its current format for debugging a
reclaim-related problem.

-- 
Mel Gorman
SUSE Labs
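
As a rough illustration of the entry/exit merge and the external
post-processing described above, something along these lines might do. It
is only a sketch under assumptions: the input is taken to be already-parsed
(timestamp_seconds, pid, event_name) tuples, which is a hypothetical format
and not what perf emits directly, and the function names are made up for
the example. Only the two mm_vmscan_direct_reclaim_* tracepoint names come
from the kernel.

```python
from collections import Counter
from statistics import quantiles

BEGIN = "mm_vmscan_direct_reclaim_begin"
END = "mm_vmscan_direct_reclaim_end"


def merge_reclaim_events(events):
    """Pair begin/end events per pid into (start_s, pid, latency_ms) records.

    `events` is an iterable of (timestamp_s, pid, event_name) tuples; this
    layout is an assumption made for the example.
    """
    open_since = {}  # pid -> timestamp of the not-yet-matched begin event
    merged = []
    for ts, pid, name in sorted(events):
        if name == BEGIN:
            open_since[pid] = ts
        elif name == END and pid in open_since:
            start = open_since.pop(pid)
            merged.append((start, pid, (ts - start) * 1000.0))
        # An end without a matching begin (trace started mid-reclaim) is dropped.
    return merged


def latency_spread(merged):
    """Summarise the spread of latencies, not just the maximum.

    Needs at least two merged records on Python versions before 3.13.
    """
    lat = [latency_ms for _, _, latency_ms in merged]
    cuts = quantiles(lat, n=100, method="inclusive")  # 99 percentile cut points
    return {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98], "max": max(lat)}


def per_second_counts(merged):
    """Count reclaims per wall-clock second to see whether they cluster in time."""
    return Counter(int(start) for start, _, _ in merged)
```

The merged records can just as easily be written out one per line and fed
to R; the percentile and per-second summaries are only there to show the
kind of question the raw latencies make it possible to ask.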