On 1/2/24 07:38, Ryan Roberts wrote:
With the proliferation of large folios for file-backed memory, and more recently the introduction of multi-size THP for anonymous memory, it is becoming useful to be able to see exactly how large folios are mapped into processes. For some architectures (e.g. arm64), if most memory is mapped using contpte-sized and -aligned blocks, TLB usage can be optimized, so it's useful to see where these requirements are and are not being met.

thpmaps is a Python utility that reads /proc/<pid>/smaps, /proc/<pid>/pagemap and /proc/kpageflags to print information about how transparent huge pages (both file and anon) are mapped into a specified process or cgroup. It aims to help users debug and optimize their workloads. In future we may wish to introduce stats directly into the kernel (e.g. smaps or similar), but for now this provides a short-term solution without the need to introduce any new ABI.
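To make the pagemap/kpageflags mechanism concrete, here is a minimal Python sketch of a single per-page lookup. It is not thpmaps's actual code. It finds the PFN for a virtual address in /proc/<pid>/pagemap, then reads that PFN's entry in /proc/kpageflags. The bit positions follow the kernel's pagemap documentation (PFN in bits 0-54, "present" in bit 63; KPF_COMPOUND_HEAD = 15, KPF_COMPOUND_TAIL = 16, KPF_THP = 22). The helper names are hypothetical, chosen for illustration. Real PFNs require CAP_SYS_ADMIN, and /proc/kpageflags is normally readable only by root.

```python
import os
import struct

# Base page size of the running kernel (arm64 may use 4K, 16K or 64K pages).
PAGE_SIZE = os.sysconf("SC_PAGE_SIZE")

# /proc/<pid>/pagemap: one 64-bit entry per virtual page.
PM_PFN_MASK = (1 << 55) - 1  # bits 0-54: PFN when the page is present
PM_PRESENT = 1 << 63         # bit 63: page present in RAM

# /proc/kpageflags: one 64-bit flags word per PFN.
KPF_COMPOUND_HEAD = 1 << 15
KPF_COMPOUND_TAIL = 1 << 16
KPF_THP = 1 << 22


def decode_pagemap(entry):
    """Return the PFN from a pagemap entry, or None if the page is not present."""
    if not entry & PM_PRESENT:
        return None
    return entry & PM_PFN_MASK


def classify_kpageflags(flags):
    """Describe how a physical page participates in a THP (large folio)."""
    if not flags & KPF_THP:
        return "small"
    if flags & KPF_COMPOUND_HEAD:
        return "thp-head"
    if flags & KPF_COMPOUND_TAIL:
        return "thp-tail"
    return "thp"


def read_u64(f, index):
    """Read the index-th native-endian 64-bit value from an open binary file."""
    f.seek(index * 8)
    data = f.read(8)
    if len(data) != 8:
        raise EOFError(f"short read at index {index}")
    return struct.unpack("=Q", data)[0]


def lookup(pid, vaddr):
    """Classify the page mapped at vaddr in pid.

    Hypothetical helper. Without CAP_SYS_ADMIN the PFN field reads as 0 and
    /proc/kpageflags cannot be opened, so run this as root.
    """
    with open(f"/proc/{pid}/pagemap", "rb") as pm:
        pfn = decode_pagemap(read_u64(pm, vaddr // PAGE_SIZE))
    if pfn is None:
        return "not present"
    with open("/proc/kpageflags", "rb") as kpf:
        return classify_kpageflags(read_u64(kpf, pfn))
```

Run as root, `lookup("self", addr)` reports whether `addr` is backed by a small page or by the head or a tail page of a large folio. A tool that reports folio sizes and contpte alignment, as thpmaps does, would also need to walk every VMA listed in smaps and group runs of contiguous PFNs.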
...
I've found this very useful for debugging, and I know others have requested a way to check if mTHP and contpte are working, so I thought this might be a good short-term solution until we figure out how best to add stats in the kernel?
Hi Ryan,

One thing that immediately came up during some recent testing of mTHP on arm64: the PID requirement is sometimes a little awkward. I'm running tests one machine at a time for now, inside various containers and such, and it would be nice if there were an easy way to get some numbers for the mTHPs across the whole machine.

I'm not sure if that changes anything about thpmaps here. Probably this is fine as-is. But I wanted to give some initial reactions from just a few quick runs: the global state would be convenient.

thanks,
--
John Hubbard
NVIDIA