On 15/01/2024 21:30, John Hubbard wrote:
> On 1/15/24 07:56, Ryan Roberts wrote:
> ...
>>> But yes, let me work up some improved documentation and send it out for your
>>> review. The reason its a bit terse at the moment, is that I'm using Python's
>>> ArgumentParser for the documentation, and it removes all line breaks from the
>>> description which makes it hard to format longer form docs. Anyway, that's a bad
>>> excuse for bad docs so I'll figure out a solution.
>>
>> Here is my proposed documentation. If you could take a look and let me know if
>> it makes sense, then I'll modify the tool to conform:
>>
>
> Looks great. One typo fix and a note, below.
>
>> --8<--
>>
>> $ ./thpmaps --help
>>
>> usage: thpmaps [-h] [--pid pid | --cgroup path] [--rollup] [--cont size[KMG]]
>>                [--inc-smaps] [--inc-empty] [--periodic sleep_ms]
>>
>> Prints information about how transparent huge pages are mapped, either system-
>> wide, or for a specified process or cgroup.
>>
>> A default set of statistics is always generated for THP mappings. However, it is
>
> The way this is done is sufficiently interesting to the sysadmin to say a
> few words about it. Something along these lines, approximately:
>
> -----
> When run without options, cgroups v1 or v2 (depending on what is active
> on the system) is used in order to get a listing of all user space pids.
> That pid list is passed into the core script, as if the user had provided
> "--pids pid1 pid2 ...".
> -----

Agree with the sentiment; I'll add something similar. Although, I'm no longer
using cgroups to get all the pids - I'm grabbing them from /proc.

--8<--
When run with --pid, the user explicitly specifies the set of pids to scan,
e.g. "--pid 10 [--pid 134 ...]". When run with --cgroup, the user passes either
a v1 or v2 cgroup and all pids that belong to the cgroup subtree are scanned.
When run with neither --pid nor --cgroup, the full set of pids on the system is
gathered from /proc and scanned as if the user had provided "--pid 1 --pid 2
...".
--8<--

>
> This reminds me that maybe a --pids options is helpful, what do you think?

How about I allow --pid to be specified multiple times? That will make the
parsing easier (and be consistent with the way it works for --cont):

--pid 1 --pid 2 --pid 3 ...

>
>
>> also possible to generate additional statistics for "contiguous block mappings"
>> where the block size is user-defined.
>>
>> Statistics are maintained independently for anonymous and file-backed
>> (pagecache) memory and are shown both in kB and as a percentage of either total
>> anonymous or total file-backed memory as appropriate.
>>
>> THP Statistics
>> --------------
>>
>> Statistics are always generated for fully- and contiguously-mapped THPs whose
>> mapping address is aligned to their size, for each <size> supported by the
>> system. Separate counters describe THPs mapped by PTE vs those mapped by PMD.
>> (Although note a THP can only be mapped by PMD if it is PMD-sized):
>>
>> - anon-thp-pte-aligned-<size>kB
>> - file-thp-pte-aligned-<size>kB
>> - anon-thp-pmd-aligned-<size>kB
>> - file-thp-pmd-aligned-<size>kB
>>
>> Similarly, statistics are always generated for fully- and contiguously-mapped
>> THPs whose mapping address is *not* aligned to their size, for each <size>
>> supported by the system. Due to the unaligned mapping, it is impossible to map
>> by PMD, so there are only PTE counters for this case:
>>
>> - anon-thp-pte-unaligned-<size>kB
>> - file-thp-pte-unaligned-<size>kB
>>
>> Statistics are also always generated for mapped pages that belong to a THP but
>> where the THP is *not* fully- and contiguously-mapped.
>> These "partial" mappings are all counted in the same counter regardless of the
>> size of the THP that is partially mapped:
>>
>> - anon-thp-pte-partial
>> - file-thp-pte-partial
>>
>> Contiguous Block Statistics
>> ---------------------------
>>
>> An optional, additional set of statistics is generated for every contiguous
>> block size specified with `--cont <size>`. These statistics show how much memory
>> is mapped in contiguous blocks of <size> and also aligned to <size>. A given
>> contiguous block must all belong to the same THP, but there is no requirement
>> for it to be the *whole* THP. Separate counters describe contiguous blocks
>> mapped by PTE vs those mapped by PMD:
>>
>> - anon-cont-pte-aligned-<size>kB
>> - file-cont-pte-aligned-<size>kB
>> - anon-cont-pmd-aligned-<size>kB
>> - file-cont-pmd-aligned-<size>kB
>>
>> As an example, if montiroing 64K contiguous blocks (--cont 64K), there are a
>
> typo: "monitoring"
>
>> number of sources that could provide such blocks: a fully- and contiguously-
>> mapped 64K THP that is aligned to a 64K boundary would provide 1 block. A fully-
>> and contiguously-mapped 128K THP that is aligned to at least a 64K boundary
>> would provide 2 blocks. Or a 128K THP that maps its first 100K, but contiguously
>> and starting at a 64K boundary would provide 1 block. A fully- and contiguously-
>> mapped 2M THP would provide 32 blocks. There are many other possible
>> permutations.
>>
>> optional arguments:
>>   -h, --help           show this help message and exit
>>   --pid pid            Process id of the target process. --pid and --cgroup are
>>                        mutually exclusive. If neither are provided, all
>>                        processes are scanned to provide system-wide information.
>>   --cgroup path        Path to the target cgroup in sysfs. Iterates over every
>>                        pid in the cgroup and its children. --pid and --cgroup
>>                        are mutually exclusive. If neither are provided, all
>>                        processes are scanned to provide system-wide information.
>>   --rollup             Sum the per-vma statistics to provide a summary over the
>>                        whole system, process or cgroup.
>>   --cont size[KMG]     Adds stats for memory that is mapped in contiguous blocks
>>                        of <size> and also aligned to <size>. May be issued
>>                        multiple times to track multiple sized blocks. Useful to
>>                        infer e.g. arm64 contpte and hpa mappings. Size must be a
>>                        power-of-2 number of pages.
>>   --inc-smaps          Include all numerical, additive /proc/<pid>/smaps stats
>>                        in the output.
>>   --inc-empty          Show all statistics including those whose value is 0.
>>   --periodic sleep_ms  Run in a loop, polling every sleep_ms milliseconds.
>>
>> Requires root privilege to access pagemap and kpageflags.
>>
>> --8<--
>
> It's all looking much more understandable now, very nice.

Great - thanks for the review. I'll get this straightened out and post later
today.

>
> thanks,
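
For reference, a rough sketch of the parsing discussed above: a repeatable
--pid, --pid and --cgroup kept mutually exclusive, a repeatable --cont, and a
description that keeps its line breaks. This is not the actual thpmaps code;
the helper names (parse_size, all_pids, build_parser) and the DESCRIPTION text
are made up for illustration. It only assumes Python's standard argparse, where
RawDescriptionHelpFormatter stops the description from being re-wrapped and
action='append' collects repeated options into a list.

```python
import argparse
import os
import re

# Placeholder description; the real text would be the long-form docs.
DESCRIPTION = """\
Prints information about how transparent huge pages are mapped.

THP Statistics
--------------
"""


def parse_size(text):
    """Convert a size such as '64K' or '2M' into bytes (hypothetical helper)."""
    match = re.fullmatch(r'(\d+)([KMG]?)', text.upper())
    if not match:
        raise argparse.ArgumentTypeError(f'invalid size: {text!r}')
    shift = {'': 0, 'K': 10, 'M': 20, 'G': 30}[match.group(2)]
    return int(match.group(1)) << shift


def all_pids(proc='/proc'):
    """Return every purely numeric entry name under proc as a sorted pid list."""
    return sorted(int(name) for name in os.listdir(proc) if name.isdigit())


def build_parser():
    parser = argparse.ArgumentParser(
        prog='thpmaps',
        # Keep the description's own line breaks instead of re-wrapping it.
        formatter_class=argparse.RawDescriptionHelpFormatter,
        description=DESCRIPTION,
    )
    # --pid and --cgroup exclude each other; --pid itself may be repeated.
    target = parser.add_mutually_exclusive_group()
    target.add_argument('--pid', metavar='pid', type=int, action='append',
                        help='Process id of a target process. May be repeated.')
    target.add_argument('--cgroup', metavar='path',
                        help='Path to the target cgroup.')
    # --cont follows the same repeatable pattern, one block size per use.
    parser.add_argument('--cont', metavar='size[KMG]', type=parse_size,
                        action='append', default=[],
                        help='Contiguous block size to track. May be repeated.')
    return parser
```

With this, "--pid 1 --pid 2 --cont 64K" parses to pid == [1, 2] and
cont == [65536]. When both pid and cgroup come back as None, the caller would
fall back to all_pids(), which matches the "as if the user had provided
--pid 1 --pid 2 ..." wording above.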