On 11/01/2024 18:17, John Hubbard wrote:
> On 1/11/24 03:54, Ryan Roberts wrote:
> ...
>> I'm not sure exactly what you are asking. The "cont" counters are counting
>> blocks of contiguous, naturally aligned physical memory, which are also mapped
>> contiguously and aligned. So a smaller --cont would always include all the
>> memory captured in a larger --cont. In this case, it's all the *file-backed*
>> memory (as highlighted in the label name) so nothing to do with (m)THP. But where
>> you have THP, --cont doesn't care what the underlying THP size is as long as its
>> requirements are met, so PMD-sized THPs would be included in e.g.
>> *anon*-cont-aligned-128kB.
>>
>> Note that the "--cont" counters don't directly count memory that is PTE-mapped
>> with the contiguous bit set in the page table; they just count memory that meets
>> the alignment, size and mapping requirements. On arm64 systems with the contpte
>> series, the contiguous bit would be used here, but it's not a part of what's
>> getting measured.
>>
>
> The "cont" and "naturally aligned" terms are difficult here, even though
> I'm familiar with the implementation. But putting on my systems
> monitoring hat, these terms are not helping people as much as I'd like,
> because:
>
> a) "Contiguous" is not really a unique situation, so measuring large pages
>    that are "contiguous" is confusing. All folios are contiguous, and
>    anything a pte points to is contiguous as well. So --cont really
>    throws off the user/reader.
>
> b) "Naturally aligned" is also tricky. Because "natural" is not explained.
>    Here it means NAPOT (naturally aligned power of two, I saw that in the
>    riscv docs).
>
> After spending a day or two exploring running systems with this, I'd
> like to suggest:
>
> 1) measure "native PMD THPs" vs. pte-mapped mTHPs. This provides a lot
>    of information: mTHP is configured as expected, and is helping or not,
>    etc.

There is a difference between how a THP is mapped (PTE vs PMD) and its size.
A PMD-sized THP can still be mapped with PTEs. So I'd rather not completely
filter out PMD-sized THPs, if that's your suggestion. But we could make a
distinction between THPs mapped by PTE and those mapped by PMD; the kernel
interface doesn't directly give us this, but we can infer it from the
AnonHugePages and *PmdMapped stats in smaps.

>
> 2) Not having to list out all the mTHP sizes would be nice. Instead,
>    just use the possible sizes from /sys/kernel/mm/transparent_hugepage/* ,
>    unless the user specifies sizes.

This is exactly what the tool already does. Perhaps you haven't fully understood
the counters that it outputs? You *always* get the following counters (although
note the tool *hides* all counters whose value is 0 by default - show them with
--inc-empty). This example is for a system with 4K base pages:

# thpmaps --pid 1 --summary --inc-empty
anon-thp-aligned-16kB:
anon-thp-aligned-32kB:
anon-thp-aligned-64kB:
anon-thp-aligned-128kB:
anon-thp-aligned-256kB:
anon-thp-aligned-512kB:
anon-thp-aligned-1024kB:
anon-thp-aligned-2048kB:
anon-thp-unaligned-16kB:
anon-thp-unaligned-32kB:
anon-thp-unaligned-64kB:
anon-thp-unaligned-128kB:
anon-thp-unaligned-256kB:
anon-thp-unaligned-512kB:
anon-thp-unaligned-1024kB:
anon-thp-unaligned-2048kB:
anon-thp-partial:
file-thp-aligned-16kB:
file-thp-aligned-32kB:
file-thp-aligned-64kB:
file-thp-aligned-128kB:
file-thp-aligned-256kB:
file-thp-aligned-512kB:
file-thp-aligned-1024kB:
file-thp-aligned-2048kB:
file-thp-unaligned-16kB:
file-thp-unaligned-32kB:
file-thp-unaligned-64kB:
file-thp-unaligned-128kB:
file-thp-unaligned-256kB:
file-thp-unaligned-512kB:
file-thp-unaligned-1024kB:
file-thp-unaligned-2048kB:
file-thp-partial:

So you have counters for every supported THP size in the system - they will be
different for a 64K base page system.

anon vs file: hopefully obvious

aligned vs unaligned: In both cases the THP is mapped fully and contiguously. In
the aligned cases it is mapped so that it is naturally aligned.
So a 16K THP is mapped into VA space on a 16K boundary, a 32K THP on a 32K
boundary, etc.

partial: Parts of THPs that are partially mapped into VA space.

Note this does not draw a distinction between PMD-mapped and PTE-mapped THPs.
But a THP can only be PMD-mapped if it is both PMD-aligned and PMD-sized. So
only 2 counters can include PMD-mappings: anon-thp-aligned-2048kB and
file-thp-aligned-2048kB. We can filter those mappings out by subtracting the
relevant smaps counters from them. I could add a --ignore-pmd-mapped flag to do
that? Or I could rename all the existing counters to include "pte" and introduce
2 new counters: anon-thp-aligned-pmd-2048kB and file-thp-aligned-pmd-2048kB?

The --cont option will add *additional* special counters, if specified. The idea
here is to provide a view on what percentage of memory is getting
contpte-mapped. So if you provide "--cont 64K" it will give you a counter
showing how much memory is in 64K, naturally aligned blocks (actually 2
counters; file and anon). Those blocks can come from fully mapped and aligned
64K THPs. But they can also come from bigger THPs - for example, if a 128K THP
is aligned on a 64K boundary (but not a 128K boundary), then it will provide 2
64K cont blocks, but it will be counted as unaligned in
anon-thp-unaligned-128kB. Or if a 2M THP is partially mapped so that only its
first 1M is mapped and aligned on a 64K boundary, then it will be counted in the
*-thp-partial counter and would add 1M to the *-cont-aligned-64kB counter.

Sorry if I've labored the point here. But I think the only thing the tool
doesn't already do that you are asking for is to differentiate PTE- vs PMD-
mappings?

>
> ...
> (e.g. /sys/fs/cgroup for cgroup-v2 or
>>>> /sys/fs/cgroup/pids for cgroup-v1). Exactly one
>>>> of --pid and --cgroup must be provided.
>>>
>>> Maybe we could add "--global" to that list.
>>> That would look, in order,
>>> inside cgroups2 and cgroups, for a list of pids, and then run as if
>>> --cgroup /sys/fs/cgroup or --cgroup /sys/fs/cgroup/pids were specified.
>>
>> I think actually it might be better just to make global the default when neither
>> --pid nor --cgroup is provided? And in this case, I'll just grab all the pids
>> from /proc rather than traverse the cgroup hierarchy, that way it will work on
>> systems without cgroups. Does that work for you?
>
> Yes! That was my initial idea, in fact, and after over-thinking it for
> a while, it turned into the above. haha :)

OK great - implemented for v3.

>
>
> thanks,
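
P.S. To make the --cont accounting described above concrete, here is a minimal
Python sketch. It is for illustration only and is not the thpmaps
implementation: the function name cont_blocks and its parameters are
hypothetical, and it assumes a single range that is contiguous in both virtual
and physical address space.

```python
KIB = 1024
MIB = 1024 * KIB


def cont_blocks(va_start, pa_start, length, block):
    """Count naturally aligned `block`-sized chunks in one contiguous mapping.

    Hypothetical helper: assumes [va_start, va_start + length) maps linearly
    onto [pa_start, pa_start + length). `block` must be a power of two.
    """
    # A chunk can only be aligned in both address spaces if VA and PA share
    # the same offset within a block.
    if (va_start - pa_start) % block != 0:
        return 0
    first = -(-va_start // block) * block         # round start up to a boundary
    last = (va_start + length) // block * block   # round end down to a boundary
    return max(0, (last - first) // block)


# A 128K THP mapped on a 64K boundary that is not a 128K boundary.
# THPs are physically aligned to their own size, so pa_start is 128K-aligned.
print(cont_blocks(192 * KIB, 2 * MIB, 128 * KIB, 64 * KIB))
print(cont_blocks(192 * KIB, 2 * MIB, 128 * KIB, 128 * KIB))

# First 1M of a 2M THP, mapped on a 64K boundary.
print(cont_blocks(320 * KIB, 6 * MIB, 1 * MIB, 64 * KIB) * 64, "kB")
```

Under those assumptions, the first call finds two 64K blocks and the second
finds no 128K block, which matches the 128K THP showing up under --cont 64K
while still being counted as unaligned at its own size. The last call covers
the partially mapped 2M THP, whose mapped 1M is made up of sixteen 64K blocks.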