On 11/01/2024 11:54, Ryan Roberts wrote:
> On 10/01/2024 23:21, John Hubbard wrote:
>> On 1/10/24 09:32, Ryan Roberts wrote:
>> ...
>>> options:
>>>   -h, --help            show this help message and exit
>>>   --pid pid             Process id of the target process. Exactly one
>>>                         of --pid and --cgroup must be provided.
>>>   --cgroup path         Path to the target cgroup in sysfs. Iterates
>>>                         over every pid in the cgroup and its children.
>>>                         Get global stats by passing in the root cgroup
>>
>> Hi Ryan,
>>
>> Yes, this version is fairly effective at getting global stats now.
>>
>> I've got some proposed minor tweaks below, and a few questions. Let me
>> start with the questions:
>>
>> 1) When I run this on an older 6.4.8-based kernel:
>>
>> # ./thpmaps --cgroup /sys/fs/cgroup --cont 128K --cont 512K --cont 1M \
>>     --cont 2M --cont 512M --summary
>>
>> , I get this output:
>>
>> file-thp-aligned-524288kB:    36175872 kB (95%)
>> file-thp-partial:               856640 kB ( 2%)
>> file-cont-aligned-128kB:      37032320 kB (97%)
>> file-cont-aligned-512kB:      36597760 kB (96%)
>> file-cont-aligned-1024kB:     36597760 kB (96%)
>> file-cont-aligned-2048kB:     36595712 kB (96%)
>> file-cont-aligned-524288kB:   36175872 kB (95%)
>>
>>
>> Is it true that the above is basically "normal" 512MB THP in action?
>
> No: the "file" part of the counter name means it is file (not anon). So this is
> not mTHP, which would always be anon (e.g. "anon-cont-aligned-128kB"). Based on
> your follow-up mail, I would guess this is mostly hugetlb memory rather than
> actual page cache memory, but they are both getting lumped into those "file"
> labels.
>
>> And all of the "cont" entries are just that way because we can't
>> really tell mTHP/cont apart from normal THP?
>
> I'm not sure exactly what you are asking. The "cont" counters are counting
> blocks of contiguous, naturally aligned physical memory, which are also mapped
> contiguously and aligned. So a smaller --cont would always include all the
> memory captured in a larger --cont.
> In this case, it's all the *file-backed* memory (as highlighted in the label
> name) so nothing to do with (m)THP. But where you have THP, --cont doesn't care
> what the underlying THP size is as long as its requirements are met, so
> PMD-sized THPs would be included in e.g. *anon*-cont-aligned-128kB.
>
> Note that the "--cont" counters don't directly count memory that is PTE-mapped
> with the contiguous bit set in the page table; they just count memory that meets
> the alignment, size and mapping requirements. On arm64 systems with the contpte
> series, the contiguous bit would be used here, but it's not part of what's
> getting measured.
>
>>
>> 2) On an mTHP kernel with the latest patchsets (arm64, 64K page size), I
>> *think* I cannot turn off mTHP. I'm still teasing apart how much of this
>> is an instrumentation error, and how much is a measurement problem (with
>> the test suite). And maybe I'm wrong entirely. But the "never" option
>> doesn't seem to have an effect. Unless the latest version of the testsuite
>> is doing something new, sigh.
>>
>> $ for f in $(find /sys/kernel/mm/transparent_hugepage/ -name enabled); do
>>       echo "$f: $(cat $f)"; done
>> /sys/kernel/mm/transparent_hugepage/hugepages-512kB/enabled: always inherit madvise [never]
>> /sys/kernel/mm/transparent_hugepage/enabled: always madvise [never]
>> /sys/kernel/mm/transparent_hugepage/hugepages-262144kB/enabled: always inherit madvise [never]
>> /sys/kernel/mm/transparent_hugepage/hugepages-2048kB/enabled: always inherit madvise [never]
>> /sys/kernel/mm/transparent_hugepage/hugepages-32768kB/enabled: always inherit madvise [never]
>> /sys/kernel/mm/transparent_hugepage/hugepages-1024kB/enabled: always inherit madvise [never]
>> /sys/kernel/mm/transparent_hugepage/hugepages-16384kB/enabled: always inherit madvise [never]
>> /sys/kernel/mm/transparent_hugepage/hugepages-524288kB/enabled: always inherit madvise [never]
>> /sys/kernel/mm/transparent_hugepage/hugepages-8192kB/enabled: always inherit madvise [never]
>> /sys/kernel/mm/transparent_hugepage/hugepages-256kB/enabled: always inherit madvise [never]
>> /sys/kernel/mm/transparent_hugepage/hugepages-65536kB/enabled: always inherit madvise [never]
>> /sys/kernel/mm/transparent_hugepage/hugepages-131072kB/enabled: always inherit madvise [never]
>> /sys/kernel/mm/transparent_hugepage/hugepages-4096kB/enabled: always inherit madvise [never]
>>
>> Any quick thoughts? Don't waste any time on this, it's probably
>> operator error. Just in case, though.
>
> As per your email, you're looking at hugetlb memory (as per counter label).
>
> I have all the information to create a hugetlb-specific set of counters, so it's
> not lumped in with page cache memory. You would then have counter sets of
> "anon", "file" and "htlb". Would that be useful? Or I could just filter out
> hugetlb memory so it doesn't appear in this tool at all?
> That would be easier implementation-wise, and probably more in line with the
> original intention of the tool (it's called thpmaps, after all).
>
>>
>>
>>>                         (e.g. /sys/fs/cgroup for cgroup-v2 or
>>>                         /sys/fs/cgroup/pids for cgroup-v1). Exactly one
>>>                         of --pid and --cgroup must be provided.
>>
>> Maybe we could add "--global" to that list. That would look, in order,
>> inside cgroups2 and cgroups, for a list of pids, and then run as if
>> --cgroup /sys/fs/cgroup or --cgroup /sys/fs/cgroup/pids were specified.
>
> I think actually it might be better just to make global the default when neither
> --pid nor --cgroup is provided? And in this case, I'll just grab all the pids
> from /proc rather than traverse the cgroup hierarchy; that way it will work on
> systems without cgroups. Does that work for you?
>
>>
>> It's nicer than failing out. And it's also directly useful. I would be
>> running my above command like this, instead:
>>
>> # ./thpmaps --global --cont 128K --cont 512K --cont 1M \
>>     --cont 2M --cont 512M --summary
>>
>> thanks,
>
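
For reference, the "grab all the pids from /proc" default discussed above could look roughly like the sketch below. This is only an illustration, not the code in thpmaps: the function name list_pids and its proc_root parameter are hypothetical, and it assumes the usual procfs layout in which each process has a directory named after its decimal pid.

```python
import os


def list_pids(proc_root="/proc"):
    """Return the sorted pids that have a numeric directory under proc_root.

    Hypothetical helper for illustration; not taken from thpmaps.
    """
    pids = []
    for name in os.listdir(proc_root):
        # Per-process procfs entries are directories named by decimal pid;
        # everything else (self, sys, meminfo, ...) is skipped.
        if not (name.isascii() and name.isdigit()):
            continue
        if os.path.isdir(os.path.join(proc_root, name)):
            pids.append(int(name))
    return sorted(pids)


if __name__ == "__main__":
    print(f"{len(list_pids())} processes visible in /proc")
```

One thing any such loop would probably need to tolerate: a process can exit between being listed and having its files read, so a missing /proc/<pid> entry later on should likely be skipped rather than treated as fatal.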