Re: [PATCH v2] tools/mm: Add thpmaps script to dump THP usage info

On 10/01/2024 23:21, John Hubbard wrote:
> On 1/10/24 09:32, Ryan Roberts wrote:
> ...
>> options:
>>    -h, --help           show this help message and exit
>>    --pid pid            Process id of the target process. Exactly one
>>                         of --pid and --cgroup must be provided.
>>    --cgroup path        Path to the target cgroup in sysfs. Iterates
>>                         over every pid in the cgroup and its children.
>>                         Get global stats by passing in the root cgroup
> 
> Hi Ryan,
> 
> Yes, this version is fairly effective at getting global stats now.
> 
> I've got some proposed minor tweaks below, and a few questions. Let me
> start with the questions:
> 
> 1) When I run this on an older 6.4.8-based kernel:
> 
>     # ./thpmaps --cgroup /sys/fs/cgroup  --cont 128K --cont 512K --cont 1M \
>             --cont 2M --cont 512M --summary
> 
> , I get this output:
> 
> file-thp-aligned-524288kB:      36175872 kB (95%)
> file-thp-partial:                 856640 kB ( 2%)
> file-cont-aligned-128kB:        37032320 kB (97%)
> file-cont-aligned-512kB:        36597760 kB (96%)
> file-cont-aligned-1024kB:       36597760 kB (96%)
> file-cont-aligned-2048kB:       36595712 kB (96%)
> file-cont-aligned-524288kB:     36175872 kB (95%)
> 
> 
> Is it true that the above is basically "normal" 512MB THP in action?

No: the "file" part of the counter name means it is file-backed (not anon) memory.
So this is not mTHP, which would always be anon (e.g. "anon-cont-aligned-128kB").
Based on your follow-up mail, I would guess this is mostly hugetlb memory rather
than actual page cache memory, but both are currently lumped into the "file" labels.

> And all of the "cont" entries are just that way because we can't
> really tell mTHP/cont apart from normal THP?

I'm not sure exactly what you are asking. The "cont" counters count blocks of
contiguous, naturally aligned physical memory that are also mapped contiguously
and aligned. So a smaller --cont would always include all the memory captured in
a larger --cont. In this case, it's all *file-backed* memory (as highlighted in
the label name), so it has nothing to do with (m)THP. But where you do have THP,
--cont doesn't care what the underlying THP size is as long as its requirements
are met, so PMD-sized THPs would be included in e.g.
*anon*-cont-aligned-128kB.
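
To make the counting rule concrete, here is a minimal Python sketch for a single
VMA. It is not the thpmaps implementation: `cont_aligned_pages()` is a
hypothetical helper, and it assumes the virtual-to-physical page frame mapping
has already been collected (e.g. from /proc/<pid>/pagemap) into a dict. Block
sizes are in pages, so with 4K pages --cont 128K would correspond to
block_pages=32.

```python
from __future__ import annotations


def cont_aligned_pages(mapping: dict[int, int], block_pages: int) -> int:
    """Hypothetical helper: count pages in naturally aligned blocks of
    block_pages pages that are fully mapped, virtually and physically
    contiguous, and aligned on both sides. block_pages must be a power of two."""
    if block_pages <= 0 or block_pages & (block_pages - 1):
        raise ValueError("block_pages must be a power of two")
    total = 0
    # Every candidate block starts at a block-aligned virtual page number.
    starts = {vpfn & ~(block_pages - 1) for vpfn in mapping}
    for vstart in sorted(starts):
        pstart = mapping.get(vstart)
        if pstart is None or pstart % block_pages:
            continue  # first page unmapped, or physical start misaligned
        # Each page in the block must map to the matching physical frame.
        if all(mapping.get(vstart + i) == pstart + i for i in range(block_pages)):
            total += block_pages
    return total
```

With m = {0: 8, 1: 9, 2: 10, 3: 11, 4: 100}, both block_pages=4 and
block_pages=2 count 4 pages and block_pages=1 counts all 5, which illustrates
why a smaller --cont never reports less than a larger one: any aligned block
that qualifies is made of aligned sub-blocks that also qualify.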

Note that the "--cont" counters don't directly count memory that is PTE-mapped
with the contiguous bit set in the page table; they just count memory that meets
the alignment, size and mapping requirements. On arm64 systems with the contpte
series, the contiguous bit would be used here, but it's not part of what's
being measured.

> 
> 2) On an mTHP kernel with the latest patchsets (arm64, 64K page size), I
> *think* I cannot turn off mTHP. I'm still teasing apart how much of this
> is an instrumentation error, and how much is a measurement problem (with
> the test suite). And maybe I'm wrong entirely. But the "never" option
> doesn't seem to have an effect. Unless the latest version of the testsuite
> is doing something new, sigh.
> 
> $ for f in $(find /sys/kernel/mm/transparent_hugepage/ -name enabled); do echo "$f: $(cat $f)"; done
> /sys/kernel/mm/transparent_hugepage/hugepages-512kB/enabled: always inherit madvise [never]
> /sys/kernel/mm/transparent_hugepage/enabled: always madvise [never]
> /sys/kernel/mm/transparent_hugepage/hugepages-262144kB/enabled: always inherit madvise [never]
> /sys/kernel/mm/transparent_hugepage/hugepages-2048kB/enabled: always inherit madvise [never]
> /sys/kernel/mm/transparent_hugepage/hugepages-32768kB/enabled: always inherit madvise [never]
> /sys/kernel/mm/transparent_hugepage/hugepages-1024kB/enabled: always inherit madvise [never]
> /sys/kernel/mm/transparent_hugepage/hugepages-16384kB/enabled: always inherit madvise [never]
> /sys/kernel/mm/transparent_hugepage/hugepages-524288kB/enabled: always inherit madvise [never]
> /sys/kernel/mm/transparent_hugepage/hugepages-8192kB/enabled: always inherit madvise [never]
> /sys/kernel/mm/transparent_hugepage/hugepages-256kB/enabled: always inherit madvise [never]
> /sys/kernel/mm/transparent_hugepage/hugepages-65536kB/enabled: always inherit madvise [never]
> /sys/kernel/mm/transparent_hugepage/hugepages-131072kB/enabled: always inherit madvise [never]
> /sys/kernel/mm/transparent_hugepage/hugepages-4096kB/enabled: always inherit madvise [never]
> 
> Any quick thoughts? Don't waste any time on this, it's probably
> operator error. Just in case, though.

As per your follow-up email, you're looking at hugetlb memory, which is currently
reported under the "file" counter labels.

I have all the information to create a hugetlb-specific set of counters, so it's
not lumped in with page cache memory. You would then have counter sets of
"anon", "file" and "htlb". Would that be useful?
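
A rough sketch of how such a split could classify each page, assuming the
decision is driven by the per-PFN flags word from /proc/kpageflags. The bit
numbers are KPF_ANON and KPF_HUGE from include/uapi/linux/kernel-page-flags.h;
`read_kpageflags()`, `counter_set()` and `counter_label()` are hypothetical
names, and testing the hugetlb bit first is a design assumption so that
anonymous hugetlb mappings land in "htlb" rather than "anon".

```python
from __future__ import annotations

import struct

# Bit numbers from include/uapi/linux/kernel-page-flags.h.
KPF_ANON = 12
KPF_HUGE = 17  # set for hugetlb pages (THP uses KPF_THP instead)


def read_kpageflags(pfn: int, path: str = "/proc/kpageflags") -> int:
    """Read the native-endian 64-bit flags word for one PFN (needs root)."""
    with open(path, "rb") as f:
        f.seek(pfn * 8)
        (flags,) = struct.unpack("=Q", f.read(8))
    return flags


def counter_set(flags: int) -> str:
    """Hypothetical: pick the "htlb", "anon" or "file" counter set."""
    if flags & (1 << KPF_HUGE):
        return "htlb"  # checked first so anonymous hugetlb isn't counted as anon
    if flags & (1 << KPF_ANON):
        return "anon"
    return "file"


def counter_label(kind: str, size_kb: int) -> str:
    """Build a label in the style of e.g. "anon-cont-aligned-128kB"."""
    return f"{kind}-cont-aligned-{size_kb}kB"
```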

> 
> 
>>                         (e.g. /sys/fs/cgroup for cgroup-v2 or
>>                         /sys/fs/cgroup/pids for cgroup-v1). Exactly one
>>                         of --pid and --cgroup must be provided.
> 
> Maybe we could add "--global" to that list. That would look, in order,
> inside cgroups2 and cgroups, for a list of pids, and then run as if
> --cgroup /sys/fs/cgroup or --cgroup /sys/fs/cgroup/pids were specified.

I think it might actually be better just to make global the default when neither
--pid nor --cgroup is provided. In that case, I'll just grab all the pids from
/proc rather than traverse the cgroup hierarchy; that way it will work on
systems without cgroups. Does that work for you?
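
A minimal sketch of the proposed default, assuming argparse-style options like
those in the help text above: --pid and --cgroup become optional but mutually
exclusive, and with neither given the pids are taken from the numeric entries
in /proc. For --cgroup it assumes each cgroup directory exposes a
cgroup.procs file. The function names are hypothetical, not the script's
actual code.

```python
from __future__ import annotations

import argparse
import os


def all_pids(proc_root: str = "/proc") -> list[int]:
    """Every numeric entry under proc_root is a pid (skips 'self', 'sys', ...)."""
    return sorted(int(name) for name in os.listdir(proc_root) if name.isdecimal())


def cgroup_pids(path: str) -> list[int]:
    """Collect pids from cgroup.procs in path and all of its descendants."""
    pids: set[int] = set()
    for root, _dirs, files in os.walk(path):
        if "cgroup.procs" in files:
            with open(os.path.join(root, "cgroup.procs")) as f:
                pids.update(int(line) for line in f if line.strip())
    return sorted(pids)


def parse_args(argv: list[str] | None = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(prog="thpmaps")
    # Optional and mutually exclusive: omitting both means "global".
    group = parser.add_mutually_exclusive_group()
    group.add_argument("--pid", type=int, metavar="pid")
    group.add_argument("--cgroup", metavar="path")
    return parser.parse_args(argv)


def target_pids(args: argparse.Namespace, proc_root: str = "/proc") -> list[int]:
    if args.pid is not None:
        return [args.pid]
    if args.cgroup is not None:
        return cgroup_pids(args.cgroup)
    return all_pids(proc_root)  # global default; works without cgroups
```

Processes can exit between listing and scanning, so a real scan would need to
tolerate pids that have disappeared by the time their maps are read.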

> 
> It's nicer than failing out. And it's also directly useful. I would be
> running my above command like this, instead:
> 
> # ./thpmaps --global  --cont 128K --cont 512K --cont 1M \
>             --cont 2M --cont 512M --summary
> 
> thanks,




