Re: [PATCH v2] tools/mm: Add thpmaps script to dump THP usage info

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 1/12/24 02:00, Ryan Roberts wrote:
...
After spending a day or two exploring running systems with this, I'd
like to suggest:

1) measure "native PMD THPs" vs. pte-mapped mTHPs. This provides a lot
of information: mTHP is configured as expected, and is helping or not,
etc.

There is a difference between how a THP is mapped (PTE vs PMD) and its size. A
PMD-sized THP can still be mapped with PTEs. So I'd rather not completely filter
out PMD-sized THPs, if that's your suggestion. But we could make a distinction

It's not...

between THPs mapped by PTE and those mapped by PMD; the kernel interface doesn't
directly give us this, but we can infer it from the AnonHugePages and *PmdMapped
stats in smaps.

Yes, that would be excellent!



2) Not having to list out all the mTHP sizes would be nice. Instead,
just use the possible sizes from /sys/kernel/mm/transparent_hugepage/* ,
unless the user specifies sizes.

This is exactly what the tool already does. Perhaps you haven't fully understood
the counters that it outputs?

Oh yes, we are in perfect agreement about my not understanding these
counters. :) I'd even expound upon that a bit: despite having a fairly
good working understanding the mTHP implementation in the kernel;
despite reading and re-reading  the thpmaps documentation and peeking a
number of times at the thpmaps script; and despite poring over the
thpmaps output, I am still having a rough time with these counters.
Mainly because there is a set of hidden assumptions, many of which are
revealed below.

But it's actually just a few key points that were missing from the
documentation, plus the ability to clearly see the pte-mapped parts. And
your proposed changes below look great; I've got a few more to add and
that should finish the job.


You *always* get the following counters (although note the tool *hides* all

Good. It was not clear that these counters were always active. The --cont
documentation misleads the reader a bit on that matter.

counters whose value is 0 by default - show them with --inc-empty). This example
is for a system with 4K base pages:

# thpmaps --pid 1 --summary --inc-empty

anon-thp-aligned-16kB:
anon-thp-aligned-32kB:
anon-thp-aligned-64kB:
anon-thp-aligned-128kB:
anon-thp-aligned-256kB:
anon-thp-aligned-512kB:
anon-thp-aligned-1024kB:
anon-thp-aligned-2048kB:
anon-thp-unaligned-16kB:
anon-thp-unaligned-32kB:
anon-thp-unaligned-64kB:
anon-thp-unaligned-128kB:
anon-thp-unaligned-256kB:
anon-thp-unaligned-512kB:
anon-thp-unaligned-1024kB:
anon-thp-unaligned-2048kB:
anon-thp-partial:
file-thp-aligned-16kB:
file-thp-aligned-32kB:
file-thp-aligned-64kB:
file-thp-aligned-128kB:
file-thp-aligned-256kB:
file-thp-aligned-512kB:
file-thp-aligned-1024kB:
file-thp-aligned-2048kB:
file-thp-unaligned-16kB:
file-thp-unaligned-32kB:
file-thp-unaligned-64kB:
file-thp-unaligned-128kB:
file-thp-unaligned-256kB:
file-thp-unaligned-512kB:
file-thp-unaligned-1024kB:
file-thp-unaligned-2048kB:
file-thp-partial:

So you have counters for every supported THP size in the system - they will be
different for a 64K base page system.

anon vs file: hopefully obvious

aligned vs unaligned: In both cases the THP is mapped fully and contiguously. In
the aligned cases it is mapped so that it is naturally aligned. So a 16K THP is

I think we should use "aligned" or "aligned to <size>", and stop saying
"naturally aligned", throughout. "Natural" adds no additional
information, and it makes the reader wonder if there is some other
aspect to the alignment (does natural imply PMD-mapped? etc) that they
are unaware of.


mapped into VA space on a 16K boundary, a 32K THP on a 32K boundary, etc.

partial: Parts of THPs that are partially mapped into VA space.

Note this does not draw a distinction between PMD-mapped and PTE-mapped THPs.
But a THP can only be PMD-mapped if it is both PMD-aligned and PMD-sized. So
only 2 counters can include PMD-mappings; anon-thp-aligned-2048kB and
file-thp-aligned-2048kB. We can filter that out by subtracting the relevant
smaps counters from them. I could add a --ignore-pmd-mapped flag to do that? Or

That would work but is relatively awkward, but...1

I could rename all the existing counters to include "pte" and introduce 2 new
counters: anon-thp-aligned-pmd-2048kB and file-thp-aligned-pmd-2048kB?

...this would be perfect, I think. The "pte" would help self-document, and
separately things out allows for a clearer view into the behavior.


The --cont option will add *additional* special counters, if specified. The idea
here is to provide a view on what percentage of memory is getting
contpte-mapped. So if you provide "--cont 64K" it will give you a counter
showing how much memory is in 64K, naturally aligned blocks (actually 2
counters; file and anon). Those blocks can come from fully mapped and aligned
64K THPs. But they can also come from bigger THPs - for example, if a 128K THP
is aligned on a 64K boundary (but not a 128K boundary), then it will provide 2
64K cont blocks, but it will be counted as unaligned in
anon-thp-unaligned-128kB. Or if a 2M THP is partially mapped so that only it's
first 1M is mapped and aligned on a 64K boundary, then it will be counted in the
*-thp-partial counter and would add 1M to the *-cont-aligned-64kB counter.


Interesting, and completely undocumented until now. Let's add this to the
tool's help output! In fact, all of the above.


Sorry if I've labored the point here. But I think the only thing the tool
doesn't already do that you are asking for is to differentiate PTE- vs PMD-
mappings?

That, plus explain itself, yes. :)



...
                          (e.g. /sys/fs/cgroup for cgroup-v2 or
                          /sys/fs/cgroup/pids for cgroup-v1). Exactly one
                          of --pid and --cgroup must be provided.

Maybe we could add "--global" to that list. That would look, in order,
inside cgroups2 and cgroups, for a list of pids, and then run as if
--cgroup /sys/fs/cgroup or --cgroup /sys/fs/cgroup/pids were specified.

I think actually it might be better just to make global the default when neither
--pid nor --cgroup are provided? And in this case, I'll just grab all the pids
from /proc rather than traverse the cgroup hierachy, that way it will work on
systems without cgroups. Does that work for you?

Yes! That was my initial idea, in fact, and after over-thinking it for
a while, it turned into the above. haha :)

OK great - implemented for v3.


Sweet!


thanks,
--
John Hubbard
NVIDIA





[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux