Re: [PATCH v2] tools/mm: Add thpmaps script to dump THP usage info

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 12/01/2024 19:14, John Hubbard wrote:
> On 1/12/24 02:00, Ryan Roberts wrote:
>>> ...
>>> After spending a day or two exploring running systems with this, I'd
>>> like to suggest:
>>>
>>> 1) measure "native PMD THPs" vs. pte-mapped mTHPs. This provides a lot
>>> of information: mTHP is configured as expected, and is helping or not,
>>> etc.
>>
>> There is a difference between how a THP is mapped (PTE vs PMD) and its size. A
>> PMD-sized THP can still be mapped with PTEs. So I'd rather not completely filter
>> out PMD-sized THPs, if that's your suggestion. But we could make a distinction
> 
> It's not...
> 
>> between THPs mapped by PTE and those mapped by PMD; the kernel interface doesn't
>> directly give us this, but we can infer it from the AnonHugePages and *PmdMapped
>> stats in smaps.
> 
> Yes, that would be excellent!
> 
>>
>>>
>>> 2) Not having to list out all the mTHP sizes would be nice. Instead,
>>> just use the possible sizes from /sys/kernel/mm/transparent_hugepage/* ,
>>> unless the user specifies sizes.
>>
>> This is exactly what the tool already does. Perhaps you haven't fully understood
>> the counters that it outputs?
> 
> Oh yes, we are in perfect agreement about my not understanding these
> counters. :) I'd even expound upon that a bit: despite having a fairly
> good working understanding the mTHP implementation in the kernel;
> despite reading and re-reading  the thpmaps documentation and peeking a
> number of times at the thpmaps script; and despite poring over the
> thpmaps output, I am still having a rough time with these counters.
> Mainly because there is a set of hidden assumptions, many of which are
> revealed below.

Oh dear sorry about that. Thanks for sticking with it and helping me get it right...

> 
> But it's actually just a few key points that were missing from the
> documentation, plus the ability to clearly see the pte-mapped parts. And
> your proposed changes below look great; I've got a few more to add and
> that should finish the job.

OK good!

> 
>>
>> You *always* get the following counters (although note the tool *hides* all
> 
> Good. It was not clear that these counters were always active. The --cont
> documentation misleads the reader a bit on that matter.
> 
>> counters whose value is 0 by default - show them with --inc-empty). This example
>> is for a system with 4K base pages:
>>
>> # thpmaps --pid 1 --summary --inc-empty
>>
>> anon-thp-aligned-16kB:
>> anon-thp-aligned-32kB:
>> anon-thp-aligned-64kB:
>> anon-thp-aligned-128kB:
>> anon-thp-aligned-256kB:
>> anon-thp-aligned-512kB:
>> anon-thp-aligned-1024kB:
>> anon-thp-aligned-2048kB:
>> anon-thp-unaligned-16kB:
>> anon-thp-unaligned-32kB:
>> anon-thp-unaligned-64kB:
>> anon-thp-unaligned-128kB:
>> anon-thp-unaligned-256kB:
>> anon-thp-unaligned-512kB:
>> anon-thp-unaligned-1024kB:
>> anon-thp-unaligned-2048kB:
>> anon-thp-partial:
>> file-thp-aligned-16kB:
>> file-thp-aligned-32kB:
>> file-thp-aligned-64kB:
>> file-thp-aligned-128kB:
>> file-thp-aligned-256kB:
>> file-thp-aligned-512kB:
>> file-thp-aligned-1024kB:
>> file-thp-aligned-2048kB:
>> file-thp-unaligned-16kB:
>> file-thp-unaligned-32kB:
>> file-thp-unaligned-64kB:
>> file-thp-unaligned-128kB:
>> file-thp-unaligned-256kB:
>> file-thp-unaligned-512kB:
>> file-thp-unaligned-1024kB:
>> file-thp-unaligned-2048kB:
>> file-thp-partial:
>>
>> So you have counters for every supported THP size in the system - they will be
>> different for a 64K base page system.
>>
>> anon vs file: hopefully obvious
>>
>> aligned vs unaligned: In both cases the THP is mapped fully and contiguously. In
>> the aligned cases it is mapped so that it is naturally aligned. So a 16K THP is
> 
> I think we should use "aligned" or "aligned to <size>", and stop saying
> "naturally aligned", throughout. "Natural" adds no additional
> information, and it makes the reader wonder if there is some other
> aspect to the alignment (does natural imply PMD-mapped? etc) that they
> are unaware of.

OK. I thought "naturally aligned" was a fairly standard and well-understood
term. Google says "We call a datum naturally aligned if its address is aligned
to its size". But I'm happy to use the phrase "aligned to <size>" if that's clearer.

> 
> 
>> mapped into VA space on a 16K boundary, a 32K THP on a 32K boundary, etc.
>>
>> partial: Parts of THPs that are partially mapped into VA space.
>>
>> Note this does not draw a distinction between PMD-mapped and PTE-mapped THPs.
>> But a THP can only be PMD-mapped if it is both PMD-aligned and PMD-sized. So
>> only 2 counters can include PMD-mappings; anon-thp-aligned-2048kB and
>> file-thp-aligned-2048kB. We can filter that out by subtracting the relevant
>> smaps counters from them. I could add a --ignore-pmd-mapped flag to do that? Or
> 
> That would work but is relatively awkward, but...1
> 
>> I could rename all the existing counters to include "pte" and introduce 2 new
>> counters: anon-thp-aligned-pmd-2048kB and file-thp-aligned-pmd-2048kB?
> 
> ...this would be perfect, I think. The "pte" would help self-document, and
> separately things out allows for a clearer view into the behavior.
> 
>>
>> The --cont option will add *additional* special counters, if specified. The idea
>> here is to provide a view on what percentage of memory is getting
>> contpte-mapped. So if you provide "--cont 64K" it will give you a counter
>> showing how much memory is in 64K, naturally aligned blocks (actually 2
>> counters; file and anon). Those blocks can come from fully mapped and aligned
>> 64K THPs. But they can also come from bigger THPs - for example, if a 128K THP
>> is aligned on a 64K boundary (but not a 128K boundary), then it will provide 2
>> 64K cont blocks, but it will be counted as unaligned in
>> anon-thp-unaligned-128kB. Or if a 2M THP is partially mapped so that only it's
>> first 1M is mapped and aligned on a 64K boundary, then it will be counted in the
>> *-thp-partial counter and would add 1M to the *-cont-aligned-64kB counter.
>>
> 
> Interesting, and completely undocumented until now. Let's add this to the
> tool's help output! In fact, all of the above.

Well it already has this, which I intended to convey the same info:

  --cont size[KMG]     Adds anon and file stats for naturally aligned,
                       contiguously mapped blocks of the specified size. May be
                       issued multiple times to track multiple sized blocks.
                       Useful to infer e.g. arm64 contpte and hpa mappings. Size
                       must be a power-of-2 number of pages.

But yes, let me work up some improved documentation and send it out for your
review. The reason its a bit terse at the moment, is that I'm using Python's
ArgumentParser for the documentation, and it removes all line breaks from the
description which makes it hard to format longer form docs. Anyway, that's a bad
excuse for bad docs so I'll figure out a solution.


> 
>>
>> Sorry if I've labored the point here. But I think the only thing the tool
>> doesn't already do that you are asking for is to differentiate PTE- vs PMD-
>> mappings?
> 
> That, plus explain itself, yes. :)

Excellent! I'll post a follow up shortly.

> 
>>
>>>
>>> ...
>>>                           (e.g. /sys/fs/cgroup for cgroup-v2 or
>>>>>>                           /sys/fs/cgroup/pids for cgroup-v1). Exactly one
>>>>>>                           of --pid and --cgroup must be provided.
>>>>>
>>>>> Maybe we could add "--global" to that list. That would look, in order,
>>>>> inside cgroups2 and cgroups, for a list of pids, and then run as if
>>>>> --cgroup /sys/fs/cgroup or --cgroup /sys/fs/cgroup/pids were specified.
>>>>
>>>> I think actually it might be better just to make global the default when
>>>> neither
>>>> --pid nor --cgroup are provided? And in this case, I'll just grab all the pids
>>>> from /proc rather than traverse the cgroup hierachy, that way it will work on
>>>> systems without cgroups. Does that work for you?
>>>
>>> Yes! That was my initial idea, in fact, and after over-thinking it for
>>> a while, it turned into the above. haha :)
>>
>> OK great - implemented for v3.
>>
> 
> Sweet!
> 
> 
> thanks,





[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux