Re: [PATCH v2] tools/mm: Add thpmaps script to dump THP usage info

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 15/01/2024 09:48, Ryan Roberts wrote:
> On 12/01/2024 19:14, John Hubbard wrote:
>> On 1/12/24 02:00, Ryan Roberts wrote:
>>>> ...
>>>> After spending a day or two exploring running systems with this, I'd
>>>> like to suggest:
>>>>
>>>> 1) measure "native PMD THPs" vs. pte-mapped mTHPs. This provides a lot
>>>> of information: mTHP is configured as expected, and is helping or not,
>>>> etc.
>>>
>>> There is a difference between how a THP is mapped (PTE vs PMD) and its size. A
>>> PMD-sized THP can still be mapped with PTEs. So I'd rather not completely filter
>>> out PMD-sized THPs, if that's your suggestion. But we could make a distinction
>>
>> It's not...
>>
>>> between THPs mapped by PTE and those mapped by PMD; the kernel interface doesn't
>>> directly give us this, but we can infer it from the AnonHugePages and *PmdMapped
>>> stats in smaps.
>>
>> Yes, that would be excellent!
>>
>>>
>>>>
>>>> 2) Not having to list out all the mTHP sizes would be nice. Instead,
>>>> just use the possible sizes from /sys/kernel/mm/transparent_hugepage/* ,
>>>> unless the user specifies sizes.
>>>
>>> This is exactly what the tool already does. Perhaps you haven't fully understood
>>> the counters that it outputs?
>>
>> Oh yes, we are in perfect agreement about my not understanding these
>> counters. :) I'd even expound upon that a bit: despite having a fairly
>> good working understanding the mTHP implementation in the kernel;
>> despite reading and re-reading  the thpmaps documentation and peeking a
>> number of times at the thpmaps script; and despite poring over the
>> thpmaps output, I am still having a rough time with these counters.
>> Mainly because there is a set of hidden assumptions, many of which are
>> revealed below.
> 
> Oh dear sorry about that. Thanks for sticking with it and helping me get it right...
> 
>>
>> But it's actually just a few key points that were missing from the
>> documentation, plus the ability to clearly see the pte-mapped parts. And
>> your proposed changes below look great; I've got a few more to add and
>> that should finish the job.
> 
> OK good!
> 
>>
>>>
>>> You *always* get the following counters (although note the tool *hides* all
>>
>> Good. It was not clear that these counters were always active. The --cont
>> documentation misleads the reader a bit on that matter.
>>
>>> counters whose value is 0 by default - show them with --inc-empty). This example
>>> is for a system with 4K base pages:
>>>
>>> # thpmaps --pid 1 --summary --inc-empty
>>>
>>> anon-thp-aligned-16kB:
>>> anon-thp-aligned-32kB:
>>> anon-thp-aligned-64kB:
>>> anon-thp-aligned-128kB:
>>> anon-thp-aligned-256kB:
>>> anon-thp-aligned-512kB:
>>> anon-thp-aligned-1024kB:
>>> anon-thp-aligned-2048kB:
>>> anon-thp-unaligned-16kB:
>>> anon-thp-unaligned-32kB:
>>> anon-thp-unaligned-64kB:
>>> anon-thp-unaligned-128kB:
>>> anon-thp-unaligned-256kB:
>>> anon-thp-unaligned-512kB:
>>> anon-thp-unaligned-1024kB:
>>> anon-thp-unaligned-2048kB:
>>> anon-thp-partial:
>>> file-thp-aligned-16kB:
>>> file-thp-aligned-32kB:
>>> file-thp-aligned-64kB:
>>> file-thp-aligned-128kB:
>>> file-thp-aligned-256kB:
>>> file-thp-aligned-512kB:
>>> file-thp-aligned-1024kB:
>>> file-thp-aligned-2048kB:
>>> file-thp-unaligned-16kB:
>>> file-thp-unaligned-32kB:
>>> file-thp-unaligned-64kB:
>>> file-thp-unaligned-128kB:
>>> file-thp-unaligned-256kB:
>>> file-thp-unaligned-512kB:
>>> file-thp-unaligned-1024kB:
>>> file-thp-unaligned-2048kB:
>>> file-thp-partial:
>>>
>>> So you have counters for every supported THP size in the system - they will be
>>> different for a 64K base page system.
>>>
>>> anon vs file: hopefully obvious
>>>
>>> aligned vs unaligned: In both cases the THP is mapped fully and contiguously. In
>>> the aligned cases it is mapped so that it is naturally aligned. So a 16K THP is
>>
>> I think we should use "aligned" or "aligned to <size>", and stop saying
>> "naturally aligned", throughout. "Natural" adds no additional
>> information, and it makes the reader wonder if there is some other
>> aspect to the alignment (does natural imply PMD-mapped? etc) that they
>> are unaware of.
> 
> OK. I thought "naturally aligned" was a fairly standard and well-understood
> term. Google says "We call a datum naturally aligned if its address is aligned
> to its size". But I'm happy to use the phrase "aligned to <size>" if that's clearer.
> 
>>
>>
>>> mapped into VA space on a 16K boundary, a 32K THP on a 32K boundary, etc.
>>>
>>> partial: Parts of THPs that are partially mapped into VA space.
>>>
>>> Note this does not draw a distinction between PMD-mapped and PTE-mapped THPs.
>>> But a THP can only be PMD-mapped if it is both PMD-aligned and PMD-sized. So
>>> only 2 counters can include PMD-mappings; anon-thp-aligned-2048kB and
>>> file-thp-aligned-2048kB. We can filter that out by subtracting the relevant
>>> smaps counters from them. I could add a --ignore-pmd-mapped flag to do that? Or
>>
>> That would work but is relatively awkward, but...1
>>
>>> I could rename all the existing counters to include "pte" and introduce 2 new
>>> counters: anon-thp-aligned-pmd-2048kB and file-thp-aligned-pmd-2048kB?
>>
>> ...this would be perfect, I think. The "pte" would help self-document, and
>> separately things out allows for a clearer view into the behavior.
>>
>>>
>>> The --cont option will add *additional* special counters, if specified. The idea
>>> here is to provide a view on what percentage of memory is getting
>>> contpte-mapped. So if you provide "--cont 64K" it will give you a counter
>>> showing how much memory is in 64K, naturally aligned blocks (actually 2
>>> counters; file and anon). Those blocks can come from fully mapped and aligned
>>> 64K THPs. But they can also come from bigger THPs - for example, if a 128K THP
>>> is aligned on a 64K boundary (but not a 128K boundary), then it will provide 2
>>> 64K cont blocks, but it will be counted as unaligned in
>>> anon-thp-unaligned-128kB. Or if a 2M THP is partially mapped so that only it's
>>> first 1M is mapped and aligned on a 64K boundary, then it will be counted in the
>>> *-thp-partial counter and would add 1M to the *-cont-aligned-64kB counter.
>>>
>>
>> Interesting, and completely undocumented until now. Let's add this to the
>> tool's help output! In fact, all of the above.
> 
> Well it already has this, which I intended to convey the same info:
> 
>   --cont size[KMG]     Adds anon and file stats for naturally aligned,
>                        contiguously mapped blocks of the specified size. May be
>                        issued multiple times to track multiple sized blocks.
>                        Useful to infer e.g. arm64 contpte and hpa mappings. Size
>                        must be a power-of-2 number of pages.
> 
> But yes, let me work up some improved documentation and send it out for your
> review. The reason its a bit terse at the moment, is that I'm using Python's
> ArgumentParser for the documentation, and it removes all line breaks from the
> description which makes it hard to format longer form docs. Anyway, that's a bad
> excuse for bad docs so I'll figure out a solution.

Here is my proposed documentation. If you could take a look and let me know if
it makes sense, then I'll modify the tool to conform:

--8<--

$ ./thpmaps --help

usage: thpmaps [-h] [--pid pid | --cgroup path] [--rollup] [--cont size[KMG]]
               [--inc-smaps] [--inc-empty] [--periodic sleep_ms]

Prints information about how transparent huge pages are mapped, either system-
wide, or for a specified process or cgroup.

A default set of statistics is always generated for THP mappings. However, it is
also possible to generate additional statistics for "contiguous block mappings"
where the block size is user-defined.

Statistics are maintained independently for anonymous and file-backed
(pagecache) memory and are shown both in kB and as a percentage of either total
anonymous or total file-backed memory as appropriate.

THP Statistics
--------------

Statistics are always generated for fully- and contiguously-mapped THPs whose
mapping address is aligned to their size, for each <size> supported by the
system. Separate counters describe THPs mapped by PTE vs those mapped by PMD.
(Although note a THP can only be mapped by PMD if it is PMD-sized):

- anon-thp-pte-aligned-<size>kB
- file-thp-pte-aligned-<size>kB
- anon-thp-pmd-aligned-<size>kB
- file-thp-pmd-aligned-<size>kB

Similarly, statistics are always generated for fully- and contiguously-mapped
THPs whose mapping address is *not* aligned to their size, for each <size>
supported by the system. Due to the unaligned mapping, it is impossible to map
by PMD, so there are only PTE counters for this case:

- anon-thp-pte-unaligned-<size>kB
- file-thp-pte-unaligned-<size>kB

Statistics are also always generated for mapped pages that belong to a THP but
where the is THP is *not* fully- and contiguously- mapped. These "partial"
mappings are all counted in the same counter regardless of the size of the THP
that is partially mapped:

- anon-thp-pte-partial
- file-thp-pte-partial

Contiguous Block Statistics
---------------------------

An optional, additional set of statistics is generated for every contiguous
block size specified with `--cont <size>`. These statistics show how much memory
is mapped in contiguous blocks of <size> and also aligned to <size>. A given
contiguous block must all belong to the same THP, but there is no requirement
for it to be the *whole* THP. Separate counters describe contiguous blocks
mapped by PTE vs those mapped by PMD:

- anon-cont-pte-aligned-<size>kB
- file-cont-pte-aligned-<size>kB
- anon-cont-pmd-aligned-<size>kB
- file-cont-pmd-aligned-<size>kB

As an example, if montiroing 64K contiguous blocks (--cont 64K), there are a
number of sources that could provide such blocks: a fully- and contiguously-
mapped 64K THP that is aligned to a 64K boundary would provide 1 block. A fully-
and contiguously-mapped 128K THP that is aligned to at least a 64K boundary
would provide 2 blocks. Or a 128K THP that maps its first 100K, but contiguously
and starting at a 64K boundary would provide 1 block. A fully- and contiguously-
mapped 2M THP would provide 32 blocks. There are many other possible
permutations.

optional arguments:
  -h, --help           show this help message and exit
  --pid pid            Process id of the target process. --pid and --cgroup are
                       mutually exclusive. If neither are provided, all
                       processes are scanned to provide system-wide information.
  --cgroup path        Path to the target cgroup in sysfs. Iterates over every
                       pid in the cgroup and its children. --pid and --cgroup
                       are mutually exclusive. If neither are provided, all
                       processes are scanned to provide system-wide information.
  --rollup             Sum the per-vma statistics to provide a summary over the
                       whole system, process or cgroup.
  --cont size[KMG]     Adds stats for memory that is mapped in contiguous blocks
                       of <size> and also aligned to <size>. May be issued
                       multiple times to track multiple sized blocks. Useful to
                       infer e.g. arm64 contpte and hpa mappings. Size must be a
                       power-of-2 number of pages.
  --inc-smaps          Include all numerical, additive /proc/<pid>/smaps stats
                       in the output.
  --inc-empty          Show all statistics including those whose value is 0.
  --periodic sleep_ms  Run in a loop, polling every sleep_ms milliseconds.

Requires root privilege to access pagemap and kpageflags.

--8<--

Thanks,
Ryan

> 
> 
>>
>>>
>>> Sorry if I've labored the point here. But I think the only thing the tool
>>> doesn't already do that you are asking for is to differentiate PTE- vs PMD-
>>> mappings?
>>
>> That, plus explain itself, yes. :)
> 
> Excellent! I'll post a follow up shortly.
> 
>>
>>>
>>>>
>>>> ...
>>>>                           (e.g. /sys/fs/cgroup for cgroup-v2 or
>>>>>>>                           /sys/fs/cgroup/pids for cgroup-v1). Exactly one
>>>>>>>                           of --pid and --cgroup must be provided.
>>>>>>
>>>>>> Maybe we could add "--global" to that list. That would look, in order,
>>>>>> inside cgroups2 and cgroups, for a list of pids, and then run as if
>>>>>> --cgroup /sys/fs/cgroup or --cgroup /sys/fs/cgroup/pids were specified.
>>>>>
>>>>> I think actually it might be better just to make global the default when
>>>>> neither
>>>>> --pid nor --cgroup are provided? And in this case, I'll just grab all the pids
>>>>> from /proc rather than traverse the cgroup hierachy, that way it will work on
>>>>> systems without cgroups. Does that work for you?
>>>>
>>>> Yes! That was my initial idea, in fact, and after over-thinking it for
>>>> a while, it turned into the above. haha :)
>>>
>>> OK great - implemented for v3.
>>>
>>
>> Sweet!
>>
>>
>> thanks,
> 





[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux