On 1/11/24 03:54, Ryan Roberts wrote:
...
I'm not sure exactly what you are asking. The "cont" counters are counting
blocks of contiguous, naturally aligned physical memory, which are also mapped
contiguously and aligned. So a smaller --cont would always include all the
memory captured in a larger --cont. In this case, it's all the *file-backed*
memory (as highlighted in the label name), so nothing to do with (m)THP. But where
you have THP, --cont doesn't care what the underlying THP size is as long as its
requirements are met, so PMD-sized THPs would be included in e.g.
*anon*-cont-aligned-128kB.
Note that the "--cont" counters don't directly count memory that is PTE-mapped
with the contiguous bit set in the page table; they just count memory that meets
the alignment, size and mapping requirements. On arm64 systems with the contpte
series, the contiguous bit would be used here, but it's not part of what's
getting measured.
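To make the counting rule above concrete, here is a minimal Python sketch of the predicate as described, not the script's actual code. It assumes 4 KiB base pages and a hypothetical per-page PFN list for a single mapping (None for unmapped pages); `is_napot` and `cont_bytes` are hypothetical names. A block counts only if it is virtually aligned, physically aligned and physically contiguous; the PTE contiguous bit is never consulted.

```python
from typing import Optional, Sequence

PAGE_SIZE = 4096  # assumption: 4 KiB base pages


def is_napot(value: int, size: int) -> bool:
    """True if size is a power of two and value is a multiple of it."""
    return size > 0 and (size & (size - 1)) == 0 and value % size == 0


def cont_bytes(vaddr: int, pfns: Sequence[Optional[int]], size: int) -> int:
    """Hypothetical helper: bytes of one mapping that fall in size-byte
    blocks which are virtually aligned, physically aligned, and physically
    contiguous with every page present. pfns[i] is the PFN backing
    vaddr + i * PAGE_SIZE, or None if that page is not mapped. The PTE
    contiguous bit plays no part in this test."""
    if size < PAGE_SIZE or size & (size - 1) or vaddr % PAGE_SIZE:
        raise ValueError("need page-aligned vaddr and power-of-two size >= PAGE_SIZE")
    npages = size // PAGE_SIZE
    # Index of the first page whose virtual address is size-aligned.
    i = -(vaddr // PAGE_SIZE) % npages
    total = 0
    while i + npages <= len(pfns):
        block = pfns[i:i + npages]
        head = block[0]
        # Physical start must be size-aligned and the PFNs consecutive
        # (a None entry never equals head + k, so holes fail the check).
        if (head is not None
                and is_napot(head * PAGE_SIZE, size)
                and all(p == head + k for k, p in enumerate(block))):
            total += size
        i += npages
    return total
```

For a 2 MiB-aligned mapping backed by 512 consecutive PFNs starting at 0x400 (a 2 MiB-aligned physical address), `cont_bytes` reports the full 2 MiB for both `size=64 << 10` and `size=2 << 20`, which is the "smaller --cont includes everything a larger one does" property.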
The "cont" and "naturally aligned" terms are difficult here, even though
I'm familiar with the implementation. But putting on my systems
monitoring hat, these terms are not helping people as much as I'd like,
because:
a) "Contiguous" is not really a distinguishing property, so measuring large
pages that are "contiguous" is confusing. All folios are contiguous, and
anything a pte points to is contiguous as well. So --cont really
throws off the user/reader.
b) "Naturally aligned" is also tricky, because "natural" is not explained.
Here it means NAPOT (naturally aligned power of two; I saw that term in the
riscv docs).
After spending a day or two exploring running systems with this, I'd
like to suggest:
1) Measure "native PMD THPs" vs. pte-mapped mTHPs. This provides a lot
of information: whether mTHP is configured as expected, whether it is
helping, etc.
2) Not having to list out all the mTHP sizes would be nice. Instead,
just use the possible sizes from /sys/kernel/mm/transparent_hugepage/* ,
unless the user specifies sizes.
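For suggestion 2, a minimal Python sketch of discovering the sizes, assuming the per-size directories that mTHP-capable kernels expose under /sys/kernel/mm/transparent_hugepage/ (named hugepages-<size>kB). The function names are hypothetical; a kernel without per-size mTHP controls simply yields an empty list here.

```python
import os
import re
from typing import Iterable, List

THP_SYSFS = "/sys/kernel/mm/transparent_hugepage"
# Per-size mTHP controls live in directories named hugepages-<size>kB.
_SIZE_DIR = re.compile(r"hugepages-(\d+)kB")


def sizes_from_names(names: Iterable[str]) -> List[int]:
    """Extract the sizes (in kB) from hugepages-<size>kB entry names."""
    sizes = []
    for name in names:
        m = _SIZE_DIR.fullmatch(name)
        if m:
            sizes.append(int(m.group(1)))
    return sorted(sizes)


def available_thp_sizes_kb(root: str = THP_SYSFS) -> List[int]:
    """Sizes the running kernel exposes; empty if the directory is absent.
    Kernels without per-size mTHP controls have no hugepages-* entries."""
    try:
        return sizes_from_names(os.listdir(root))
    except FileNotFoundError:
        return []
```

The tool would then use the user's sizes when given and fall back to `available_thp_sizes_kb()` otherwise, so the default tracks whatever the running kernel supports.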
...
(e.g. /sys/fs/cgroup for cgroup-v2 or
/sys/fs/cgroup/pids for cgroup-v1). Exactly one
of --pid and --cgroup must be provided.
Maybe we could add "--global" to that list. That would look for a list of
pids inside cgroup-v2 and then cgroup-v1, in that order, and then run as if
--cgroup /sys/fs/cgroup or --cgroup /sys/fs/cgroup/pids were specified.
I think it might actually be better just to make global the default when neither
--pid nor --cgroup is provided? In that case, I'll just grab all the pids
from /proc rather than traverse the cgroup hierarchy; that way it will work on
systems without cgroups. Does that work for you?
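A minimal Python sketch of that default, under the assumption that --pid and --cgroup stay mutually exclusive but become optional, and that omitting both means "global", with pids taken from the numeric entries in /proc. The function names are hypothetical, not taken from the actual script.

```python
import argparse
import os
from typing import Iterable, List, Optional, Sequence


def pids_from_names(names: Iterable[str]) -> List[int]:
    """Keep only the numeric /proc entries; those are process IDs."""
    return sorted(int(n) for n in names if n.isdecimal())


def all_pids(proc_root: str = "/proc") -> List[int]:
    # Processes can exit between this listing and any later read of
    # /proc/<pid>/..., so callers should tolerate FileNotFoundError.
    return pids_from_names(os.listdir(proc_root))


def parse_args(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser()
    # At most one scope option may be given; giving neither means global.
    scope = parser.add_mutually_exclusive_group()
    scope.add_argument("--pid", type=int)
    scope.add_argument("--cgroup")
    return parser.parse_args(argv)


def scope_of(args: argparse.Namespace) -> str:
    """Map the parsed options to a scope name."""
    if args.pid is not None:
        return "pid"
    if args.cgroup is not None:
        return "cgroup"
    return "global"
```

Walking /proc instead of the cgroup tree keeps the global case independent of whether cgroup-v1, cgroup-v2, or neither is mounted.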
Yes! That was my initial idea, in fact, and after over-thinking it for
a while, it turned into the above. haha :)
thanks,
--
John Hubbard
NVIDIA