On 01/07/2024 11:50, Lance Yang wrote:
> On Mon, Jul 1, 2024 at 4:31 PM Ryan Roberts <ryan.roberts@xxxxxxx> wrote:
>>
>> On 28/06/2024 14:07, Lance Yang wrote:
>>> This commit introduces documentation for mTHP split counters in
>>> transhuge.rst.
>>>
>>> Signed-off-by: Mingzhe Yang <mingzhe.yang@xxxxxx>
>>> Signed-off-by: Lance Yang <ioworker0@xxxxxxxxx>
>>> ---
>>>  Documentation/admin-guide/mm/transhuge.rst | 16 ++++++++++++++++
>>>  1 file changed, 16 insertions(+)
>>>
>>> diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
>>> index 1f72b00af5d3..709fe10b60f4 100644
>>> --- a/Documentation/admin-guide/mm/transhuge.rst
>>> +++ b/Documentation/admin-guide/mm/transhuge.rst
>>> @@ -514,6 +514,22 @@ file_fallback_charge
>>>      falls back to using small pages even though the allocation was
>>>      successful.
>>
>>
>> I note at the top of this section there is a note:
>>
>> Monitoring usage
>> ================
>>
>> .. note::
>>    Currently the below counters only record events relating to
>>    PMD-sized THP. Events relating to other THP sizes are not included.
>>
>> Which is out of date, now that we support mTHP stats. Perhaps it should be removed?
>
> Good catch! Let's remove that in this patch ;)
>
>>
>>>
>>> +split
>>> +    is incremented every time a huge page is successfully split into
>>> +    base pages. This can happen for a variety of reasons but a common
>>> +    reason is that a huge page is old and is being reclaimed.
>>> +    This action implies splitting any block mappings into PTEs.
>>
>> Now that I'm reading this, I'm reminded that Yang Shi suggested at LSFMM that a
>> potential aid to solving the swap-out fragmentation problem is to split high
>> orders to lower (but not 0) orders. I don't know if we would take that route,
>> but in principle it sounds like splitting mTHP to smaller mTHP might be
>> something we want some day. I wonder if we should spec this counter to also
>> include splits to smaller orders and not just splits to base pages?
>>
>> Actually looking at the code, I think split_huge_page_to_list_to_order(order>0)
>> would already increment this counter without actually splitting to base pages.
>> So the documentation should probably just reflect that.
>
> Yep, you're right.
>
> It’s important that the documentation reflects that to ensure consistency.
>
> How about "... is successfully split into smaller orders. This can..."?

fine by me.

>
> Thanks,
> Lance
>
>>
>>> +
>>> +split_failed
>>> +    is incremented if kernel fails to split huge
>>> +    page. This can happen if the page was pinned by somebody.
>>> +
>>> +split_deferred
>>> +    is incremented when a huge page is put onto split
>>> +    queue. This happens when a huge page is partially unmapped and
>>> +    splitting it would free up some memory. Pages on split queue are
>>> +    going to be split under memory pressure.
>>> +
>>>  As the system ages, allocating huge pages may be expensive as the
>>>  system uses memory compaction to copy data around memory to free a
>>>  huge page for use. There are some counters in ``/proc/vmstat`` to help
>>