The new transparent_hugepage=defer option allows for a more conservative
approach to THPs. Document its usage in the transhuge admin-guide.

Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
Cc: David Hildenbrand <david@xxxxxxxxxx>
Cc: Matthew Wilcox <willy@xxxxxxxxxxxxx>
Cc: Barry Song <baohua@xxxxxxxxxx>
Cc: Ryan Roberts <ryan.roberts@xxxxxxx>
Cc: Baolin Wang <baolin.wang@xxxxxxxxxxxxxxxxx>
Cc: Lance Yang <ioworker0@xxxxxxxxx>
Cc: Peter Xu <peterx@xxxxxxxxxx>
Cc: Zi Yan <ziy@xxxxxxxxxx>
Cc: Rafael Aquini <aquini@xxxxxxxxxx>
Cc: Andrea Arcangeli <aarcange@xxxxxxxxxx>
Cc: Jonathan Corbet <corbet@xxxxxxx>
Signed-off-by: Nico Pache <npache@xxxxxxxxxx>
---
 Documentation/admin-guide/mm/transhuge.rst | 18 +++++++++++++++---
 1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
index 058485daf186..1946fbb789b2 100644
--- a/Documentation/admin-guide/mm/transhuge.rst
+++ b/Documentation/admin-guide/mm/transhuge.rst
@@ -88,8 +88,9 @@ In certain cases when hugepages are enabled system wide, application
 may end up allocating more memory resources. An application may mmap a
 large region but only touch 1 byte of it, in that case a 2M page might
 be allocated instead of a 4k page for no good. This is why it's
-possible to disable hugepages system-wide and to only have them inside
-MADV_HUGEPAGE madvise regions.
+possible to disable hugepages system-wide, only have them inside
+MADV_HUGEPAGE madvise regions, or defer them away from the page fault
+handler to khugepaged.
 
 Embedded systems should enable hugepages only inside madvise regions
 to eliminate any risk of wasting any precious byte of memory and to
@@ -99,6 +100,15 @@ Applications that gets a lot of benefit from hugepages and that don't
 risk to lose memory by using hugepages, should use
 madvise(MADV_HUGEPAGE) on their critical mmapped regions.
 
+Applications that would like to benefit from THPs but would still like a
+more memory conservative approach can choose 'defer'. This avoids
+inserting THPs at the page fault handler unless they are MADV_HUGEPAGE.
+Khugepaged will then scan the mappings for potential collapses into PMD
+sized pages. Admins using the 'defer' setting should consider
+tweaking khugepaged/max_ptes_none. The current default of 511 may
+aggressively collapse your PTEs into PMDs. Lower this value to conserve
+more memory (e.g. max_ptes_none=64).
+
 .. _thp_sysfs:
 
 sysfs
@@ -136,6 +146,7 @@ The top-level setting (for use with "inherit") can be set by issuing
 one of the following commands::
 
 	echo always >/sys/kernel/mm/transparent_hugepage/enabled
+	echo defer >/sys/kernel/mm/transparent_hugepage/enabled
 	echo madvise >/sys/kernel/mm/transparent_hugepage/enabled
 	echo never >/sys/kernel/mm/transparent_hugepage/enabled
 
@@ -264,7 +275,8 @@ of small pages into one large page::
 A higher value leads to use additional memory for programs.
 A lower value leads to gain less thp performance. Value of
 max_ptes_none can waste cpu time very little, you can
-ignore it.
+ignore it. Consider lowering this value when using
+``transparent_hugepage=defer``.
 
 ``max_ptes_swap`` specifies how many pages can be brought in from
 swap when collapsing a group of pages into a transparent huge page::
-- 
2.45.2
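
For illustration only, the two knobs discussed in the documentation text
(the 'defer' mode and khugepaged/max_ptes_none) could be applied together
roughly as follows. This is a minimal sketch, not part of the patch:
THP_ROOT and thp_use_defer are hypothetical names chosen here, the value 64
is simply the example from the text above, and the 'defer' write is assumed
to succeed only on a kernel that carries this series.

```shell
#!/bin/sh
# Sketch only: THP_ROOT and thp_use_defer are illustrative names, not part
# of the patch. THP_ROOT can be pointed at a scratch directory for dry runs.
THP_ROOT="${THP_ROOT:-/sys/kernel/mm/transparent_hugepage}"

thp_use_defer() {
	# $1: max_ptes_none value to apply (64 is the example from the patch text)
	max_none="${1:-64}"

	# Defer THP insertion from the page fault handler to khugepaged,
	# except for MADV_HUGEPAGE regions.
	echo defer >"$THP_ROOT/enabled" || return 1

	# Limit how many not-yet-mapped small pages khugepaged may allocate
	# when collapsing a range into one large page.
	echo "$max_none" >"$THP_ROOT/khugepaged/max_ptes_none" || return 1
}
```

On a real system this would be run as root, for example `thp_use_defer 64`.
Values written through sysfs do not survive a reboot, so a persistent setup
would presumably rely on the transparent_hugepage=defer option named in the
commit message plus a boot-time script for max_ptes_none.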