On Jul 12 10:27, Yang Shi wrote: > On Tue, Jul 12, 2022 at 9:31 AM Zach O'Keefe <zokeefe@xxxxxxxxxx> wrote: > > > > On Jul 11 14:37, Yang Shi wrote: > > > On Wed, Jul 6, 2022 at 5:06 PM Zach O'Keefe <zokeefe@xxxxxxxxxx> wrote: > > > > > > > > Add PMDMappable field to smaps output which informs the user if memory > > > > in the VMA can be PMD-mapped by MADV_COLLAPSE. > > > > > > > > The distinction from THPeligible is needed for two reasons: > > > > > > > > 1) For THP, MADV_COLLAPSE is not coupled to THP sysfs controls, which > > > > THPeligible reports. > > > > > > > > 2) PMDMappable can also be used in HugeTLB fine-granularity mappings, > > > > which are independent from THP. > > > > > > Could you please elaborate the usecase? The user checks this hint > > > before calling MADV_COLLAPSE? Is it really necessary? > > > > > > And, TBH it sounds confusing and we don't have to maintain both > > > THPeligible and PMDMappable. We could just relax THPeligible to make > > > it return 1 even though THP is disabled by sysfs but MADV_COLLAPSE > > > could collapse it if such hint is useful. > > > > > > > Hey Yang, > > > > Thanks for taking the time to review this series again, and thanks for > > challenging this. > > > > TLDR: "Is it really necessary" - at the moment, no, probably not .. but I think > > it's "useful". > > > > Rationale: > > > > 1. IMO, I thought was was confusing seeing: > > > > ... > > AnonHugePages: 2048 kB > > ShmemPmdMapped: 0 kB > > FilePmdMapped: 0 kB > > Shared_Hugetlb: 0 kB > > Private_Hugetlb: 0 kB > > Swap: 0 kB > > SwapPss: 0 kB > > Locked: 0 kB > > THPeligible: 0 > > ... > > > > Maybe this could simply be clarified in the docs though. I guess we can already > > get: > > > > ... > > AnonHugePages: 0 kB > > ShmemPmdMapped: 0 kB > > FilePmdMapped: 2048 kB > > Shared_Hugetlb: 0 kB > > Private_Hugetlb: 0 kB > > Swap: 0 kB > > SwapPss: 0 kB > > Locked: 0 kB > > THPeligible: 0 > > ... > > > > today[1], so perhaps it's not a big deal. > > Not only that, if you have file PMD mapped then turn the THP sysfs > flag off, you get the same result. It is just a hint and just shows > the status at that moment when reading smaps. > Very good point. > > > > > > 2. It was useful for debugging - similar to rationale for including > > THPeligible1[2], the logic for determining if a VMA is eligible is pretty > > complicated. I.e. is this file mapped suitably? Unlike THPeligible, however, > > madvise(2) has the ability to set errno on failure to help* diagnose why some > > memory isn't being backed. > > I don't disagree it would help for debugging. But as a user who > doesn't know too much about kernel internals, when I see THPeligible > and PMDmappable, I would get confused TBH. And do we have to maintain > another similar hint? Maybe not. > > > > > 3. For the immediately-envisioned usecases, the user "knows" about what memory > > they are acting on. However, eventually we'd like to experiment with moving THP > > utilization policy to userspace. Here, it would be useful if the userspace agent > > managing was made aware of what memory it should be managing. I don't have a > > working prototype of what this would like yet, however. > > It is not a strong justification to add some user visible stuff for a > future feature (not even prototyped) since things may change, it > sounds safer to add such things once the usecase is solid TBH. > Ya, this was a weaker point for inclusion *now* TBH. > > > > 4. I thought it was neat that this field could be reused for HugeTLB > > fine-granularity mappings - but TBH I'm not sure how useful it'd be there. > > > > I figured relaxing existing THPeligible could break existing users / tests, and > > it'd be likewise confusing for them to see THPeligible: 1, but then have faults > > fail and they'd then have to go check sysfs settings and vma flags ; we'd be > > back in pre-commit 7635d9cbe832 ("mm, thp, proc: report THP eligibility for each > > vma"). > > I'm not sure what applications rely on this hint, but if they are just > some test scripts, I think it should be fine. I don't think we > guarantee the test scripts won't get broken. AFAIK some test scripts > rely on the kernel dmesg text, for example, OOMs. And the meaning of > the fields do change, for example, inactive anon of /proc/meminfo, > which was changed by the patchset which put anon pages on inactive > list first instead of active list. We already noticed the abnormal > value from our monitoring tool when we adopted 5.10+ kernel. And > /proc/vmstat also had some fields renamed, for example, > workingset_refault of /proc/vmstat, it was split to > workseting_refault_anon and workingset_refault_file, so we had to > update our monitoring scripts accordingly. I think /proc/meminfo and > /proc/vmstat are more heavily used than smaps. > Thanks for the great context. My guess is, right now, THPelligible is more useful as-is than if we were to relax it to MADV_COLLAPSE eligibility. As such, I'm fine dropping this until a stronger and more immediate usecase presents itself. Thanks for checking my rationale here. Best, Zach > > > > Thanks, > > Zach > > > > [1] https://lore.kernel.org/linux-mm/YrxbQGiwml24APCx@xxxxxxxxxx/ > > > > > > > > > > > > > > > Signed-off-by: Zach O'Keefe <zokeefe@xxxxxxxxxx> > > > > --- > > > > Documentation/filesystems/proc.rst | 10 ++++++++-- > > > > fs/proc/task_mmu.c | 2 ++ > > > > 2 files changed, 10 insertions(+), 2 deletions(-) > > > > > > > > diff --git a/Documentation/filesystems/proc.rst b/Documentation/filesystems/proc.rst > > > > index 47e95dbc820d..f207903a57a5 100644 > > > > --- a/Documentation/filesystems/proc.rst > > > > +++ b/Documentation/filesystems/proc.rst > > > > @@ -466,6 +466,7 @@ Memory Area, or VMA) there is a series of lines such as the following:: > > > > MMUPageSize: 4 kB > > > > Locked: 0 kB > > > > THPeligible: 0 > > > > + PMDMappable: 0 > > > > VmFlags: rd ex mr mw me dw > > > > > > > > The first of these lines shows the same information as is displayed for the > > > > @@ -518,9 +519,14 @@ replaced by copy-on-write) part of the underlying shmem object out on swap. > > > > does not take into account swapped out page of underlying shmem objects. > > > > "Locked" indicates whether the mapping is locked in memory or not. > > > > > > > > +"PMDMappable" indicates if the memory can be mapped by PMDs - 1 if true, 0 > > > > +otherwise. It just shows the current status. Note that this is memory > > > > +operable on explicitly by MADV_COLLAPSE. > > > > + > > > > "THPeligible" indicates whether the mapping is eligible for allocating THP > > > > -pages as well as the THP is PMD mappable or not - 1 if true, 0 otherwise. > > > > -It just shows the current status. > > > > +pages by the kernel, as well as the THP is PMD mappable or not - 1 if true, 0 > > > > +otherwise. It just shows the current status. Note this is memory the kernel can > > > > +transparently provide as THPs. > > > > > > > > "VmFlags" field deserves a separate description. This member represents the > > > > kernel flags associated with the particular virtual memory area in two letter > > > > diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c > > > > index f8cd58846a28..29f2089456ba 100644 > > > > --- a/fs/proc/task_mmu.c > > > > +++ b/fs/proc/task_mmu.c > > > > @@ -867,6 +867,8 @@ static int show_smap(struct seq_file *m, void *v) > > > > > > > > seq_printf(m, "THPeligible: %d\n", > > > > hugepage_vma_check(vma, vma->vm_flags, true, false, true)); > > > > + seq_printf(m, "PMDMappable: %d\n", > > > > + hugepage_vma_check(vma, vma->vm_flags, true, false, false)); > > > > > > > > if (arch_pkeys_enabled()) > > > > seq_printf(m, "ProtectionKey: %8u\n", vma_pkey(vma)); > > > > -- > > > > 2.37.0.rc0.161.g10f37bed90-goog > > > >