Re: [PATCH v6 08/15] mm/khugepaged: add flag to ignore THP sysfs enabled

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Jun 29, 2022 at 06:42:25PM -0700, Zach O'Keefe wrote:
> On Jun 29 19:21, Peter Xu wrote:
> > On Fri, Jun 03, 2022 at 05:39:57PM -0700, Zach O'Keefe wrote:
> > > Add enforce_thp_enabled flag to struct collapse_control that allows context
> > > to ignore constraints imposed by /sys/kernel/transparent_hugepage/enabled.
> > >
> > > This flag is set in khugepaged collapse context to preserve existing
> > > khugepaged behavior.
> > >
> > > This flag will be used (unset) when introducing madvise collapse
> > > context since the desired THP semantics of MADV_COLLAPSE aren't coupled
> > > to sysfs THP settings.  Most notably, for the purpose of eventual
> > > madvise_collapse(2) support, this allows userspace to trigger THP collapse
> > > on behalf of another processes, without adding support to meddle with
> > > the VMA flags of said process, or change sysfs THP settings.
> > >
> > > For now, limit this flag to /sys/kernel/transparent_hugepage/enabled,
> > > but it can be expanded to include
> > > /sys/kernel/transparent_hugepage/shmem_enabled later.
> > >
> > > Link: https://lore.kernel.org/linux-mm/CAAa6QmQxay1_=Pmt8oCX2-Va18t44FV-Vs-WsQt_6+qBks4nZA@xxxxxxxxxxxxxx/
> > >
> > > Signed-off-by: Zach O'Keefe <zokeefe@xxxxxxxxxx>
> > > ---
> > >  mm/khugepaged.c | 34 +++++++++++++++++++++++++++-------
> > >  1 file changed, 27 insertions(+), 7 deletions(-)
> > >
> > > diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> > > index c3589b3e238d..4ad04f552347 100644
> > > --- a/mm/khugepaged.c
> > > +++ b/mm/khugepaged.c
> > > @@ -94,6 +94,11 @@ struct collapse_control {
> > >      */
> > >     bool enforce_page_heuristics;
> > >
> > > +   /* Enforce constraints of
> > > +    * /sys/kernel/mm/transparent_hugepage/enabled
> > > +    */
> > > +   bool enforce_thp_enabled;
> >
> > Small nitpick that we could have merged the two booleans if they always
> > match, but no strong opinions if you think these two are clearer.  Or maybe
> > there's other plan of using them?
> >
> > > +
> > >     /* Num pages scanned per node */
> > >     int node_load[MAX_NUMNODES];
> > >
> > > @@ -893,10 +898,12 @@ static bool khugepaged_alloc_page(struct page **hpage, gfp_t gfp, int node)
> > >   */
> > >
> > >  static int hugepage_vma_revalidate(struct mm_struct *mm, unsigned long address,
> > > -           struct vm_area_struct **vmap)
> > > +                              struct vm_area_struct **vmap,
> > > +                              struct collapse_control *cc)
> > >  {
> > >     struct vm_area_struct *vma;
> > >     unsigned long hstart, hend;
> > > +   unsigned long vma_flags;
> > >
> > >     if (unlikely(khugepaged_test_exit(mm)))
> > >             return SCAN_ANY_PROCESS;
> > > @@ -909,7 +916,18 @@ static int hugepage_vma_revalidate(struct mm_struct *mm, unsigned long address,
> > >     hend = vma->vm_end & HPAGE_PMD_MASK;
> > >     if (address < hstart || address + HPAGE_PMD_SIZE > hend)
> > >             return SCAN_ADDRESS_RANGE;
> > > -   if (!hugepage_vma_check(vma, vma->vm_flags))
> > > +
> > > +   /*
> > > +    * If !cc->enforce_thp_enabled, set VM_HUGEPAGE so that
> > > +    * hugepage_vma_check() can pass even if
> > > +    * TRANSPARENT_HUGEPAGE_REQ_MADV_FLAG is set (i.e. "madvise" mode).
> > > +    * Note that hugepage_vma_check() doesn't enforce that
> > > +    * TRANSPARENT_HUGEPAGE_FLAG or TRANSPARENT_HUGEPAGE_REQ_MADV_FLAG
> > > +    * must be set (i.e. "never" mode).
> > > +    */
> > > +   vma_flags = cc->enforce_thp_enabled ?  vma->vm_flags
> > > +                   : vma->vm_flags | VM_HUGEPAGE;
> >
> > Another nitpick..
> >
> > We could get a weird vm_flags when VM_NOHUGEPAGE is set.  I don't think
> > it'll go wrong since hugepage_vma_check() checks NOHUGEPAGE first, but IMHO
> > we shouldn't rely on that as it seems error prone (e.g. when accidentally
> > moved things around).
> >
> > So maybe nicer to only apply VM_HUGEPAGE if !VM_NOHUGEPAGE?  Or pass over
> > "enforce_thp_enabled" into hugepage_vma_check() should work too, iiuc.
> > Passing in the boolean has one benefit that we don't really need the
> > complicated comment above since the code should be able to explain itself.
> 
> Hey Peter, thanks again for taking the time to review.
> 
> Answering both of the above at the time:
> 
> As in this series so far, I've tried to keep context functionally-declarative -
> specifying the intended behavior (e.g. "enforce_page_heuristics") rather than
> adding "if (khugepaged) { .. } else if (madv_collapse) { .. } else if { .. }"
> around the code which, IMO, makes it difficult to follow. Unfortunately, I've
> ran into the 2 problems you've stated here:
> 
> 1) *Right now* all the behavior knobs are either off/on at the same time
> 2) For hugepage_vma_check() (now in mm/huge_memory.c and acting as the central
>    authority on THP eligibility), things are complicated enough that I
>    couldn't find a clean way to describe the parameters of the context without
>    explicitly mentioning the caller.
> 
> For (2), instead of adding another arg to specify MADV_COLLAPSE's behavior,
> I think we need to package these contexts into a single set of flags:
> 
> enum thp_ctx_flags {
>         THP_CTX_ANON_FAULT              = 1 << 1,
>         THP_CTX_KHUGEPAGED              = 1 << 2,
>         THP_CTX_SMAPS                   = 1 << 3,
>         THP_CTX_MADVISE_COLLAPSE        = 1 << 4,
> };
> 
> That will avoid hacking vma flags passed to hugepage_vma_check().
> 
> And, if we have these anyways, I might as well do away with some of the
> (semantically meaningful but functionally redundant) flags in
> struct collapse_control and just specify a single .thp_ctx_flags member. I'm
> not entirely happy with it - but that's what I'm planning.
> 
> WDYT?

Firstly I think I wrongly sent previous email privately.. :( Let me try to
add the list back..

IMHO we don't need to worry too much on the "if... else if... else",
because they shouldn't be more complicated than when you spread the
meanings into multiple flags, or how could it be? :) IMHO it should
literally be as simple as applying:

  s/enforce_{A|B|C|...}/khugepaged_initiated/g

Throughout the patches, then we squash the patches introducing enforce_X.

If you worry it's not clear on "what does khugepaged_initiated mean", we
could add whatever comment above the variable explaining A/B/C/D will be
covered when this is set, and we could postpone to do the flag split only
until there're real user.

Adding these flags could add unnecessary bit-and instructions into the code
generated at last, and if it's only about readability issue that's really
what comment is for?

Thanks,

-- 
Peter Xu





[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux