Re: [PATCH v6 08/15] mm/khugepaged: add flag to ignore THP sysfs enabled

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Jun 29 22:32, Peter Xu wrote:
> On Wed, Jun 29, 2022 at 06:42:25PM -0700, Zach O'Keefe wrote:
> > On Jun 29 19:21, Peter Xu wrote:
> > > On Fri, Jun 03, 2022 at 05:39:57PM -0700, Zach O'Keefe wrote:
> > > > Add enforce_thp_enabled flag to struct collapse_control that allows context
> > > > to ignore constraints imposed by /sys/kernel/transparent_hugepage/enabled.
> > > >
> > > > This flag is set in khugepaged collapse context to preserve existing
> > > > khugepaged behavior.
> > > >
> > > > This flag will be used (unset) when introducing madvise collapse
> > > > context since the desired THP semantics of MADV_COLLAPSE aren't coupled
> > > > to sysfs THP settings.  Most notably, for the purpose of eventual
> > > > madvise_collapse(2) support, this allows userspace to trigger THP collapse
> > > > on behalf of another processes, without adding support to meddle with
> > > > the VMA flags of said process, or change sysfs THP settings.
> > > >
> > > > For now, limit this flag to /sys/kernel/transparent_hugepage/enabled,
> > > > but it can be expanded to include
> > > > /sys/kernel/transparent_hugepage/shmem_enabled later.
> > > >
> > > > Link: https://lore.kernel.org/linux-mm/CAAa6QmQxay1_=Pmt8oCX2-Va18t44FV-Vs-WsQt_6+qBks4nZA@xxxxxxxxxxxxxx/
> > > >
> > > > Signed-off-by: Zach O'Keefe <zokeefe@xxxxxxxxxx>
> > > > ---
> > > >  mm/khugepaged.c | 34 +++++++++++++++++++++++++++-------
> > > >  1 file changed, 27 insertions(+), 7 deletions(-)
> > > >
> > > > diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> > > > index c3589b3e238d..4ad04f552347 100644
> > > > --- a/mm/khugepaged.c
> > > > +++ b/mm/khugepaged.c
> > > > @@ -94,6 +94,11 @@ struct collapse_control {
> > > >      */
> > > >     bool enforce_page_heuristics;
> > > >
> > > > +   /* Enforce constraints of
> > > > +    * /sys/kernel/mm/transparent_hugepage/enabled
> > > > +    */
> > > > +   bool enforce_thp_enabled;
> > >
> > > Small nitpick that we could have merged the two booleans if they always
> > > match, but no strong opinions if you think these two are clearer.  Or maybe
> > > there's other plan of using them?
> > >
> > > > +
> > > >     /* Num pages scanned per node */
> > > >     int node_load[MAX_NUMNODES];
> > > >
> > > > @@ -893,10 +898,12 @@ static bool khugepaged_alloc_page(struct page **hpage, gfp_t gfp, int node)
> > > >   */
> > > >
> > > >  static int hugepage_vma_revalidate(struct mm_struct *mm, unsigned long address,
> > > > -           struct vm_area_struct **vmap)
> > > > +                              struct vm_area_struct **vmap,
> > > > +                              struct collapse_control *cc)
> > > >  {
> > > >     struct vm_area_struct *vma;
> > > >     unsigned long hstart, hend;
> > > > +   unsigned long vma_flags;
> > > >
> > > >     if (unlikely(khugepaged_test_exit(mm)))
> > > >             return SCAN_ANY_PROCESS;
> > > > @@ -909,7 +916,18 @@ static int hugepage_vma_revalidate(struct mm_struct *mm, unsigned long address,
> > > >     hend = vma->vm_end & HPAGE_PMD_MASK;
> > > >     if (address < hstart || address + HPAGE_PMD_SIZE > hend)
> > > >             return SCAN_ADDRESS_RANGE;
> > > > -   if (!hugepage_vma_check(vma, vma->vm_flags))
> > > > +
> > > > +   /*
> > > > +    * If !cc->enforce_thp_enabled, set VM_HUGEPAGE so that
> > > > +    * hugepage_vma_check() can pass even if
> > > > +    * TRANSPARENT_HUGEPAGE_REQ_MADV_FLAG is set (i.e. "madvise" mode).
> > > > +    * Note that hugepage_vma_check() doesn't enforce that
> > > > +    * TRANSPARENT_HUGEPAGE_FLAG or TRANSPARENT_HUGEPAGE_REQ_MADV_FLAG
> > > > +    * must be set (i.e. "never" mode).
> > > > +    */
> > > > +   vma_flags = cc->enforce_thp_enabled ?  vma->vm_flags
> > > > +                   : vma->vm_flags | VM_HUGEPAGE;
> > >
> > > Another nitpick..
> > >
> > > We could get a weird vm_flags when VM_NOHUGEPAGE is set.  I don't think
> > > it'll go wrong since hugepage_vma_check() checks NOHUGEPAGE first, but IMHO
> > > we shouldn't rely on that as it seems error prone (e.g. when accidentally
> > > moved things around).
> > >
> > > So maybe nicer to only apply VM_HUGEPAGE if !VM_NOHUGEPAGE?  Or pass over
> > > "enforce_thp_enabled" into hugepage_vma_check() should work too, iiuc.
> > > Passing in the boolean has one benefit that we don't really need the
> > > complicated comment above since the code should be able to explain itself.
> > 
> > Hey Peter, thanks again for taking the time to review.
> > 
> > Answering both of the above at the time:
> > 
> > As in this series so far, I've tried to keep context functionally-declarative -
> > specifying the intended behavior (e.g. "enforce_page_heuristics") rather than
> > adding "if (khugepaged) { .. } else if (madv_collapse) { .. } else if { .. }"
> > around the code which, IMO, makes it difficult to follow. Unfortunately, I've
> > ran into the 2 problems you've stated here:
> > 
> > 1) *Right now* all the behavior knobs are either off/on at the same time
> > 2) For hugepage_vma_check() (now in mm/huge_memory.c and acting as the central
> >    authority on THP eligibility), things are complicated enough that I
> >    couldn't find a clean way to describe the parameters of the context without
> >    explicitly mentioning the caller.
> > 
> > For (2), instead of adding another arg to specify MADV_COLLAPSE's behavior,
> > I think we need to package these contexts into a single set of flags:
> > 
> > enum thp_ctx_flags {
> >         THP_CTX_ANON_FAULT              = 1 << 1,
> >         THP_CTX_KHUGEPAGED              = 1 << 2,
> >         THP_CTX_SMAPS                   = 1 << 3,
> >         THP_CTX_MADVISE_COLLAPSE        = 1 << 4,
> > };
> > 
> > That will avoid hacking vma flags passed to hugepage_vma_check().
> > 
> > And, if we have these anyways, I might as well do away with some of the
> > (semantically meaningful but functionally redundant) flags in
> > struct collapse_control and just specify a single .thp_ctx_flags member. I'm
> > not entirely happy with it - but that's what I'm planning.
> > 
> > WDYT?
> 
> Firstly I think I wrongly sent previous email privately.. :( Let me try to
> add the list back..
> 
> IMHO we don't need to worry too much on the "if... else if... else",
> because they shouldn't be more complicated than when you spread the
> meanings into multiple flags, or how could it be? :) IMHO it should
> literally be as simple as applying:
> 
>   s/enforce_{A|B|C|...}/khugepaged_initiated/g
> 
> Throughout the patches, then we squash the patches introducing enforce_X.

Right, the code today will be virtually identical. The attempt to describe
contexts in terms of behaviors is based on an unfounded assumption that
successive contexts could reuse said behaviors - but there are currently no
plans for other collapsing contexts.

> If you worry it's not clear on "what does khugepaged_initiated mean", we
> could add whatever comment above the variable explaining A/B/C/D will be
> covered when this is set, and we could postpone to do the flag split only
> until there're real user.
> 
> Adding these flags could add unnecessary bit-and instructions into the code
> generated at last, and if it's only about readability issue that's really
> what comment is for?

Ya maybe I'm overthinking it and will just do the most straightforward thing for
now.

As always, thanks for taking the time to review / offer suggestions!

Best,
Zach

> Thanks,
> 
> -- 
> Peter Xu
> 




[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux