Re: [PATCH v3 03/12] mm/khugepaged: make hugepage allocation context-specific

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Apr 28, 2022 at 7:51 AM Peter Xu <peterx@xxxxxxxxxx> wrote:
>
> On Wed, Apr 27, 2022 at 05:51:53PM -0700, Zach O'Keefe wrote:
> > > > @@ -1110,13 +1115,14 @@ static void collapse_huge_page(struct mm_struct *mm,
> > > >        */
> > > >       mmap_read_unlock(mm);
> > > >
> > > > +     node = khugepaged_find_target_node(cc);
> > > >       /* sched to specified node before huage page memory copy */
> > > >       if (task_node(current) != node) {
> > > >               cpumask = cpumask_of_node(node);
> > > >               if (!cpumask_empty(cpumask))
> > > >                       set_cpus_allowed_ptr(current, cpumask);
> > > >       }
> > > > -     new_page = khugepaged_alloc_page(hpage, gfp, node);
> > > > +     new_page = cc->alloc_hpage(cc, gfp, node);
> > >
> > > AFAICT you removed all references of khugepaged_alloc_page() in this patch,
> > > then you'd better drop the function for both NUMA and UMA.
> > >
> >
> > Sorry, I'm not sure I follow. In khugepaged context, logic WRT
> > khugepaged_alloc_page() is unchanged - it's just called indirectly
> > through ->alloc_hpage().
>
> Ah you're right, sorry for the confusion.
>
> >
> > > Said that, I think it's actually better to keep them, as they do things
> > > useful.  For example,AFAICT this ->alloc_hpage() change can leak the hpage
> > > alloacted for UMA already so that's why I think keeping them makes sense,
> > > then iiuc the BUG_ON would trigger with UMA already.
> > >
> > > I saw that you've moved khugepaged_find_target_node() into this function
> > > which looks definitely good, but if we could keep khugepaged_alloc_page()
> > > and then IMHO we could even move khugepaged_find_target_node() into that
> > > and drop the "node" parameter in khugepaged_alloc_page().
> > >
> >
> > I actually had done this, until commit 4272063db38c ("mm/khugepaged:
> > sched to numa node when collapse huge page") which forced me to keep
> > "node" visible in this function.
>
> Right, I was looking at my local tree and that patch seems to be very
> lately added into -next.
>
> I'm curious why it wasn't applying to file thps too if it is worthwhile,
> since if so that's also a suitable candidate to be moved into the same
> hook.  I've asked in that thread instead.
>
> Before that, feel free to keep your code as is.
>

Thanks for checking on that. On second thought, it seems like we would
actually want to move this sched logic into the khupaged hook impl
since we probably don't want to be moving around the calling process
if in madvise context.

> >
> > > I'd even consider moving cc->gfp() all into it if possible, since the gfp
> > > seems to be always with __GFP_THISNODE anyways.
> > >
> >
> > I would have liked to do this, but the gfp flags are referenced again
> > when calling mem_cgroup_charge(), so I couldn't quite get rid of them
> > from here.
>
> Yeah, maybe we could move mem_cgroup_charge() into the hook too?  As below
> codes are duplicated between file & anon and IMHO they're good candidate to
> a new helper already anyway:
>
>         /* Only allocate from the target node */
>         gfp = alloc_hugepage_khugepaged_gfpmask() | __GFP_THISNODE;
>
>         new_page = khugepaged_alloc_page(hpage, gfp, node);
>         if (!new_page) {
>                 result = SCAN_ALLOC_HUGE_PAGE_FAIL;
>                 goto out;
>         }
>
>         if (unlikely(mem_cgroup_charge(page_folio(new_page), mm, gfp))) {
>                 result = SCAN_CGROUP_CHARGE_FAIL;
>                 goto out;
>         }
>         count_memcg_page_event(new_page, THP_COLLAPSE_ALLOC);
>
> If we want to generalize it maybe we want to return the "result" instead of
> the new page though, so the newpage can be passed in as a pointer.
>
> There's one mmap_read_unlock(mm) for the anonymous code but I think that
> can simply be moved above the chunk.
>
> No strong opinion here, please feel free to think about what's the best
> approach for landing this series.
>

Thanks for the suggestion. I'll play around with it a little and see
what looks good.

> [...]
>
> >
> > I could (?) make .gfp of type gfp_t and just update it on every
> > khugepaged scan (in case it changed) and also remove the gfp parameter
> > for ->alloc_hpage().
>
> If that's the case I'd prefer you keep you code as-is; gfp() is perfectly
> fine and gfp() is light, I'm afraid that caching thing could make things
> complicated.
>

Ack.

Thanks again for your time and thoughts, Peter!

Best,
Zach

> Thanks,
>
> --
> Peter Xu
>




[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux