Re: [PATCH rfc 0/3] mm: allow more high-order pages stored on PCP lists

Barry Song <21cnbao@xxxxxxxxx> · Mon, 15 Apr 2024 19:14:01 +0800

On Mon, Apr 15, 2024 at 6:52 PM David Hildenbrand <david@xxxxxxxxxx> wrote:
>
> On 15.04.24 10:59, Kefeng Wang wrote:
> >
> >
> > On 2024/4/15 16:18, Barry Song wrote:
> >> On Mon, Apr 15, 2024 at 8:12 PM Kefeng Wang <wangkefeng.wang@xxxxxxxxxx> wrote:
> >>>
> >>> Both file pages and anonymous pages support large folios, so high-order
> >>> pages other than PMD_ORDER will also be allocated frequently, which could
> >>> increase zone lock contention. Allowing high-order pages on the PCP lists
> >>> could reduce that contention, but as commit 44042b449872
> >>> ("mm/page_alloc: allow high-order pages to be stored on the per-cpu lists")
> >>> pointed out, it may not win in all scenarios. Add a new sysfs control to
> >>> enable or disable storing specified high-order pages on the PCP lists; orders
> >>> in (PAGE_ALLOC_COSTLY_ORDER, PMD_ORDER) won't be stored on the PCP lists by default.
> >>
> >> This is precisely something Baolin and I have discussed and intended
> >> to implement[1], but unfortunately, we haven't had the time to do so.
> >
> > Indeed, same thing. Recently, we have been working on unixbench/lmbench
> > optimization. I tested multi-size THP for anonymous memory by hard-coding
> > PAGE_ALLOC_COSTLY_ORDER from 3 to 4[1]; it showed some improvement, but
> > not in all cases and not very stably, so I re-implemented it so that it
> > can be enabled dynamically according to user requirements.
>
> I'm wondering, though, if this is really a suitable candidate for a
> sysctl toggle. Can anybody really come up with an educated guess for
> these values?
>
> Especially reading "Benchmarks Score shows a little improvement (0.28%)"
> and "it may not win in all the scenes", to me it mostly sounds like
> "minimal impact" -- so who cares?

Considering that the original goal of PCP was to alleviate page allocation
lock contention, and that mTHP can now be configured to, for instance,
64KiB, it's possible that 64KiB could become the most common page allocation
size, just like order-0. We should then expect to see similar improvements.
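
To make the sizes concrete, here is a minimal sketch of the order arithmetic,
assuming 4KiB base pages and a PMD order of 9 (as on x86-64). The helper names
are hypothetical and not kernel APIs; only the value 3 for
PAGE_ALLOC_COSTLY_ORDER comes from this thread.

```c
#include <assert.h>
#include <stddef.h>

/* PAGE_ALLOC_COSTLY_ORDER is 3, as mentioned earlier in this thread. */
#define COSTLY_ORDER 3

/* Hypothetical helper: smallest order with (page_size << order) >= size. */
static unsigned int size_to_order(size_t size, size_t page_size)
{
    unsigned int order = 0;

    assert(page_size != 0);
    while ((page_size << order) < size)
        order++;
    return order;
}

/*
 * Hypothetical helper: true when the order lies strictly between
 * PAGE_ALLOC_COSTLY_ORDER and the PMD order, i.e. in the range the
 * cover letter says is not stored on the PCP lists by default.
 * pmd_order is an assumption supplied by the caller (9 for 4KiB
 * pages with 2MiB PMDs).
 */
static int in_new_pcp_range(unsigned int order, unsigned int pmd_order)
{
    return order > COSTLY_ORDER && order < pmd_order;
}
```

Under these assumptions, size_to_order(64 * 1024, 4096) gives order 4, which
falls inside that range, so a 64KiB mTHP allocation would go through the zone
lock unless the proposed control enables it for the PCP lists.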

I'm questioning whether shell8 is a suitable benchmark for this
situation. A mere 0.28% performance improvement might not be substantial
enough to pique interest. Shouldn't we have numerous threads allocating and
freeing in parallel to truly gauge the benefits of PCP?
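
A rough user-space sketch of such a parallel allocate/free workload might look
like the following. It is only an illustration, not the benchmark used for the
patch: the function names, the 1MiB region size, and the thread limit are all
assumptions. Whether the faults are actually served with 64KiB folios depends
on the mTHP settings (for example
/sys/kernel/mm/transparent_hugepage/hugepages-64kB/enabled) and on the
alignment of the mapping.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

#define REGION_SIZE (1024 * 1024)  /* assumed: 16 x 64KiB per cycle */
#define MAX_THREADS 64             /* assumed upper bound for this sketch */

struct worker_arg {
    long iterations;  /* map/touch/unmap cycles to attempt */
    long completed;   /* cycles that finished successfully */
};

/* Each cycle maps anonymous memory, touches every page, and unmaps it. */
static void *worker(void *p)
{
    struct worker_arg *arg = p;

    for (long i = 0; i < arg->iterations; i++) {
        char *buf = mmap(NULL, REGION_SIZE, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (buf == MAP_FAILED)
            break;
        memset(buf, 1, REGION_SIZE);  /* page faults allocate the memory */
        int rc = munmap(buf, REGION_SIZE);  /* frees the pages again */
        assert(rc == 0);
        (void)rc;
        arg->completed++;
    }
    return NULL;
}

/* Runs nthreads workers; returns total completed cycles, or -1 on error. */
static long run_parallel_alloc_free(int nthreads, long iterations)
{
    pthread_t tids[MAX_THREADS];
    struct worker_arg args[MAX_THREADS];
    long total = 0;
    int started = 0;

    if (nthreads < 1 || nthreads > MAX_THREADS)
        return -1;
    for (int i = 0; i < nthreads; i++) {
        args[i].iterations = iterations;
        args[i].completed = 0;
        if (pthread_create(&tids[i], NULL, worker, &args[i]) != 0)
            break;
        started++;
    }
    for (int i = 0; i < started; i++) {
        pthread_join(tids[i], NULL);
        total += args[i].completed;
    }
    return started == nthreads ? total : -1;
}
```

Timing a call such as run_parallel_alloc_free(nthreads, iterations) with
clock_gettime() for increasing thread counts, with and without order-4 pages
allowed on the PCP lists, would presumably show the contention effect more
clearly than shell8 does.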

>
> How much is the cost vs. benefit of just having one sane system
> configuration?
>
> --
> Cheers,
>
> David / dhildenb
>