On Mon, Oct 28, 2024 at 5:01 AM Vlastimil Babka <vbabka@xxxxxxx> wrote:
>
> On 10/28/24 1:24 AM, Yu Zhao wrote:
> > On Sun, Oct 27, 2024 at 3:05 PM Vlastimil Babka <vbabka@xxxxxxx> wrote:
> >>
> >> On 10/27/24 21:51, Yu Zhao wrote:
> >>> On Sun, Oct 27, 2024 at 2:36 PM Vlastimil Babka <vbabka@xxxxxxx> wrote:
> >>>>
> >>>> On 10/27/24 21:17, Yu Zhao wrote:
> >>>>> On Sun, Oct 27, 2024 at 1:53 PM Vlastimil Babka <vbabka@xxxxxxx> wrote:
> >>>>>>
> >>>>
> >>>> For example:
> >>>>
> >>>> - a page is on pcplist in MIGRATE_MOVABLE list
> >>>> - we reserve its pageblock as highatomic, which does nothing to the page on
> >>>> the pcplist
> >>>> - page above is flushed from pcplist to zone freelist, but it remembers it
> >>>> was MIGRATE_MOVABLE, merges with another buddy/buddies from the
> >>>> now-highatomic list, the resulting order-X page ends up on the movable
> >>>> freelist despite being in highatomic pageblock. The counter of free
> >>>> highatomic is now wrong wrt the freelist reality
> >>>
> >>> This is the part I don't follow: how is it wrong w.r.t. the freelist
> >>> reality? The new nr_free_highatomic should reflect how many pages are
> >>> exactly on free_list[MIGRATE_HIGHATOMIC], because it's updated
> >>> accordingly.
> >>
> >> You'd have to try implementing your change in the kernel without that
> >> migratetype hygiene series, and see how it would either not work, or you'd
> >> end up implementing the series as part of that.
> >
> > A proper backport would need to track the MT of the free_list a page
> > is deleted from, and I see no reason why in such a proper backport
> > "the counter could drift easily" or "the counter of free highatomic is
> > now wrong wrt the freelist reality". So I assume you actually mean
> > this patch can't be backported cleanly? (Which I do agree.)
>
> Yes you're right. But since we don't plan to backport it beyond 6.12,
> sorry for sidetracking the discussion unnecessarily. More importantly,
> is it possible to change the implementation as I suggested?

The only reason I didn't fold account_highatomic_freepages() into
account_freepages() is that the former must be called under the zone
lock; the latter happens to be called under the zone lock as well, but
doesn't require it.

I understand where you are coming from in suggesting a new per-cpu
counter for free highatomic, but I have to disagree, because 1) free
highatomic is relatively small, so drift might defeat its purpose, and
2) per-cpu memory is among the top sources of kernel memory overhead in
our fleet -- it really adds up. So I prefer not to use per-cpu counters
unless necessary.

So if it's OK with you, I'll just fold account_highatomic_freepages()
into account_freepages(), but keep the counter per zone, not per CPU.

> [1] Hooking
> to __del_page_from_free_list() and __add_to_free_list() means extra work
> in every loop iteration in expand() and __free_one_page(). The
> migratetype hygiene should ensure it's not necessary to intercept every
> freelist add/move and hooking to account_freepages() should be
> sufficient and in line with the intended design.

Agreed.
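
To make the fold concrete, here is a rough sketch of what I have in
mind, against the existing account_freepages() in mm/page_alloc.c. The
per-zone field zone->nr_free_highatomic and the lockdep assert (which
turns the current "callers happen to hold the zone lock" into an
explicit requirement) are my assumptions here, and the exact shape is
of course subject to change:

static inline void account_freepages(struct zone *zone, int nr_pages,
				     int migratetype)
{
	/*
	 * All callers hold the zone lock, so the plain per-zone
	 * highatomic counter below needs no extra synchronization.
	 */
	lockdep_assert_held(&zone->lock);

	if (is_migrate_isolate(migratetype))
		return;

	__mod_zone_page_state(zone, NR_FREE_PAGES, nr_pages);

	if (is_migrate_cma(migratetype))
		__mod_zone_page_state(zone, NR_FREE_CMA_PAGES, nr_pages);
	else if (is_migrate_highatomic(migratetype))
		/* Per zone, not per cpu: the counter is small and must not drift. */
		WRITE_ONCE(zone->nr_free_highatomic,
			   zone->nr_free_highatomic + nr_pages);
}

With the migratetype hygiene in place, the migratetype passed here
matches the freelist the page is actually added to or deleted from, so
updating only here (and not in __add_to_free_list() or
__del_page_from_free_list()) should keep nr_free_highatomic in sync
with free_list[MIGRATE_HIGHATOMIC].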