On Mon, Oct 28, 2024 at 5:01 AM Vlastimil Babka <vbabka@xxxxxxx> wrote:
>
> On 10/28/24 1:24 AM, Yu Zhao wrote:
> > On Sun, Oct 27, 2024 at 3:05 PM Vlastimil Babka <vbabka@xxxxxxx> wrote:
> >>
> >> On 10/27/24 21:51, Yu Zhao wrote:
> >>> On Sun, Oct 27, 2024 at 2:36 PM Vlastimil Babka <vbabka@xxxxxxx> wrote:
> >>>>
> >>>> On 10/27/24 21:17, Yu Zhao wrote:
> >>>>> On Sun, Oct 27, 2024 at 1:53 PM Vlastimil Babka <vbabka@xxxxxxx> wrote:
> >>>>>>
> >>>>
> >>>> For example:
> >>>>
> >>>> - a page is on pcplist in MIGRATE_MOVABLE list
> >>>> - we reserve its pageblock as highatomic, which does nothing to the page on
> >>>> the pcplist
> >>>> - page above is flushed from pcplist to zone freelist, but it remembers it
> >>>> was MIGRATE_MOVABLE, merges with another buddy/buddies from the
> >>>> now-highatomic list, the resulting order-X page ends up on the movable
> >>>> freelist despite being in highatomic pageblock. The counter of free
> >>>> highatomic is now wrong wrt the freelist reality
> >>>
> >>> This is the part I don't follow: how is it wrong w.r.t. the freelist
> >>> reality? The new nr_free_highatomic should reflect how many pages are
> >>> exactly on free_list[MIGRATE_HIGHATOMIC], because it's updated
> >>> accordingly.
> >>
> >> You'd have to try implementing your change in the kernel without that
> >> migratetype hygiene series, and see how it would either not work, or you'd
> >> end up implementing the series as part of that.
> >
> > A proper backport would need to track the MT of the free_list a page
> > is deleted from, and I see no reason why in such a proper backport
> > "the counter could drift easily" or "the counter of free highatomic is
> > now wrong wrt the freelist reality". So I assume you actually mean
> > this patch can't be backported cleanly? (Which I do agree.)
>
> Yes you're right. But since we don't plan to backport it beyond 6.12,
> sorry for sidetracking the discussion unnecessarily. More importantly,
> is it possible to change the implementation as I suggested?

The only reason I didn't fold account_highatomic_freepages() into
account_freepages() is that the former must be called under the zone
lock; the latter happens to be called under the zone lock as well, but
doesn't require it.

I understand where you are coming from in suggesting a new per-cpu
counter for free highatomic, but I have to disagree, because 1) free
highatomic is relatively small, so drift might defeat its purpose, and
2) per-cpu memory is among the top sources of kernel memory overhead in
our fleet -- it really adds up. So I prefer not to use per-cpu counters
unless necessary.

So if it's OK with you, I'll just fold account_highatomic_freepages()
into account_freepages(), but keep the counter per zone, not per CPU.

> [1] Hooking
> to __del_page_from_free_list() and __add_to_free_list() means extra work
> in every loop iteration in expand() and __free_one_page(). The
> migratetype hygiene should ensure it's not necessary to intercept every
> freelist add/move and hooking to account_freepages() should be
> sufficient and in line with the intended design.

Agreed.
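
To make the fold concrete, here is a rough sketch of what I have in
mind, against the existing account_freepages() in mm/page_alloc.c. The
per-zone field zone->nr_free_highatomic and the lockdep assert (which
turns the current "callers happen to hold the zone lock" into an
explicit requirement) are my assumptions here, and the exact shape is
of course subject to change:

static inline void account_freepages(struct zone *zone, int nr_pages,
				     int migratetype)
{
	/*
	 * All callers hold the zone lock, so the plain per-zone
	 * highatomic counter below needs no extra synchronization.
	 */
	lockdep_assert_held(&zone->lock);

	if (is_migrate_isolate(migratetype))
		return;

	__mod_zone_page_state(zone, NR_FREE_PAGES, nr_pages);

	if (is_migrate_cma(migratetype))
		__mod_zone_page_state(zone, NR_FREE_CMA_PAGES, nr_pages);
	else if (is_migrate_highatomic(migratetype))
		/* Per zone, not per cpu: the counter is small and must not drift. */
		WRITE_ONCE(zone->nr_free_highatomic,
			   zone->nr_free_highatomic + nr_pages);
}

With the migratetype hygiene in place, the migratetype passed here
matches the freelist the page is actually added to or deleted from, so
updating only here (and not in __add_to_free_list() or
__del_page_from_free_list()) should keep nr_free_highatomic in sync
with free_list[MIGRATE_HIGHATOMIC].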