On 10/28/24 18:54, Yu Zhao wrote:
> On Mon, Oct 28, 2024 at 5:01 AM Vlastimil Babka <vbabka@xxxxxxx> wrote:
>>
>> Yes, you're right. But since we don't plan to backport it beyond 6.12,
>> sorry for sidetracking the discussion unnecessarily. More importantly,
>> is it possible to change the implementation as I suggested?
>
> The only reason I didn't fold account_highatomic_freepages() into
> account_freepages() is that the former must be called under the
> zone lock, which is also how the latter happens to be called, but
> not as a requirement.

Ah, I guess we can document the requirement and add a lockdep assert.
Using __mod_zone_page_state() already implies some context restrictions,
although not the zone lock specifically.

> I understand where you're coming from when suggesting a new per-cpu
> counter for free highatomic. I have to disagree with that because
> 1) free highatomic is relatively small and drifting might defeat its
> purpose; 2) per-cpu memory is among the top kernel memory overheads
> in our fleet -- it really adds up. So I prefer not to use per-cpu
> counters unless necessary.

OK, I hadn't thought of those drawbacks.

> So if it's ok with you, I'll just fold account_highatomic_freepages()
> into account_freepages(), but keep the counter per zone, not per cpu.

OK, thanks!

>> [1] Hooking to __del_page_from_free_list() and __add_to_free_list()
>> means extra work in every loop iteration in expand() and
>> __free_one_page(). The migratetype hygiene should ensure it's not
>> necessary to intercept every freelist add/move, and hooking into
>> account_freepages() should be sufficient and in line with the
>> intended design.
>
> Agreed.