Re: [PATCH] mm, percpu: do not consider sleepable allocations atomic

Vlastimil Babka <vbabka@xxxxxxx> · Fri, 21 Feb 2025 10:48:28 +0100

On 2/21/25 03:36, Dennis Zhou wrote:
> I've thought about this in the back of my head for the past few weeks. I
> think I have 2 questions about this change.
> 
> 1. Back to what TJ said earlier about probing. I feel like GFP_KERNEL
>    allocations should be okay because that more or less is control plane
>    time? I'm not sure dropping PR_SET_IO_FLUSHER is all that big of a
>    workaround?

Dropping PR_SET_IO_FLUSHER solves the iscsid case, but not the other cases
where GFP_KERNEL allocations are fundamentally impossible.

> 2. This change breaks the feedback loop as we discussed above.
>    Historically we've targeted 2-4 free pages worth of percpu memory.
>    This is done by kicking the percpu work off. That does GFP_KERNEL
>    allocations and if that requires reclaim then it goes and does it.
>    However, now we're saying kswapd is going to work in parallel while
>    we try to get pages in the worker thread.
> 
>    Given you're more versed in the reclaim side, I presume it must be
>    pretty bad if we're failing to get order-0 pages even if we have
>    NOFS/NOIO set?

IMHO yes, so I don't think we need to pre-emptively fear that situation that
much. OTOH in the current state, depleting pcpu's atomic reserves and then
failing pcpu_alloc because we're not allowed to take the mutex can happen
easily, even when there's plenty of free memory.

>    My feeling is that we should add back some knowledge of the
>    dependency so if the worker fails to get pages, it doesn't reschedule
>    immediately. Maybe it's as simple as adding a sleep in the worker or
>    playing with delayed work...

I think if we wanted things to be more robust (and perhaps there's no need
to, see above), the best way would be to make the worker preallocate with
GFP_KERNEL outside of pcpu_alloc_mutex. I assume that's not easy to
implement, though, as page table allocations are involved in the process and
we don't have a way to supply preallocated memory for those.
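
To illustrate only the general pattern I mean (allocate first, take the lock
second), here is a rough userspace sketch using pthreads. It is not the percpu
code: struct pool, pool_refill() and pool_take() are made-up names, malloc()
stands in for a GFP_KERNEL allocation, and the lock stands in for
pcpu_alloc_mutex. It also sidesteps the hard part mentioned above, the page
table allocations.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a reserve guarded by a mutex. */
struct pool {
	pthread_mutex_t lock;
	void *spare;		/* one reserved chunk, NULL when depleted */
	size_t chunk_size;
};

/*
 * Refill the reserve. The potentially slow allocation is done before the
 * lock is taken, so nobody holding or waiting on the lock is ever stuck
 * behind the allocator.
 */
static bool pool_refill(struct pool *p)
{
	void *fresh;

	assert(p->chunk_size > 0);

	fresh = malloc(p->chunk_size);	/* may be slow; lock not held */
	if (!fresh)
		return false;

	pthread_mutex_lock(&p->lock);
	if (!p->spare) {
		p->spare = fresh;	/* install the preallocated chunk */
		fresh = NULL;
	}
	pthread_mutex_unlock(&p->lock);

	free(fresh);	/* reserve was already full; free(NULL) is a no-op */
	return true;
}

/* Consume the reserve: short lock hold, no allocation under the lock. */
static void *pool_take(struct pool *p)
{
	void *chunk;

	pthread_mutex_lock(&p->lock);
	chunk = p->spare;
	p->spare = NULL;
	pthread_mutex_unlock(&p->lock);

	return chunk;
}
```

The cost of this ordering is that a refill may allocate a chunk it ends up
not needing and has to free it again, which seems an acceptable price for
never allocating with the lock held.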

> Thanks,
> Dennis