Re: [PATCH RFC] mm: warn potential return NULL for kmalloc_array and kvmalloc_array with __GFP_NOFAIL

Vlastimil Babka <vbabka@xxxxxxx> · Fri, 19 Jul 2024 10:50:07 +0200

On 7/19/24 10:01 AM, Michal Hocko wrote:
> On Fri 19-07-24 19:51:06, Barry Song wrote:
>> On Fri, Jul 19, 2024 at 7:42 PM Michal Hocko <mhocko@xxxxxxxx> wrote:
> [...]
>> > It cannot reclaim itself and it cannot sleep to wait for the memory so
>> > NOFAIL semantic is simply impossible. We have put a warning in place to
>> 
>> this is still "right" behaviour to retry infinitely at least according
>> to the doc of
>> __GFP_NOFAIL.
> 
> I do not agree that implementing busy loop in the kernel is the right
> practice!

I think there are multiple aspects to this. One is what's the right
practice, another what's the most graceful recovery from a buggy code.

We can all agree that __GFP_NOFAIL together with GFP_NOWAIT or other
variants that can't sleep is wrong and unsupported. Hence we should put
sufficient warnings in place to prevent such combinations from being used in
the first place.

But what if some callsite makes it into mainline without anyone seeing the
warnings in the process and fixing them, because it's rare to hit? And then
an attacker finds a way to hit it?

We could fail the allocation which would probably crash afterwards,
hopefully non-exploitably. We could BUG_ON() which would crash reliably, but
still become a DoS vulnerability. Or we could busy loop, which might be also
a DoS but maybe not so reliable because kswapd or other direct reclaiming
processes could recover the situation (and the user could report the warning
that was produced in the process).

That wouldn't mean the busy loop is a correct and supported practice. It
would just mean it's the least bad of the bad options we have to deal with
an allocation that's wrong but we didn't catch soon enough in the development.

Compared to the original problem at hand, kmalloc_array() of impossible size
with __GFP_NOFAIL is not recoverable by busy looping so as I've already said
I like most the idea of BUG_ON(). Yes it's a DoS vector if someone finds
such a user triggerable allocation, but I don't see a better option.

Same thing should probably happen with a __GFP_NOFAIL page allocation that
requests >MAX_ORDER.

>> I assume getting new memory by many retries is still
>> possibly some other processes might be reclaiming or freeing memory
>> then providing free memory to this one being stuck.
> 
> No, I strongly disagree we should even pretend this is a supported
> allocation strategy. NAK to any attempt to legalize it in some form.
>