Re: [PATCH RFC] mm: warn potential return NULL for kmalloc_array and kvmalloc_array with __GFP_NOFAIL

Vlastimil Babka <vbabka@xxxxxxx> · Thu, 18 Jul 2024 10:16:29 +0200

On 7/18/24 9:12 AM, Michal Hocko wrote:
> On Thu 18-07-24 00:04:54, Christoph Hellwig wrote:
>> On Thu, Jul 18, 2024 at 08:58:44AM +0200, Michal Hocko wrote:
>> > WARN_ON is effectively BUG_ON with panic_on_warn, so if this happens to
>> > be in a user-triggerable path then you would have an easy way to panic
>> > the whole machine. It is likely true that the kernel could oops right
>> > after the failure, but that could at least be recoverable.
>> 
>> If you set panic_on_warn you are either debugging, in which case it's
>> the right thing, or you are fucked. So don't do it unless you
>> expect frequent crashes.
> 
> I do agree and I wouldn't recommend running panic_on_warn on anything
> even touching production setups. Reality check disagrees.
> 
>> > If anything, I would just pr_warn with the caller address or add dump_stack
>> > to capture the full trace. That would give us the caller that needs
>> > fixing without panicking the system with panic_on_warn.
>> 
>> The whole freaking point of __GFP_NOFAIL is that callers don't handle
>> allocation failures.  So in fact a straight BUG is the right thing
>> here.

Agreed. It's just not a recoverable situation (WARN_ON is for recoverable
situations). The caller cannot handle allocation failure and at the same
time asked for an impossible allocation. BUG_ON() is a guaranteed oops with
a stack trace etc. We don't need to hope for the later NULL pointer
dereference (which, if we are really unlucky, might happen from a different
context where it's no longer obvious what led to the allocation failing).

> An oops from the NULL ptr dereference could be the safer option because
> it might be recoverable.

I wonder if people who enable panic_on_warn have any reason not to also enable
panic_on_oops, other than forgetting or not realizing that it also exists?
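To check a given machine for that combination, a small userspace sketch could read the two sysctls from /proc/sys/kernel/panic_on_warn and /proc/sys/kernel/panic_on_oops. The helper names are hypothetical, and a missing or unreadable file is reported as -1:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Reads one integer sysctl value from procfs; returns -1 if the file
 * is missing or does not start with an integer. */
static int read_sysctl_int(const char *path)
{
	FILE *f = fopen(path, "r");
	int val = -1;

	if (!f)
		return -1;
	if (fscanf(f, "%d", &val) != 1)
		val = -1;
	fclose(f);
	return val;
}

/* True when a WARN would panic the machine but an oops would not. */
static bool warn_panics_but_oops_does_not(int panic_on_warn, int panic_on_oops)
{
	return panic_on_warn > 0 && panic_on_oops == 0;
}
```

For example, call warn_panics_but_oops_does_not(read_sysctl_int("/proc/sys/kernel/panic_on_warn"), read_sysctl_int("/proc/sys/kernel/panic_on_oops")). An unreadable panic_on_oops yields -1 and is not flagged.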

> Anyway, I question whether a WARN/BUG/pr_warn on the overflow check
> adds any actual value, because large GFP_NOFAIL allocations could be
> even more dangerous, essentially rendering the system completely
> unusable.
>