On 7/19/24 2:35 AM, Barry Song wrote:
> On Thu, Jul 18, 2024 at 8:50 PM Michal Hocko <mhocko@xxxxxxxx> wrote:
>>
>> On Thu 18-07-24 20:43:53, Barry Song wrote:
>> > On Thu, Jul 18, 2024 at 8:32 PM Michal Hocko <mhocko@xxxxxxxx> wrote:
>> > >
>> > > On Thu 18-07-24 20:18:02, Barry Song wrote:
>> > > > So the purpose is making sure the semantics - NOFAIL means no failure
>> > > > and we don't need to check ret. If we can't really succeed, it should throw
>> > > > a BUG to stop any potential exploits.
>> > >
>> > > This would require to panic consistently on failure in all allocator
>> > > path that can bail out. E.g. page allocator on GFP_NOWAIT|GFP_NOFAIL
>> > > req. not sure how many more.
>> >
>> > Right, this GFP_NOFAIL issue seems quite messy. However, at least vmalloc
>> > will retry by itself, even if alloc_pages might have failed with
>> > GFP_NOWAIT | GFP_NOFAIL.
>> >
>> > But isn't that the definition of __GFP_NOFAIL?
>> >
>> > * %__GFP_NOFAIL: The VM implementation _must_ retry infinitely: the caller
>> > * cannot handle allocation failures. The allocation could block
>> > * indefinitely but will never return with failure. Testing for
>> > * failure is pointless."
>> >
>> > So I believe any code that doesn't retry and ends up returning NULL should be
>> > fixed.
>>
>> Yes, those shouldn't really fail. NOWAIT|NOFAIL was something that
>> should never happen and I really hope it doesn't. Others should really
>> retry but it's been some time since I've checked the last time.
>
> I assume allocations directly using alloc_pages() might not respect GFP_NOFAIL
> and violate the semantics of GFP_NOFAIL.
>
> static inline struct page *
> __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
>                        struct alloc_context *ac)
> {
>         /*
>          * Make sure that __GFP_NOFAIL request doesn't leak out and make sure
>          * we always retry
>          */
>         if (gfp_mask & __GFP_NOFAIL) {
>                 /*
>                  * All existing users of the __GFP_NOFAIL are blockable, so warn
>                  * of any new users that actually require GFP_NOWAIT
>                  */
>                 if (WARN_ON_ONCE_GFP(!can_direct_reclaim, gfp_mask))
>                         goto fail;
>                 ...
>         }
>
> Additionally, at least drivers/vdpa/vdpa_user/iova_domain.c is
> incorrect with GFP_ATOMIC | __GFP_NOFAIL.
>
> void vduse_domain_remove_user_bounce_pages(struct vduse_iova_domain *domain)
> {
>         ...
>
>         count = domain->bounce_size >> PAGE_SHIFT;
>         for (i = 0; i < count; i++) {
>                 ...
>
>                 /* Copy user page to kernel page if it's in use */
>                 if (map->orig_phys != INVALID_PHYS_ADDR) {
>                         page = alloc_page(GFP_ATOMIC | __GFP_NOFAIL);

This should be already triggering the warning above? If it doesn't,
nobody has yet reached that particular line in the alloc slowpath,
probably thanks to the GFP_ATOMIC reserves. Maybe we should tighten the
warnings then.

>                         memcpy_from_page(page_address(page),
>                                          map->bounce_page, 0, PAGE_SIZE);
>                 }
>                 put_page(map->bounce_page);
>                 map->bounce_page = page;
>         }
>         domain->user_bounce_pages = false;
> out:
>         write_unlock(&domain->bounce_lock);
> }
>
> GFP_NOFAIL things need to be fixed. Let me investigate further.
>
>>
>> These overflow checks were added without any acks by MM people...
>> --
>> Michal Hocko
>> SUSE Labs
>
> Thanks
> Barry