Re: [PATCH] mm/vmalloc: do not output a spurious warning when huge vmalloc() fails

On Tue, Jun 06, 2023 at 10:17:02AM +0200, Uladzislau Rezki wrote:
> On Tue, Jun 06, 2023 at 09:13:24AM +0200, Vlastimil Babka wrote:
> >
> > On 6/5/23 22:11, Lorenzo Stoakes wrote:
> > > In __vmalloc_area_node() we always warn_alloc() when an allocation
> > > performed by vm_area_alloc_pages() fails unless it was due to a pending
> > > fatal signal.
> > >
> > > However, huge page allocations instigated either by vmalloc_huge() or
> > > __vmalloc_node_range() (or a caller that invokes this like kvmalloc() or
> > > kvmalloc_node()) always fall back to order-0 allocations if the huge page
> > > allocation fails.
> > >
> > > This renders the warning useless and noisy, especially as all callers
> > > appear to be aware that this may fall back. This has already resulted in at
> > > least one bug report from a user who was confused by this (see link).
> > >
> > > Therefore, simply update the code to only output this warning for order-0
> > > pages when no fatal signal is pending.
> > >
> > > Link: https://bugzilla.suse.com/show_bug.cgi?id=1211410
> > > Signed-off-by: Lorenzo Stoakes <lstoakes@xxxxxxxxx>
> >
> > I think there are more reports of same thing from the btrfs context, that
> > appear to be a 6.3 regression
> >
> > https://bugzilla.kernel.org/show_bug.cgi?id=217466
> > Link: https://lore.kernel.org/all/efa04d56-cd7f-6620-bca7-1df89f49bf4b@xxxxxxxxx/
> >
> I had a look at that report. btrfs complains because a high-order
> page (1 << 9) cannot be obtained. In the vmalloc code we do not
> fall back to the order-0 allocator if a high-order allocation is
> requested.

This isn't true: we _do_ fall back to order-0 (this is the basis of my patch),
in __vmalloc_node_range():

	/* Allocate physical pages and map them into vmalloc space. */
	ret = __vmalloc_area_node(area, gfp_mask, prot, shift, node);
	if (!ret)
		goto fail;

...

fail:
	if (shift > PAGE_SHIFT) {
		shift = PAGE_SHIFT;
		align = real_align;
		size = real_size;
		goto again;
	}

The order is derived from shift, and __vmalloc_area_node() is only called from
__vmalloc_node_range().
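
To make the retry concrete, here is a rough userspace model of that control
flow. It is only a sketch, not kernel code: the MODEL_* constants assume 4 KiB
base pages and 2 MiB huge mappings, and struct model_result,
model_area_node() and model_node_range() are hypothetical names made up for
illustration. The warning counter models the behaviour my patch aims for
(warn only when the order-0 attempt fails).

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_PAGE_SHIFT 12u	/* assumption: 4 KiB base pages */
#define MODEL_PMD_SHIFT  21u	/* assumption: 2 MiB huge mappings */

/* Hypothetical summary of one modelled allocation request. */
struct model_result {
	bool ok;			/* did any attempt succeed? */
	unsigned int final_shift;	/* shift used by the last attempt */
	unsigned int attempts;		/* number of attempts made */
	unsigned int warnings;		/* warnings emitted (patched behaviour) */
};

/*
 * Stand-in for __vmalloc_area_node(). The order is derived from the shift;
 * whether pages of that order are "available" is supplied by the caller.
 */
static bool model_area_node(unsigned int shift, bool huge_ok, bool base_ok,
			    unsigned int *warnings)
{
	unsigned int order;

	assert(shift >= MODEL_PAGE_SHIFT);
	order = shift - MODEL_PAGE_SHIFT;

	if (order > 0 ? huge_ok : base_ok)
		return true;

	/* Modelled patch behaviour: only an order-0 failure warns. */
	if (order == 0)
		(*warnings)++;
	return false;
}

/* Stand-in for the again/fail retry shown above. */
struct model_result model_node_range(unsigned int shift, bool huge_ok,
				     bool base_ok)
{
	struct model_result res = { false, shift, 0, 0 };

again:
	res.attempts++;
	res.final_shift = shift;
	if (model_area_node(shift, huge_ok, base_ok, &res.warnings)) {
		res.ok = true;
		return res;
	}

	/* The "fail:" path: retry once with base pages if we started huge. */
	if (shift > MODEL_PAGE_SHIFT) {
		shift = MODEL_PAGE_SHIFT;
		goto again;
	}
	return res;
}
```

Under these assumptions the huge attempt is order 21 - 12 = 9, matching the
(1 << 9) seen in the report. Calling model_node_range(MODEL_PMD_SHIFT, false,
true) makes two attempts, ends at MODEL_PAGE_SHIFT and emits no warning.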

>
> I provided a patch to fall back if a high-order allocation fails. A
> reproducer, after applying the patch, started to get oopses in other places.
>
> IMO, we should fallback even for high-order requests. Because it is
> highly likely it can not be accomplished.
>
> Any thoughts?
>
> <snip>
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 31ff782d368b..7a06452f7807 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -2957,14 +2957,18 @@ vm_area_alloc_pages(gfp_t gfp, int nid,
>                         page = alloc_pages(alloc_gfp, order);
>                 else
>                         page = alloc_pages_node(nid, alloc_gfp, order);
> +
>                 if (unlikely(!page)) {
> -                       if (!nofail)
> -                               break;
> +                       if (nofail)
> +                               alloc_gfp |= __GFP_NOFAIL;
>
> -                       /* fall back to the zero order allocations */
> -                       alloc_gfp |= __GFP_NOFAIL;
> -                       order = 0;
> -                       continue;
> +                       /* Fall back to the zero order allocations. */
> +                       if (order || nofail) {
> +                               order = 0;
> +                               continue;
> +                       }
> +
> +                       break;
>                 }
>
>                 /*
> <snip>
>
>
>
> --
> Uladzislau Rezki

I saw that, but it seems to duplicate what the original fallback code already
does (that code was originally designed to permit higher-order
non-__GFP_NOFAIL allocations before trying order-0 __GFP_NOFAIL).

I don't think it is really useful to change this, as it confuses that logic and
duplicates something we already do.
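
As a rough illustration of the overlap, the failure branch in
vm_area_alloc_pages() before and after the quoted diff can be modelled as two
small decision functions. This is a userspace sketch only: enum next_step and
the function names are hypothetical, and the model covers just the decision
taken when the page allocation returns NULL.

```c
#include <assert.h>
#include <stdbool.h>

/* What the failure branch decides to do next (hypothetical model type). */
enum next_step {
	STOP,			/* break out of the loop */
	RETRY_ORDER0,		/* retry at order 0, gfp flags unchanged */
	RETRY_ORDER0_NOFAIL	/* retry at order 0 with __GFP_NOFAIL set */
};

/* The branch as it stood before the quoted diff. */
static enum next_step on_fail_original(unsigned int order, bool nofail)
{
	(void)order;		/* the original branch ignores the order */
	if (!nofail)
		return STOP;
	return RETRY_ORDER0_NOFAIL;
}

/* The branch as rewritten by the quoted diff. */
static enum next_step on_fail_proposed(unsigned int order, bool nofail)
{
	if (nofail)
		return RETRY_ORDER0_NOFAIL;
	if (order)
		return RETRY_ORDER0;
	return STOP;
}

/* True when the two branches disagree for the given inputs. */
static bool branches_differ(unsigned int order, bool nofail)
{
	return on_fail_original(order, nofail) !=
	       on_fail_proposed(order, nofail);
}

/* Walk orders 0..9 and check where the two versions diverge. */
void check_branches(void)
{
	unsigned int order;

	for (order = 0; order <= 9; order++) {
		assert(!branches_differ(order, true));
		assert(branches_differ(order, false) == (order > 0));
	}
}
```

In this model the two versions only diverge for a failed high-order,
non-__GFP_NOFAIL attempt, which is the case the retry in
__vmalloc_node_range() already handles.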

Honestly, though, I think this whole area needs some refactoring.



