Re: [PATCH v3 4/4] mm: prohibit NULL deference exposed for unsupported non-blockable __GFP_NOFAIL

Yafang Shao <laoar.shao@xxxxxxxxx> · Sun, 18 Aug 2024 10:55:09 +0800

On Sat, Aug 17, 2024 at 2:25 PM Barry Song <21cnbao@xxxxxxxxx> wrote:
>
> From: Barry Song <v-songbaohua@xxxxxxxx>
>
> When users allocate memory with the __GFP_NOFAIL flag, they might
> incorrectly use it alongside GFP_ATOMIC, GFP_NOWAIT, etc.  This kind of
> non-blockable __GFP_NOFAIL is not supported and is pointless.  If we
> attempt and still fail to allocate memory for these users, we have two
> choices:
>
>     1. We could busy-loop and hope that some other direct reclamation or
>     kswapd rescues the current process. However, this is unreliable
>     and could ultimately lead to hard or soft lockups,

That can occur even if we set both __GFP_NOFAIL and
__GFP_DIRECT_RECLAIM, right? So, I don't believe the issue is related
to setting __GFP_DIRECT_RECLAIM; rather, it stems from the flawed
design of __GFP_NOFAIL itself.

> which might not
>     be well supported by some architectures.
>
>     2. We could use BUG_ON to trigger a reliable system crash, avoiding
>     exposing NULL dereference.
>
> Neither option is ideal, but both are improvements over the existing code.
> This patch selects the second option because, with the introduction of
> scoped API and GFP_NOFAIL—capable of enforcing direct reclamation for
> nofail users(which is in my plan), non-blockable nofail allocations will
> no longer be possible.
>
> Signed-off-by: Barry Song <v-songbaohua@xxxxxxxx>
> Cc: Michal Hocko <mhocko@xxxxxxxx>
> Cc: Uladzislau Rezki (Sony) <urezki@xxxxxxxxx>
> Cc: Christoph Hellwig <hch@xxxxxxxxxxxxx>
> Cc: Lorenzo Stoakes <lorenzo.stoakes@xxxxxxxxxx>
> Cc: Christoph Lameter <cl@xxxxxxxxx>
> Cc: Pekka Enberg <penberg@xxxxxxxxxx>
> Cc: David Rientjes <rientjes@xxxxxxxxxx>
> Cc: Joonsoo Kim <iamjoonsoo.kim@xxxxxxx>
> Cc: Vlastimil Babka <vbabka@xxxxxxx>
> Cc: Roman Gushchin <roman.gushchin@xxxxxxxxx>
> Cc: Hyeonggon Yoo <42.hyeyoo@xxxxxxxxx>
> Cc: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
> Cc: Kees Cook <kees@xxxxxxxxxx>
> Cc: "Eugenio Pérez" <eperezma@xxxxxxxxxx>
> Cc: Hailong.Liu <hailong.liu@xxxxxxxx>
> Cc: Jason Wang <jasowang@xxxxxxxxxx>
> Cc: Maxime Coquelin <maxime.coquelin@xxxxxxxxxx>
> Cc: "Michael S. Tsirkin" <mst@xxxxxxxxxx>
> Cc: Xuan Zhuo <xuanzhuo@xxxxxxxxxxxxxxxxx>
> ---
>  mm/page_alloc.c | 10 +++++-----
>  1 file changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index d2c37f8f8d09..fb5850ecd3ae 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -4399,11 +4399,11 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
>          */
>         if (gfp_mask & __GFP_NOFAIL) {
>                 /*
> -                * All existing users of the __GFP_NOFAIL are blockable, so warn
> -                * of any new users that actually require GFP_NOWAIT
> +                * All existing users of the __GFP_NOFAIL are blockable
> +                * otherwise we introduce a busy loop with inside the page
> +                * allocator from non-sleepable contexts
>                  */
> -               if (WARN_ON_ONCE_GFP(!can_direct_reclaim, gfp_mask))
> -                       goto fail;
> +               BUG_ON(!can_direct_reclaim);

I'm not in favor of using BUG_ON() here, as many call sites already
handle the return value of __GFP_NOFAIL.

If we believe BUG_ON() is necessary, why not place it at the beginning
of __alloc_pages_slowpath() instead of after numerous operations,
which could potentially obscure the issue?

>
>                 /*
>                  * PF_MEMALLOC request from this context is rather bizarre
> @@ -4434,7 +4434,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
>                 cond_resched();
>                 goto retry;
>         }
> -fail:
> +
>         warn_alloc(gfp_mask, ac->nodemask,
>                         "page allocation failure: order:%u", order);
>  got_pg:
> --
> 2.39.3 (Apple Git-146)
>
>

--
Regards
Yafang