Re: [PATCH v3 4/4] mm: prohibit NULL deference exposed for unsupported non-blockable __GFP_NOFAIL

Barry Song <21cnbao@xxxxxxxxx> · Mon, 19 Aug 2024 21:39:28 +1200

On Mon, Aug 19, 2024 at 9:25 PM Yafang Shao <laoar.shao@xxxxxxxxx> wrote:
>
> On Mon, Aug 19, 2024 at 3:50 PM Michal Hocko <mhocko@xxxxxxxx> wrote:
> >
> > On Sun 18-08-24 10:55:09, Yafang Shao wrote:
> > > On Sat, Aug 17, 2024 at 2:25 PM Barry Song <21cnbao@xxxxxxxxx> wrote:
> > > >
> > > > From: Barry Song <v-songbaohua@xxxxxxxx>
> > > >
> > > > When users allocate memory with the __GFP_NOFAIL flag, they might
> > > > incorrectly use it alongside GFP_ATOMIC, GFP_NOWAIT, etc.  This kind of
> > > > non-blockable __GFP_NOFAIL is not supported and is pointless.  If we
> > > > attempt and still fail to allocate memory for these users, we have two
> > > > choices:
> > > >
> > > >     1. We could busy-loop and hope that some other direct reclamation or
> > > >     kswapd rescues the current process. However, this is unreliable
> > > >     and could ultimately lead to hard or soft lockups,
> > >
> > > That can occur even if we set both __GFP_NOFAIL and
> > > __GFP_DIRECT_RECLAIM, right?
> >
> > No, it cannot! With __GFP_DIRECT_RECLAIM the allocator might take a long
> > time to satisfy the allocation but it will reclaim to get the memory, it
> > will sleep if necessary and it will trigger the OOM killer if there is
> > no other option. __GFP_DIRECT_RECLAIM is a completely different story
> > than without it which means _no_sleeping_ is allowed and therefore only
> > a busy loop waiting for the allocation to proceed is allowed.
>
> That could be a livelock.
> From the user's perspective, there's no noticeable difference between
> a livelock, soft lockup, or hard lockup.

This is certainly different. A lockup occurs when tasks can't be scheduled,
causing the entire system to stop functioning.

>
> >
> > > So, I don't believe the issue is related
> > > to setting __GFP_DIRECT_RECLAIM; rather, it stems from the flawed
> > > design of __GFP_NOFAIL itself.
> >
> > Care to elaborate?
>
> I've read the documentation explaining why the busy loop is embedded
> within the page allocation process instead of letting users implement
> it based on their needs. However, the complexity and numerous issues
> suggest that this design might be fundamentally flawed.

I don't see "numerous issues", only two:

1. allocation size overflow with __GFP_NOFAIL
2. the unsupported case: GFP_NOWAIT/GFP_ATOMIC | __GFP_NOFAIL

For 1, it has been a BUG to require that an overflowed size always succeed.

For 2, it is simply an unsupported case. We just need to hide __GFP_NOFAIL
and expose only GFP_NOFAIL (which always implies a blockable allocation), so
that unsupported uses like the one in vdpa can no longer occur. I would
greatly appreciate it if you or someone else could take over this task, as
I am currently extremely busy.
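
To make the idea for 2 concrete, here is a rough user-space sketch, not a
patch. The bit values are made up, the flag definitions are simplified, and
nofail_is_supported() is a hypothetical helper used only for illustration.
The point is that a public GFP_NOFAIL which already carries
__GFP_DIRECT_RECLAIM cannot be combined into a non-blockable no-fail request:

```c
#include <stdbool.h>

/*
 * Sketch only: illustrative bit values, NOT the kernel's. The names mirror
 * the kernel's so the mapping is obvious; the real definitions live in
 * include/linux/gfp_types.h.
 */
typedef unsigned int gfp_t;

#define __GFP_DIRECT_RECLAIM 0x1u  /* caller may sleep and reclaim directly */
#define __GFP_KSWAPD_RECLAIM 0x2u  /* may wake kswapd, never sleeps itself */
#define __GFP_HIGH           0x4u  /* may dip into emergency reserves */
#define __GFP_NOFAIL         0x8u  /* assumed to become allocator-internal */

/* Simplified versions of the non-blockable masks. */
#define GFP_ATOMIC (__GFP_HIGH | __GFP_KSWAPD_RECLAIM)
#define GFP_NOWAIT (__GFP_KSWAPD_RECLAIM)

/* Hypothetical public flag: "no fail" always implies "blockable". */
#define GFP_NOFAIL (__GFP_NOFAIL | __GFP_DIRECT_RECLAIM)

/* True if the caller is allowed to sleep in direct reclaim. */
static bool gfp_can_block(gfp_t gfp)
{
	return (gfp & __GFP_DIRECT_RECLAIM) != 0;
}

/* Hypothetical check: a no-fail request is supported only if it can block. */
static bool nofail_is_supported(gfp_t gfp)
{
	if (!(gfp & __GFP_NOFAIL))
		return true;  /* ordinary allocation, allowed to fail */
	return gfp_can_block(gfp);
}
```

With only GFP_NOFAIL exposed, nofail_is_supported(GFP_ATOMIC | GFP_NOFAIL)
holds, because the blockable bit comes along, whereas today's
GFP_ATOMIC | __GFP_NOFAIL is exactly the combination that fails the check.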

>
> --
> Regards
> Yafang

Thanks
Barry