Re: [LSF/MM/BPF TOPIC] Removing GFP_NOFS

Dave Chinner <david@xxxxxxxxxxxxx> · Mon, 12 Feb 2024 12:20:32 +1100

On Thu, Feb 08, 2024 at 08:55:05PM +0100, Vlastimil Babka (SUSE) wrote:
> On 2/8/24 18:33, Michal Hocko wrote:
> > On Thu 08-02-24 17:02:07, Vlastimil Babka (SUSE) wrote:
> >> On 1/9/24 05:47, Dave Chinner wrote:
> >> > On Thu, Jan 04, 2024 at 09:17:16PM +0000, Matthew Wilcox wrote:
> >> 
> >> Your points and Kent's proposal of scoped GFP_NOWAIT [1] suggests to me this
> >> is no longer FS-only topic as this isn't just about converting to the scoped
> >> apis, but also how they should be improved.
> > 
> > Scoped GFP_NOFAIL context is slightly easier from the semantic POV than
> > scoped GFP_NOWAIT as it doesn't add a potentially unexpected failure
> > mode. It is still tricky to deal with GFP_NOWAIT requests inside the
> > NOFAIL scope because that makes it a non failing busy wait for an
> > allocation if we need to insist on scope NOFAIL semantic. 
> > 
> > On the other hand we can define the behavior similar to what you
> > propose with RETRY_MAYFAIL resp. NORETRY. Existing NOWAIT users should
> > better handle allocation failures regardless of the external allocation
> > scope.
> > 
> > Overriding that scoped NOFAIL semantic with RETRY_MAYFAIL or NORETRY
> > resembles the existing PF_MEMALLOC and GFP_NOMEMALLOC semantic and I do
> > not see an immediate problem with that.
> > 
> > Having more NOFAIL allocations is not great but if you need to
> > emulate those by implementing the nofail semantic outside of the
> > allocator then it is better to have those retries inside the allocator
> > IMO.
> 
> I see potential issues in scoping both the NOWAIT and NOFAIL
> 
> - NOFAIL - I'm assuming Dave is adding __GFP_NOFAIL to xfs allocations or
> adjacent layers where he knows they must not fail for his transaction. But
> could the scope affect also something else underneath that could fail
> without the failure propagating in a way that it affects xfs?

Memory allocation failures below the filesystem (i.e. in the IO
path) will fail the IO, and if that happens for a read IO within
a transaction, it will have the same effect as XFS failing a
memory allocation, i.e. it will shut down the filesystem.

The key point here is that the moment we go below the filesystem,
we enter a new scoped allocation context with a guaranteed method
of returning errors: NOIO and bio errors.

Once we cross an allocation scope boundary, NOFAIL is no
longer relevant to the code being run, because other errors can
occur there that the filesystem must handle anyway. Hence memory
allocation errors just don't matter at this point, and the NOFAIL
constraint buys us nothing.
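The argument can be illustrated with a toy userspace model. All the names here are hypothetical: model_submit_read(), model_trans_read(), struct model_mount and enum lower_fault. The model assumes, as described above, that any read error inside a transaction shuts the filesystem down. Its point is that the filesystem sees a memory allocation failure below it as just another error, on a path it must already handle for media errors.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model: none of these are real XFS or block-layer APIs. */
enum lower_fault { FAULT_NONE, FAULT_NOMEM, FAULT_MEDIA };

/* Block-layer stand-in: two unrelated causes, both surface as an IO error. */
static int model_submit_read(enum lower_fault fault)
{
	switch (fault) {
	case FAULT_NOMEM:
		return -ENOMEM;   /* allocation in the NOIO context failed */
	case FAULT_MEDIA:
		return -EIO;      /* device reported a read error */
	case FAULT_NONE:
	default:
		return 0;
	}
}

struct model_mount {
	bool shutdown;
};

/*
 * Filesystem stand-in: a single error path, whatever the cause below.
 * Model assumption: any read error inside a transaction forces a shutdown.
 */
static int model_trans_read(struct model_mount *mp, enum lower_fault fault)
{
	int error = model_submit_read(fault);

	if (error)
		mp->shutdown = true;
	return error;
}
```

Making the lower-layer allocation NOFAIL would remove only the FAULT_NOMEM case. The filesystem would still need the same error handling for FAULT_MEDIA.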

Hence we really need to consider NOFAIL differently to NOFS/NOIO.
NOFS/NOIO are about avoiding reclaim recursion deadlocks, so are
relevant all the way down the stack. NOFAIL is only relevant to a
specific subsystem to prevent subsystem allocations from failing,
but as soon as we cross into another subsystem that can (and does)
return errors for memory allocation failures, the NOFAIL context is
no longer relevant.
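The need for a NOFS/NOIO scope to reach allocations in lower layers can be sketched as a toy userspace model. The model assumes a filesystem that holds a lock for the whole transaction and a shrinker that needs that same lock. M_IO, M_FS, struct model_fs, model_scope_mask() and model_reclaim() are hypothetical names. The masking loosely follows the kernel's current_gfp_context(), where a NOIO scope clears __GFP_IO and __GFP_FS and a NOFS scope clears __GFP_FS.

```c
#include <stdbool.h>

/* Model-only allocation bits; the kernel's __GFP_IO/__GFP_FS play these roles. */
#define M_IO 0x1u   /* reclaim may start IO */
#define M_FS 0x2u   /* reclaim may call back into filesystem code */

/* Hypothetical filesystem state: a lock held across the whole transaction. */
struct model_fs {
	bool trans_lock_held;
};

enum reclaim_result { RECLAIM_OK, RECLAIM_SKIPPED_FS, RECLAIM_DEADLOCK };

/* NOIO implies NOFS; NOFS alone leaves IO-based reclaim allowed. */
static unsigned int model_scope_mask(unsigned int gfp, bool noio, bool nofs)
{
	if (noio)
		gfp &= ~(M_IO | M_FS);
	else if (nofs)
		gfp &= ~M_FS;
	return gfp;
}

/*
 * Direct reclaim stand-in: with M_FS set it runs a filesystem shrinker,
 * which in this model needs the lock the transaction already holds.
 */
static enum reclaim_result model_reclaim(const struct model_fs *fs,
					 unsigned int gfp)
{
	if (!(gfp & M_FS))
		return RECLAIM_SKIPPED_FS;
	if (fs->trans_lock_held)
		return RECLAIM_DEADLOCK;
	return RECLAIM_OK;
}
```

A block-layer allocation that passes M_IO | M_FS would recurse into the filesystem and deadlock in this model unless the caller's scope masks it. So the NOFS/NOIO scope has to stay in force below the filesystem, while NOFAIL does not.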

i.e. NOFAIL scopes are not relevant outside the subsystem that sets
them. Hence we likely need helpers to clear and restore NOFAIL when
we cross allocation context boundaries, e.g. as we cross from the
filesystem to the block layer in the IO stack via submit_bio().
Maybe they should be doing something like:

	nofail_flags = memalloc_nofail_clear();
	noio_flags = memalloc_noio_save();

	....

	memalloc_noio_restore(noio_flags);
	memalloc_nofail_reinstate(nofail_flags);
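memalloc_nofail_clear() and memalloc_nofail_reinstate() are proposals here, not existing helpers. Below is a minimal userspace sketch of how they might mirror the save/restore pattern of the kernel's memalloc_noio_save()/memalloc_noio_restore(). A global task_flags stands in for current->flags. TASK_NOFAIL, TASK_NOIO and all model_* functions are hypothetical.

```c
/* Hypothetical per-task scope bits for this model; values are illustrative. */
#define TASK_NOFAIL 0x1u
#define TASK_NOIO   0x2u

/* Stand-in for current->flags. */
static unsigned int task_flags;

/* Same shape as memalloc_noio_save(): set the bit, return its old value. */
static unsigned int model_noio_save(void)
{
	unsigned int old = task_flags & TASK_NOIO;

	task_flags |= TASK_NOIO;
	return old;
}

static void model_noio_restore(unsigned int old)
{
	task_flags = (task_flags & ~TASK_NOIO) | old;
}

/* Hypothetical NOFAIL helpers: clear the bit, later put back its old value. */
static unsigned int model_nofail_clear(void)
{
	unsigned int old = task_flags & TASK_NOFAIL;

	task_flags &= ~TASK_NOFAIL;
	return old;
}

static void model_nofail_reinstate(unsigned int old)
{
	task_flags = (task_flags & ~TASK_NOFAIL) | old;
}

/*
 * Stand-in for a filesystem -> block layer crossing such as submit_bio().
 * Returns the scope flags that allocations below the boundary would see.
 */
static unsigned int model_cross_into_block_layer(void)
{
	unsigned int nofail_flags = model_nofail_clear();
	unsigned int noio_flags = model_noio_save();
	unsigned int seen_below = task_flags;

	model_noio_restore(noio_flags);
	model_nofail_reinstate(nofail_flags);
	return seen_below;
}
```

Each helper returns only the bit it owns, and the scopes are unwound in reverse order. That puts back exactly the caller's previous state, including the case where NOFAIL was never set.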

> Maybe it's a
> high-order allocation with a low-order fallback that really should not be
> __GFP_NOFAIL? We would need to hope it has something like RETRY_MAYFAIL or
> NORETRY already. But maybe it just relies on >costly order being more likely
> to fail implicitly, and those costly orders should be kept excluded from the
> scoped NOFAIL? Maybe __GFP_NOWARN should also override the scoped nofail?

We definitely need NORETRY/RETRY_MAYFAIL to override scoped NOFAIL
at the filesystem layer (e.g. so that readahead buffer allocations,
xlog_kvmalloc(), etc. correctly fail fast within XFS transactions),
but I don't think every subsystem should be forced to do this just
to keep working correctly in case a higher-level subsystem has set
a scoped NOFAIL context.
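The override rule above can be sketched as a flag-resolution function. This is a hypothetical model rather than the kernel's gfp handling. The MG_* bits only echo the names of __GFP_NORETRY, __GFP_RETRY_MAYFAIL and __GFP_NOFAIL, and their values are made up. The scoped NOFAIL is applied unless the call site has explicitly asked to be allowed to fail.

```c
#include <stdbool.h>

/* Model-only bits; names echo the kernel's gfp flags, values are made up. */
#define MG_NORETRY       0x1u
#define MG_RETRY_MAYFAIL 0x2u
#define MG_NOFAIL        0x4u

/*
 * Hypothetical resolution of a call site's flags against the task scope:
 * an explicit "may fail" request at the call site wins over scoped NOFAIL.
 */
static unsigned int model_effective_gfp(unsigned int gfp, bool scoped_nofail)
{
	if (scoped_nofail && !(gfp & (MG_NORETRY | MG_RETRY_MAYFAIL)))
		gfp |= MG_NOFAIL;
	return gfp;
}
```

In this model, a readahead-style allocation that passes MG_NORETRY keeps failing fast inside a NOFAIL scope. A plain allocation in the same scope is upgraded to MG_NOFAIL.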

> - NOWAIT - as said already, we need to make sure we're not turning an
> allocation that relied on too-small-to-fail into a null pointer exception or
> BUG_ON(!page).

Agreed. Scoped NOWAIT removes the constraints that existing callers
rely on to avoid allocation failures, and I don't think that can be
made to work reliably. Error injection cannot prove the absence of
errors, so we can never be certain the code will always operate
correctly and not crash when an unexpected allocation failure occurs.

-Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx