[PATCH 00/12] xfs: remove remaining kmem interfaces and GFP_NOFS usage

Dave Chinner <david@xxxxxxxxxxxxx> · Tue, 16 Jan 2024 09:59:38 +1100

This series does two things. Firstly it removes the remaining XFS
specific kernel memory allocation wrappers, converting everything to
using GFP flags directly. Secondly, it converts all the GFP_NOFS
flag usage to use the scoped memalloc_nofs_save() API instead of
direct calls with the GFP_NOFS.

The first part of the series (fs/xfs/kmem.[ch] removal) is straight
forward.  We've done lots of this stuff in the past leading up to
the point; this is just converting the final remaining usage to the
native kernel interface. The only down-side to this is that we end
up propagating __GFP_NOFAIL everywhere into the code. This is no big
deal for XFS - it's just formalising the fact that all our
allocations are __GFP_NOFAIL by default, except for the ones we
explicity mark as able to fail. This may be a surprise of people
outside XFS, but we've been doing this for a couple of decades now
and the sky hasn't fallen yet.

The second part of the series is more involved - in most cases
GFP_NOFS is redundant because we are already in a scoped NOFS
context (e.g. transactions) so the conversion to GFP_KERNEL isn't a
huge issue.

However, there are some code paths where we have used GFP_NOFS to
prevent lockdep warnings because the code is called from both
GFP_KERNEL and GFP_NOFS contexts and so lockdep gets confused when
it has tracked code as GFP_NOFS and then sees it enter direct
reclaim, recurse into the filesystem and take fs locks from the
GFP_KERNEL caller. There are a couple of other lockdep false
positive paths that can occur that we've shut up with GFP_NOFS, too.
More recently, we've been using the __GFP_NOLOCKDEP flag to signal
this "lockdep gives false positives here" condition, so one of the
things this patchset does is convert all the GFP_NOFS calls in code
that can be run from both GFP_KERNEL and GFP_NOFS contexts, and/or
run both above and below reclaim with GFP_KERNEL | __GFP_NOLOCKDEP.

This means that some allocations have gone from having KM_NOFS tags
to having GFP_KERNEL | __GFP_NOLOCKDEP | __GFP_NOFAIL. There is an
increase in verbosity here, but the first step in cleaning all this
mess up is consistently annotating all the allocation sites with the
correct tags.

Later in the patchset, we start adding new scoped NOFS contexts to
cover cases where we really need NOFS but rely on code being called
to understand that it is actually in a NOFS context. And example of
this is intent recovery - allocating the intent structure occurs
outside transaction scope, but still needs to be NOFS scope because
of all the pending work already queued. The rest of the work is done
under transaction context, giving it NOFS context, but these initial
allocations aren't inside that scope. IOWs, the entire intent
recovery scope should really be covered by a single NOFS context.
The patch set ends up putting the entire second phase of recovery
(intents, unlnked list, reflink cleanup) under a single NOFS context
because we really don't want reclaim to operate on the filesystem
whilst we are performing these operations. Hence a single high level
NOFS scope is appropriate here.

The end result is that GFP_NOFS is completely gone from XFS,
replaced by correct annotations and more widely deployed scoped
allocation contexts. This passes fstests with lockdep, KASAN and
other debuggin enabled without any regressions or new lockdep false
positives.

Comments, thoughts and ideas?

----

Version 1:
- based on v6.7 + linux-xfs/for-next