Re: [PATCH 1/2] mm: Add memalloc_nowait_{save,restore}

On Wed, Aug 14, 2024 at 03:32:26PM +0800, Yafang Shao wrote:
> On Wed, Aug 14, 2024 at 1:42 PM Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> >
> > On Wed, Aug 14, 2024 at 10:19:36AM +0800, Yafang Shao wrote:
> > > On Wed, Aug 14, 2024 at 8:28 AM Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> > > >
> > > > On Mon, Aug 12, 2024 at 05:05:24PM +0800, Yafang Shao wrote:
> > > > > The PF_MEMALLOC_NORECLAIM flag was introduced in commit eab0af905bfc
> > > > > ("mm: introduce PF_MEMALLOC_NORECLAIM, PF_MEMALLOC_NOWARN"). To complement
> > > > > this, let's add two helper functions, memalloc_nowait_{save,restore}, which
> > > > > will be useful in scenarios where we want to avoid waiting for memory
> > > > > reclamation.
> > > >
> > > > Readahead already uses this context:
> > > >
> > > > static inline gfp_t readahead_gfp_mask(struct address_space *x)
> > > > {
> > > >         return mapping_gfp_mask(x) | __GFP_NORETRY | __GFP_NOWARN;
> > > > }
> > > >
> > > > and __GFP_NORETRY means minimal direct reclaim should be performed.
> > > > Most filesystems already have GFP_NOFS context from
> > > > mapping_gfp_mask(), so how much difference does completely avoiding
> > > > direct reclaim actually make under memory pressure?
> > >
> > > Besides __GFP_NOFS, ~__GFP_DIRECT_RECLAIM also implies
> > > __GFP_NOIO. If we don't set __GFP_NOIO, the readahead can wait
> > > for IO, right?
> >
> > There's a *lot* more difference between __GFP_NORETRY and
> > __GFP_NOWAIT than just __GFP_NOIO. I don't need you to try to
> > describe to me what the differences are; What I'm asking you is this:
> >
> > > > i.e. doing some direct reclaim without blocking when under memory
> > > > pressure might actually give better performance than skipping direct
> > > > reclaim and aborting readahead altogether....
> > > >
> > > > This really, really needs some numbers (both throughput and IO
> > > > latency histograms) to go with it because we have no evidence either
> > > > way to determine what is the best approach here.
> >
> > Put simply: does the existing readahead mechanism give better results
> > than the proposed one, and if so, why wouldn't we just reenable
> > readahead unconditionally instead of making it behave differently
> > for this specific case?
> 
> Are you suggesting we compare the following change with the current proposal?
> 
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index fd34b5755c0b..ced74b1b350d 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -3455,7 +3455,6 @@ static inline int kiocb_set_rw_flags(struct
> kiocb *ki, rwf_t flags,
>         if (flags & RWF_NOWAIT) {
>                 if (!(ki->ki_filp->f_mode & FMODE_NOWAIT))
>                         return -EOPNOTSUPP;
> -               kiocb_flags |= IOCB_NOIO;
>         }
>         if (flags & RWF_ATOMIC) {
>                 if (rw_type != WRITE)

Yes.

> Doesn't unconditional readahead break the semantics of RWF_NOWAIT,
> which is supposed to avoid waiting for I/O? For example, it might
> trigger a pageout for a dirty page.

Yes, but only for *some filesystems* in *some configurations*.
Readahead allocation behaviour is specifically controlled by the gfp
mask set on the mapping by the filesystem at inode instantiation
time. i.e. via a call to mapping_set_gfp_mask().

XFS, for one, always clears __GFP_FS from this mask, and several
other filesystems set it to GFP_NOFS. Filesystems that do this will
not do pageout for a dirty page during memory allocation.

Further, memory reclaim cannot write dirty pages to a filesystem
without a ->writepage implementation. ->writepage is almost
completely gone - none of ext4, btrfs, or XFS has a ->writepage
implementation anymore - with f2fs being the only "major" filesystem
that still implements it.

IOWs, for most readahead cases right now, direct memory reclaim will
not issue writeback IO on dirty cached file pages and in the near
future that will change to -never-.

That means the only IO that direct reclaim will be able to do is for
swapping and compaction. Both of these can be prevented simply by
setting a GFP_NOIO allocation context. IOWs, in the not-too-distant
future we won't have to turn direct reclaim off to prevent IO and
blocking in direct reclaim during readahead - GFP_NOIO context
will be all that is necessary for IOCB_NOWAIT readahead.

That's why I'm asking if just doing readahead as it stands from
RWF_NOWAIT causes any obvious problems. I think we really only
need GFP_NOIO | __GFP_NORETRY allocation context for NOWAIT
readahead IO, and that's something we already have a context API
for.

-Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx



