Re: [RFC] xfs: reduce sub-block DIO serialisation

Dave Chinner <david@xxxxxxxxxxxxx> · Mon, 18 Jan 2021 08:34:01 +1100

On Thu, Jan 14, 2021 at 08:48:36AM +0200, Avi Kivity wrote:
> On 1/13/21 10:38 PM, Dave Chinner wrote:
> > On Wed, Jan 13, 2021 at 10:00:37AM +0200, Avi Kivity wrote:
> > > On 1/13/21 12:13 AM, Dave Chinner wrote:
> > > > On Tue, Jan 12, 2021 at 10:01:35AM +0200, Avi Kivity wrote:
> > > > > On 1/12/21 3:07 AM, Dave Chinner wrote:
> > > > > > Hi folks,
> > > > > > 
> > > > > > This is the XFS implementation on the sub-block DIO optimisations
> > > > > > for written extents that I've mentioned on #xfs and a couple of
> > > > > > times now on the XFS mailing list.
> > > > > > 
> > > > > > It takes the approach of using the IOMAP_NOWAIT non-blocking
> > > > > > IO submission infrastructure to optimistically dispatch sub-block
> > > > > > DIO without exclusive locking. If the extent mapping callback
> > > > > > decides that it can't do the unaligned IO without extent
> > > > > > manipulation, sub-block zeroing, blocking or splitting the IO into
> > > > > > multiple parts, it aborts the IO with -EAGAIN. This allows the high
> > > > > > level filesystem code to then take exclusive locks and resubmit the
> > > > > > IO once it has guaranteed no other IO is in progress on the inode
> > > > > > (the current implementation).
> > > > > Can you expand on the no-splitting requirement? Does it involve only
> > > > > splitting by XFS (IO spans >1 extents) or lower layers (RAID)?
> > > > XFS only.
> > > 
> > > Ok, that is somewhat under control as I can provide an extent hint, and wish
> > > really hard that the filesystem isn't fragmented.
> > > 
> > > 
> > > > > The reason I'm concerned is that it's the constraint that the application
> > > > > has least control over. I guess I could use RWF_NOWAIT to avoid blocking my
> > > > > main thread (but last time I tried I'd get occasional EIOs that frightened
> > > > > me off that).
> > > > Spurious EIO from RWF_NOWAIT is a bug that needs to be fixed. Do you
> > > > have any details?
> > > > 
> > > I reported it in [1]. It's long since gone since I disabled RWF_NOWAIT. It
> > > was relatively rare, sometimes happening in continuous integration runs that
> > > take hours, and sometimes not.
> > > 
> > > 
> > > I expect it's fixed by now since io_uring relies on it. Maybe I should turn
> > > it on for kernels > some_random_version.
> > > 
> > > 
> > > [1] https://lore.kernel.org/lkml/9bab0f40-5748-f147-efeb-5aac4fd44533@xxxxxxxxxxxx/t/#u
> > Yeah, as I thought. Usage of REQ_NOWAIT with filesystem based IO is
> > simply broken - it causes spurious IO failures to be reported to IO
> > completion callbacks and so are very difficult to track and/or
> > retry. iomap does not use REQ_NOWAIT at all, so you should not ever
> > see this from XFS or ext4 DIO anymore...
> 
> What kernel version would be good?

For ext4? >= 5.5 should be safe; that is when it was converted to
the iomap DIO path.  Before that it used the old DIO path, which
sets REQ_NOWAIT when IOCB_NOWAIT (i.e. RWF_NOWAIT) is set for the
IO.

Btrfs is an even more recent convert to iomap-based dio (5.9?).

The REQ_NOWAIT behaviour was introduced into the old DIO path back
in 4.13 by commit 03a07c92a9ed ("block: return on congested block
device") and was intended to support RWF_NOWAIT on raw block
devices.  Hence it was not added to the iomap path as block devices
don't use that path.

Another example of how REQ_NOWAIT breaks filesystems was an io_uring
hack to force REQ_NOWAIT IO behaviour through filesystems via
"nowait block plugs", which resulted in XFS filesystem shutdowns
because of unexpected IO errors during journal writes:

https://lore.kernel.org/linux-xfs/20200915113327.GA1554921@bfoster/

There have been patches proposed to add REQ_NOWAIT to the iomap DIO
code, but they've all been NACKed because it will break
filesystem-based RWF_NOWAIT DIO.

So, long story short: On XFS you are fine on all kernels. On all
other block based filesystems you need < 4.13, except for ext4,
where >= 5.5 will work correctly, and btrfs, where >= 5.9 will.
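
As a rough illustration of that summary, an application could gate
its use of RWF_NOWAIT on filesystem type and kernel version along
these lines. This is only a sketch: the function names are made up
for the example, the btrfs cutoff is the tentative "5.9?" from above,
and distro kernels with backports may not match upstream version
numbers.

```python
import re


def parse_release(release):
    """Turn a kernel release string such as "5.10.0-8-amd64" into (5, 10)."""
    m = re.match(r"(\d+)\.(\d+)", release)
    if m is None:
        raise ValueError("unrecognised kernel release: %r" % (release,))
    return (int(m.group(1)), int(m.group(2)))


def rwf_nowait_dio_is_safe(fs, kernel):
    """Hypothetical helper encoding the summary above.

    fs     -- filesystem name, e.g. "xfs", "ext4", "btrfs"
    kernel -- (major, minor) tuple, e.g. from parse_release()
    """
    if fs == "xfs":
        return True                 # fine on all kernels
    if fs == "ext4" and kernel >= (5, 5):
        return True                 # converted to the iomap DIO path
    if fs == "btrfs" and kernel >= (5, 9):
        return True                 # tentative version, see above
    # Old DIO path: only safe before REQ_NOWAIT was added in 4.13.
    return kernel < (4, 13)
```

The release string would typically come from platform.release() or
uname(2), e.g. rwf_nowait_dio_is_safe("ext4", parse_release("5.10.0")).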

> commit 4503b7676a2e0abe69c2f2c0d8b03aec53f2f048
> Author: Jens Axboe <axboe@xxxxxxxxx>
> Date:   Mon Jun 1 10:00:27 2020 -0600
> 
>     io_uring: catch -EIO from buffered issue request failure
> 
>     -EIO bubbles up like -EAGAIN if we fail to allocate a request at the
>     lower level. Play it safe and treat it like -EAGAIN in terms of sync
>     retry, to avoid passing back an errant -EIO.
> 
>     Catch some of these early for block based file, as non-mq devices
>     generally do not support NOWAIT. That saves us some overhead by
>     not first trying, then retrying from async context. We can go straight
>     to async punt instead.
> 
>     Signed-off-by: Jens Axboe <axboe@xxxxxxxxx>
> 
> but this looks to be an io_uring-specific fix (somewhat frightening too),
> not removal of REQ_NOWAIT.

That looks like a similar case to the one I mention above where
io_uring and REQ_NOWAIT aren't playing well with others....
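
For reference, the application-side pattern under discussion looks
roughly like the sketch below: try the write with RWF_NOWAIT, and
only hand it to a blocking fallback when the kernel says it would
block (or that NOWAIT is unsupported). The helper name and the
fallback callback are assumptions for the example, not an existing
API. Note that it deliberately does not swallow EIO; a spurious EIO
should be reported as a bug rather than silently retried.

```python
import errno
import os


def pwrite_nowait_or_fallback(fd, data, offset, fallback):
    """Try a non-blocking positional write, else call fallback(fd, data, offset).

    `fallback` is a caller-supplied blocking path, e.g. os.pwrite or a
    function that queues the write to a worker thread (hypothetical).
    """
    nowait = getattr(os, "RWF_NOWAIT", None)
    if nowait is None:
        # Platform or Python build without RWF_NOWAIT: always block.
        return fallback(fd, data, offset)
    try:
        return os.pwritev(fd, [data], offset, nowait)
    except OSError as exc:
        # EAGAIN: the write would block. EOPNOTSUPP: NOWAIT not
        # supported for this file. Anything else (including EIO) is
        # a real error and is passed back to the caller.
        if exc.errno in (errno.EAGAIN, errno.EOPNOTSUPP):
            return fallback(fd, data, offset)
        raise
```

A caller on its main thread might pass a fallback that defers the
write to a thread pool instead of calling os.pwrite directly.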

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx