Re: [RFC] xfs: reduce sub-block DIO serialisation

Avi Kivity <avi@xxxxxxxxxxxx> · Thu, 14 Jan 2021 08:48:36 +0200

On 1/13/21 10:38 PM, Dave Chinner wrote:
On Wed, Jan 13, 2021 at 10:00:37AM +0200, Avi Kivity wrote:
On 1/13/21 12:13 AM, Dave Chinner wrote:
On Tue, Jan 12, 2021 at 10:01:35AM +0200, Avi Kivity wrote:
On 1/12/21 3:07 AM, Dave Chinner wrote:
Hi folks,

This is the XFS implementation of the sub-block DIO optimisations
for written extents that I've mentioned on #xfs and a couple of
times now on the XFS mailing list.

It takes the approach of using the IOMAP_NOWAIT non-blocking
IO submission infrastructure to optimistically dispatch sub-block
DIO without exclusive locking. If the extent mapping callback
decides that it can't do the unaligned IO without extent
manipulation, sub-block zeroing, blocking or splitting the IO into
multiple parts, it aborts the IO with -EAGAIN. This allows the high
level filesystem code to then take exclusive locks and resubmit the
IO once it has guaranteed no other IO is in progress on the inode
(the current implementation).
Can you expand on the no-splitting requirement? Does it involve only
splitting by XFS (IO spans >1 extents) or lower layers (RAID)?
XFS only.

Ok, that is somewhat under control as I can provide an extent hint, and wish
really hard that the filesystem isn't fragmented.

The reason I'm concerned is that it's the constraint that the application
has least control over. I guess I could use RWF_NOWAIT to avoid blocking my
main thread (but last time I tried I'd get occasional EIOs that frightened
me off that).
Spurious EIO from RWF_NOWAIT is a bug that needs to be fixed. Do you
have any details?

I reported it in [1]. It has not recurred since I disabled RWF_NOWAIT. It
was relatively rare, sometimes happening in continuous integration runs that
take hours, and sometimes not.

I expect it's fixed by now since io_uring relies on it. Maybe I should turn
it on for kernels > some_random_version.

[1] https://lore.kernel.org/lkml/9bab0f40-5748-f147-efeb-5aac4fd44533@xxxxxxxxxxxx/t/#u
Yeah, as I thought. Usage of REQ_NOWAIT with filesystem based IO is
simply broken - it causes spurious IO failures to be reported to IO
completion callbacks, which makes them very difficult to track and/or
retry. iomap does not use REQ_NOWAIT at all, so you should not ever
see this from XFS or ext4 DIO anymore...

What kernel version would be good?

Searching the log I found

commit 4503b7676a2e0abe69c2f2c0d8b03aec53f2f048
Author: Jens Axboe <axboe@xxxxxxxxx>
Date:   Mon Jun 1 10:00:27 2020 -0600

    io_uring: catch -EIO from buffered issue request failure

    -EIO bubbles up like -EAGAIN if we fail to allocate a request at the
    lower level. Play it safe and treat it like -EAGAIN in terms of sync
    retry, to avoid passing back an errant -EIO.

    Catch some of these early for block based file, as non-mq devices
    generally do not support NOWAIT. That saves us some overhead by
    not first trying, then retrying from async context. We can go straight
    to async punt instead.

    Signed-off-by: Jens Axboe <axboe@xxxxxxxxx>

but this looks to be an io_uring-specific fix (somewhat frightening too),
not a removal of REQ_NOWAIT.
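
For reference, the userspace pattern in question looks roughly like the
sketch below: try pwritev2() with RWF_NOWAIT, and repeat the write as an
ordinary blocking one if the kernel refuses. The helper names are
hypothetical. Treating EOPNOTSUPP, EINVAL and ENOSYS as "RWF_NOWAIT is not
usable here" is an assumption of the sketch; a genuine error will show up
again from the blocking retry. EIO is deliberately not retried, so a
spurious one would still reach the caller.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* pwritev2() and RWF_NOWAIT are GNU/Linux extensions */
#endif
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Try a non-blocking positional write first; if the kernel refuses, repeat
 * it as an ordinary blocking write. *blocked tells the caller which path ran.
 */
static ssize_t write_nowait_then_block(int fd, const void *buf, size_t len,
                                       off_t off, bool *blocked)
{
    struct iovec iov = { .iov_base = (void *)buf, .iov_len = len };
    ssize_t n;

    *blocked = false;
    n = pwritev2(fd, &iov, 1, off, RWF_NOWAIT);
    if (n >= 0)
        return n;

    /* EAGAIN: the write would have blocked. The other three are treated
     * as "RWF_NOWAIT not usable on this file or kernel" (an assumption). */
    if (errno != EAGAIN && errno != EOPNOTSUPP &&
        errno != EINVAL && errno != ENOSYS)
        return -1;   /* anything else, EIO included, is passed through */

    *blocked = true;
    return pwrite(fd, buf, len, off);
}

/* Usage: round-trip a few bytes through a scratch file; 0 on success. */
static int demo_roundtrip(const char *path)
{
    const char msg[] = "sub-block";
    char back[sizeof msg] = { 0 };
    bool blocked;
    int ok, fd = open(path, O_CREAT | O_TRUNC | O_RDWR, 0600);

    if (fd < 0)
        return -1;
    ok = write_nowait_then_block(fd, msg, sizeof msg, 0, &blocked)
             == (ssize_t)sizeof msg
         && pread(fd, back, sizeof back, 0) == (ssize_t)sizeof back
         && memcmp(msg, back, sizeof msg) == 0;
    close(fd);
    unlink(path);
    return ok ? 0 : -1;
}
```

In an application that must not block its main thread, the blocking retry
would be handed to a worker thread rather than run inline; the sketch runs
it inline only to stay short.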