Re: [RFC] xfs: reduce sub-block DIO serialisation

On 1/17/21 11:34 PM, Dave Chinner wrote:
On Thu, Jan 14, 2021 at 08:48:36AM +0200, Avi Kivity wrote:
On 1/13/21 10:38 PM, Dave Chinner wrote:
On Wed, Jan 13, 2021 at 10:00:37AM +0200, Avi Kivity wrote:
On 1/13/21 12:13 AM, Dave Chinner wrote:
On Tue, Jan 12, 2021 at 10:01:35AM +0200, Avi Kivity wrote:
On 1/12/21 3:07 AM, Dave Chinner wrote:
Hi folks,

This is the XFS implementation of the sub-block DIO optimisations
for written extents that I've mentioned on #xfs and a couple of
times now on the XFS mailing list.

It takes the approach of using the IOMAP_NOWAIT non-blocking
IO submission infrastructure to optimistically dispatch sub-block
DIO without exclusive locking. If the extent mapping callback
decides that it can't do the unaligned IO without extent
manipulation, sub-block zeroing, blocking or splitting the IO into
multiple parts, it aborts the IO with -EAGAIN. This allows the high
level filesystem code to then take exclusive locks and resubmit the
IO once it has guaranteed no other IO is in progress on the inode
(the current implementation).
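
To make the decision described above concrete, here is a toy userspace
model of it. It is only a sketch under stated assumptions, not the XFS
or iomap code: struct toy_extent, toy_try_shared_dio() and
toy_dio_lock_mode() are hypothetical names invented for illustration,
and the "would block" condition is left out because it has no
userspace analogue here.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified view of one extent mapping (not a kernel struct). */
struct toy_extent {
    uint64_t start;   /* byte offset where the extent begins */
    uint64_t len;     /* extent length in bytes */
    bool written;     /* false for holes and unwritten extents */
};

enum toy_lock { TOY_SHARED, TOY_EXCLUSIVE };

/*
 * Return 0 if the IO can be dispatched without exclusive locking,
 * or -EAGAIN if the caller must retry with exclusive locking.
 */
static int toy_try_shared_dio(const struct toy_extent *ext,
                              uint64_t pos, uint64_t count,
                              uint64_t block_size)
{
    bool unaligned = (pos % block_size) != 0 ||
                     ((pos + count) % block_size) != 0;

    if (!unaligned)
        return 0;        /* block-aligned IO is not the sub-block case */
    if (!ext->written)
        return -EAGAIN;  /* would need sub-block zeroing or extent changes */
    if (pos < ext->start || pos + count > ext->start + ext->len)
        return -EAGAIN;  /* would have to be split across extents */
    return 0;
}

/* Higher-level view: optimistic attempt first, exclusive lock on -EAGAIN. */
static enum toy_lock toy_dio_lock_mode(const struct toy_extent *ext,
                                       uint64_t pos, uint64_t count,
                                       uint64_t block_size)
{
    if (toy_try_shared_dio(ext, pos, count, block_size) == -EAGAIN)
        return TOY_EXCLUSIVE;
    return TOY_SHARED;
}
```

In this model a 512-byte write inside a written extent stays on the
shared path, while the same write into an unwritten extent, or one
that crosses the end of the extent, falls back to the exclusive path.
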
Can you expand on the no-splitting requirement? Does it involve only
splitting by XFS (IO spans >1 extents) or lower layers (RAID)?
XFS only.
Ok, that is somewhat under control as I can provide an extent hint, and wish
really hard that the filesystem isn't fragmented.


The reason I'm concerned is that it's the constraint that the application
has least control over. I guess I could use RWF_NOWAIT to avoid blocking my
main thread (but last time I tried I'd get occasional EIOs that frightened
me off that).
Spurious EIO from RWF_NOWAIT is a bug that needs to be fixed. Do you
have any details?
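
For readers who have not used RWF_NOWAIT, the pattern being discussed
looks roughly like the following userspace sketch. The helper name
write_nowait_or_fallback() is hypothetical. As a simplification, the
fallback here is an inline blocking pwritev(); an application that
must not block its main thread would hand the retry to a worker thread
instead. Treating EOPNOTSUPP, EINVAL and ENOSYS as "flag not usable
here" is an assumption of this sketch, not something stated in the
thread.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/types.h>
#include <sys/uio.h>

#ifndef RWF_NOWAIT
#define RWF_NOWAIT 0x00000008  /* Linux UAPI value, for older libc headers */
#endif

/*
 * Try the write without blocking first. If the kernel reports that it
 * would block (EAGAIN), or does not accept the flag for this file,
 * repeat it as an ordinary blocking write.
 * *used_fallback is set to 1 if the blocking path was taken.
 */
static ssize_t write_nowait_or_fallback(int fd, const void *buf, size_t len,
                                        off_t off, int *used_fallback)
{
    struct iovec iov = { .iov_base = (void *)buf, .iov_len = len };
    ssize_t n = pwritev2(fd, &iov, 1, off, RWF_NOWAIT);

    *used_fallback = 0;
    if (n >= 0)
        return n;
    if (errno != EAGAIN && errno != EOPNOTSUPP &&
        errno != EINVAL && errno != ENOSYS)
        return -1;  /* genuine error: report it, errno is still set */

    *used_fallback = 1;
    return pwritev(fd, &iov, 1, off);  /* blocking retry */
}
```

Any error other than the four listed is passed straight back to the
caller, so an unexpected EIO from the non-blocking attempt stays
visible rather than being hidden by the retry.
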

I reported it in [1]. It's long since gone since I disabled RWF_NOWAIT. It
was relatively rare, sometimes happening in continuous integration runs that
take hours, and sometimes not.


I expect it's fixed by now since io_uring relies on it. Maybe I should turn
it on for kernels > some_random_version.


[1] https://lore.kernel.org/lkml/9bab0f40-5748-f147-efeb-5aac4fd44533@xxxxxxxxxxxx/t/#u
Yeah, as I thought. Usage of REQ_NOWAIT with filesystem based IO is
simply broken - it causes spurious IO failures to be reported to IO
completion callbacks, and those failures are very difficult to track
and/or retry. iomap does not use REQ_NOWAIT at all, so you should not
ever see this from XFS or ext4 DIO anymore...
What kernel version would be good?
For ext4? >= 5.5 should be safe; that is when it was converted to the
iomap DIO path. Before that it used the old DIO path, which sets
REQ_NOWAIT when IOCB_NOWAIT (i.e. RWF_NOWAIT) is set for the IO.

Btrfs is an even more recent convert to iomap-based dio (5.9?).

The REQ_NOWAIT behaviour was introduced into the old DIO path back
in 4.13 by commit 03a07c92a9ed ("block: return on congested block
device") and was intended to support RWF_NOWAIT on raw block
devices.  Hence it was not added to the iomap path as block devices
don't use that path.

Another example of how REQ_NOWAIT breaks filesystems was an io_uring
hack to force REQ_NOWAIT IO behaviour through filesystems via
"nowait block plugs", which resulted in XFS filesystem shutdowns
because of unexpected IO errors during journal writes:

https://lore.kernel.org/linux-xfs/20200915113327.GA1554921@bfoster/

Patches have been proposed to add REQ_NOWAIT to the iomap DIO code,
but they've all been NACKed because it would break filesystem-based
RWF_NOWAIT DIO.

So, long story short: On XFS you are fine on all kernels. On all
other block-based filesystems you need <4.13, except for ext4 where
>= 5.5 and btrfs where >= 5.9 will work correctly.
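
The version rules in the summary above could be encoded along these
lines. This is a minimal sketch, not code from any project: the enum
fs_kind tags and the function names are hypothetical, the ext4 and
btrfs cases check only the lower bounds stated above, and the btrfs
number carries the same uncertainty it was given with ("5.9?").

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical filesystem tags for this sketch. */
enum fs_kind { FS_XFS, FS_EXT4, FS_BTRFS, FS_OTHER };

/* True if kernel major.minor is at least want_major.want_minor. */
static bool kver_at_least(int major, int minor, int want_major, int want_minor)
{
    return major > want_major ||
           (major == want_major && minor >= want_minor);
}

/* Parse a release string such as "5.10.0-8-amd64" (as printed by uname -r). */
static bool parse_kernel_release(const char *release, int *major, int *minor)
{
    return sscanf(release, "%d.%d", major, minor) == 2;
}

/* Encode the summary: which filesystem/kernel pairs avoid REQ_NOWAIT in DIO. */
static bool rwf_nowait_dio_looks_safe(enum fs_kind fs, int major, int minor)
{
    switch (fs) {
    case FS_XFS:
        return true;                               /* fine on all kernels */
    case FS_EXT4:
        return kver_at_least(major, minor, 5, 5);  /* iomap DIO since 5.5 */
    case FS_BTRFS:
        return kver_at_least(major, minor, 5, 9);  /* iomap DIO since ~5.9 */
    default:
        /* Old DIO path: only kernels before 4.13 lack REQ_NOWAIT there. */
        return !kver_at_least(major, minor, 4, 13);
    }
}
```

A caller would typically take the release string from uname(2), parse
it with parse_kernel_release(), and enable RWF_NOWAIT only when
rwf_nowait_dio_looks_safe() returns true for the filesystem in use.
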


My report mentions XFS, though it was so long ago that I'm willing to
treat it as measurement error. I'll incorporate these numbers into the
code, and we'll see. Luckily I was already forced to have
filesystem-specific code, so the ugliness is already there.





