On Thu, Jan 14, 2021 at 08:48:36AM +0200, Avi Kivity wrote:
> On 1/13/21 10:38 PM, Dave Chinner wrote:
> > On Wed, Jan 13, 2021 at 10:00:37AM +0200, Avi Kivity wrote:
> > > On 1/13/21 12:13 AM, Dave Chinner wrote:
> > > > On Tue, Jan 12, 2021 at 10:01:35AM +0200, Avi Kivity wrote:
> > > > > On 1/12/21 3:07 AM, Dave Chinner wrote:
> > > > > > Hi folks,
> > > > > >
> > > > > > This is the XFS implementation on the sub-block DIO optimisations
> > > > > > for written extents that I've mentioned on #xfs and a couple of
> > > > > > times now on the XFS mailing list.
> > > > > >
> > > > > > It takes the approach of using the IOMAP_NOWAIT non-blocking
> > > > > > IO submission infrastructure to optimistically dispatch sub-block
> > > > > > DIO without exclusive locking. If the extent mapping callback
> > > > > > decides that it can't do the unaligned IO without extent
> > > > > > manipulation, sub-block zeroing, blocking or splitting the IO into
> > > > > > multiple parts, it aborts the IO with -EAGAIN. This allows the high
> > > > > > level filesystem code to then take exclusive locks and resubmit the
> > > > > > IO once it has guaranteed no other IO is in progress on the inode
> > > > > > (the current implementation).
> > > > > Can you expand on the no-splitting requirement? Does it involve only
> > > > > splitting by XFS (IO spans >1 extents) or lower layers (RAID)?
> > > > XFS only.
> > >
> > > Ok, that is somewhat under control as I can provide an extent hint, and wish
> > > really hard that the filesystem isn't fragmented.
> > >
> > > > >
> > > > > The reason I'm concerned is that it's the constraint that the application
> > > > > has least control over. I guess I could use RWF_NOWAIT to avoid blocking my
> > > > > main thread (but last time I tried I'd get occasional EIOs that frightened
> > > > > me off that).
> > > > Spurious EIO from RWF_NOWAIT is a bug that needs to be fixed. DO you
> > > > have any details?
> > > >
> > > I reported it in [1].
> > > It's long since gone since I disabled RWF_NOWAIT. It
> > > was relatively rare, sometimes happening in continuous integration runs that
> > > take hours, and sometimes not.
> > >
> > >
> > > I expect it's fixed by now since io_uring relies on it. Maybe I should turn
> > > it on for kernels > some_random_version.
> > >
> > >
> > > [1] https://lore.kernel.org/lkml/9bab0f40-5748-f147-efeb-5aac4fd44533@xxxxxxxxxxxx/t/#u
> >
> > Yeah, as I thought. Usage of REQ_NOWAIT with filesystem based IO is
> > simply broken - it causes spurious IO failures to be reported to IO
> > completion callbacks and so are very difficult to track and/or
> > retry. iomap does not use REQ_NOWAIT at all, so you should not ever
> > see this from XFS or ext4 DIO anymore...
>
> What kernel version would be good?

For ext4? Kernels >= 5.5 should be safe, as that is when ext4 was
converted to the iomap DIO path. Before that it used the old DIO path,
which sets REQ_NOWAIT when IOCB_NOWAIT (i.e. RWF_NOWAIT) is set for the
IO. Btrfs is an even more recent convert to iomap-based DIO (5.9?).

The REQ_NOWAIT behaviour was introduced into the old DIO path back in
4.13 by commit 03a07c92a9ed ("block: return on congested block device")
and was intended to support RWF_NOWAIT on raw block devices. Hence it
was not added to the iomap path, as block devices don't use that path.

Another example of how REQ_NOWAIT breaks filesystems: an io_uring hack
to force REQ_NOWAIT IO behaviour through filesystems via "nowait block
plugs" resulted in XFS filesystem shutdowns because of unexpected IO
errors during journal writes:

https://lore.kernel.org/linux-xfs/20200915113327.GA1554921@bfoster/

There have been patches proposed to add REQ_NOWAIT to the iomap DIO
code, but they've all been NACKed because it would break
filesystem-based RWF_NOWAIT DIO.

So, long story short: On XFS you are fine on all kernels.
On all other block-based filesystems you need a kernel older than 4.13,
except for ext4, where >= 5.5 will work correctly, and btrfs, where
>= 5.9 will.

> commit 4503b7676a2e0abe69c2f2c0d8b03aec53f2f048
> Author: Jens Axboe <axboe@xxxxxxxxx>
> Date: Mon Jun 1 10:00:27 2020 -0600
>
> io_uring: catch -EIO from buffered issue request failure
>
> -EIO bubbles up like -EAGAIN if we fail to allocate a request at the
> lower level. Play it safe and treat it like -EAGAIN in terms of sync
> retry, to avoid passing back an errant -EIO.
>
> Catch some of these early for block based file, as non-mq devices
> generally do not support NOWAIT. That saves us some overhead by
> not first trying, then retrying from async context. We can go straight
> to async punt instead.
>
> Signed-off-by: Jens Axboe <axboe@xxxxxxxxx>
>
> but this looks to be io_uring specific fix (somewhat frightening too), not
> removal of REQ_NOWAIT.

That looks like a similar case to the one I mentioned above, where
io_uring and REQ_NOWAIT aren't playing well with others....

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx
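A minimal C sketch of how an application might encode the "long story short" version rules before enabling RWF_NOWAIT for DIO, with a fallback when the kernel reports that the IO would block. The `fs_kind` enum, `kver()`, `rwf_nowait_dio_safe()` and `write_nowait_or_block()` are hypothetical names used only for illustration. Detecting the filesystem type (e.g. from fstatfs() `f_type`) and the running kernel version (e.g. from uname()) is assumed to be done by the caller. The btrfs 5.9 cut-off carries the same uncertainty as the "(5.9?)" in the text.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdbool.h>
#include <sys/types.h>
#include <sys/uio.h>

#ifndef RWF_NOWAIT
#define RWF_NOWAIT 0x00000008 /* value from <linux/fs.h>, for older libc headers */
#endif

/* Hypothetical classification; the caller maps fstatfs() f_type onto it. */
enum fs_kind { FS_XFS, FS_EXT4, FS_BTRFS, FS_OTHER_BLOCK };

/* Pack major.minor into one comparable integer (assumes minor < 1000). */
static int kver(int major, int minor)
{
    return major * 1000 + minor;
}

/*
 * Encodes the rules stated above: XFS is fine on all kernels, ext4 from
 * 5.5 (iomap DIO), btrfs from 5.9 (uncertain), and other block-based
 * filesystems only before 4.13, when the old DIO path started setting
 * REQ_NOWAIT for IOCB_NOWAIT IO.
 */
bool rwf_nowait_dio_safe(enum fs_kind fs, int major, int minor)
{
    int v = kver(major, minor);

    switch (fs) {
    case FS_XFS:
        return true;
    case FS_EXT4:
        return v >= kver(5, 5);
    case FS_BTRFS:
        return v >= kver(5, 9);
    default:
        return v < kver(4, 13);
    }
}

/*
 * Try the write with RWF_NOWAIT first when allowed; if the kernel reports
 * that it would block (EAGAIN), reissue it as an ordinary blocking write.
 * Other errors and short writes are returned to the caller unchanged.
 */
ssize_t write_nowait_or_block(int fd, const void *buf, size_t len,
                              off_t off, bool nowait_ok)
{
    struct iovec iov = { .iov_base = (void *)buf, .iov_len = len };

    if (nowait_ok) {
        ssize_t ret = pwritev2(fd, &iov, 1, off, RWF_NOWAIT);
        if (ret >= 0 || errno != EAGAIN)
            return ret;
    }
    return pwritev2(fd, &iov, 1, off, 0);
}
```

The EAGAIN fallback here simply reissues the write without RWF_NOWAIT, which blocks the calling thread; an application trying to keep its main thread non-blocking would hand that retry to a worker thread instead.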