On Tue, Jul 23, 2019 at 09:20:05AM -0600, Jens Axboe wrote:
> On 7/23/19 2:07 AM, Stefan Hajnoczi wrote:
> > Hi,
> > io_uring O_DIRECT writes can fail with EIO on ext4. Please see the
> > function graph trace from Linux 5.3.0-rc1 below for details. It was
> > produced with the following qemu-io command (using Aarushi's QEMU
> > patches from https://github.com/rooshm/qemu/commits/io_uring):
> >
> > $ qemu-io --cache=none --aio=io_uring --format=qcow2 -c 'writev -P 185 131072 65536' tests/qemu-iotests/scratch/test.qcow2
> >
> > This issue is specific to ext4. XFS and the underlying LVM logical
> > volume both work.
> >
> > The storage configuration is an LVM logical volume (device-mapper linear
> > target), on top of LUKS, on top of a SATA disk. The logical volume's
> > request_queue does not have mq_ops and this causes
> > generic_make_request_checks() to fail:
> >
> >     if ((bio->bi_opf & REQ_NOWAIT) && !queue_is_mq(q))
> >             goto not_supported;
> >
> > I guess this could be worked around by deferring the request to the
> > io_uring work queue to avoid REQ_NOWAIT. But XFS handles this fine so
> > how can io_uring.c detect this case cleanly or is there a bug in ext4?
>
> I actually think it's XFS that's broken here, it's not passing down
> the IOCB_NOWAIT -> IOMAP_NOWAIT -> REQ_NOWAIT. This means we lose that
> important request bit, and we just block instead of triggering the
> not_supported case.

I wouldn't say XFS is broken; we didn't implement it because it would
have meant that IOCB_NOWAIT did not work on all block devices. i.e. the
biggest issue IOCB_NOWAIT avoids is blocking on filesystem locks, and
blocking in the request queue was largely just noise for the
applications RWF_NOWAIT was initially implemented for. IOWs, if you
have the wrong hardware, you can't use RWF_NOWAIT at all, despite it
providing massive benefits for AIO at the filesystem level.

Hence saying that the way IOMAP_NOWAIT is implemented (i.e. it does not
set REQ_NOWAIT) is broken ignores the fact that RWF_NOWAIT was
originally intended as a "don't block on contended filesystem locks"
directive, not as something that is conditional on block layer
functionality...

> Outside of that, that case needs similar treatment to what I did for
> the EAGAIN case here:
>
> http://git.kernel.dk/cgit/linux-block/commit/?h=for-linus&id=893a1c97205a3ece0cbb3f571a3b972080f3b4c7

I don't see REQ_NOWAIT_INLINE in 5.3-rc1. However, nobody checks the
cookie returned by submit_bio() for error status. That cookie is only a
recent addition for block polling, so the only time it is checked is
when we are polling and it gets passed to blk_poll(), i.e. when
RWF_HIPRI is set. So this change, by itself, doesn't solve any problem.

In fact, the way the direct IO code stands right now, a multi-bio DIO
submission will overwrite the submit cookie repeatedly and hence may
end up doing only a partial submission yet still report success,
because the last bio in the chain didn't block. And REQ_NOWAIT_INLINE
doesn't actually mark the bio itself with an error, so the bio
completion function won't report it, either. (See the untested sketch
below for how the cookie gets lost.)

> It was a big mistake to pass back these values in an async fashion,

IMO, the big mistake was to have only some block device configurations
support REQ_NOWAIT - that was an expedient hack to get block layer
polling into the kernel fast. The way the error is passed back is
largely irrelevant from that perspective, and REQ_NOWAIT_INLINE doesn't
resolve this problem at all.
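To show what I mean about the cookie - this is not the real
__blkdev_direct_IO()/iomap submission code, and dio_next_bio() and
struct my_dio are made-up stand-ins for "build the next bio in the
chain" - the submission loop is shaped roughly like this:

#include <linux/bio.h>
#include <linux/blk_types.h>

struct my_dio;					/* stand-in for the real dio state */
struct bio *dio_next_bio(struct my_dio *dio);	/* hypothetical helper */

static blk_qc_t submit_dio_bio_chain(struct my_dio *dio)
{
	blk_qc_t qc = BLK_QC_T_NONE;
	struct bio *bio;

	while ((bio = dio_next_bio(dio)))
		qc = submit_bio(bio);	/* previous cookie thrown away */

	/*
	 * Nothing checks qc for an error - it only ever gets handed to
	 * blk_poll(), and only when RWF_HIPRI polling is in use. An
	 * inline "would have blocked" cookie from an earlier bio in the
	 * chain is silently lost if the last bio submits cleanly.
	 */
	return qc;
}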
Indeed, I think REQ_NOWAIT is largely redundant: if we care about IO
submission blocking because the request queue is full, we can simply
use the existing bdi_congested() interface to check. That works for all
types of block devices - not just random mq devices - and matches code
we have all over the kernel that avoids blocking async IO submission on
congested request queues...

So, yeah, I think REQ_NOWAIT needs to die, and the direct IO callers
should just do congestion checks on IOCB_NOWAIT/IOMAP_NOWAIT (something
like the untested sketch below) rather than try to add new error
reporting mechanisms into bios that lots of code will need to be
changed to support....

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx
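The sort of check I mean - completely untested, the function name is
made up, and bdi_write_congested() is just the existing wrapper around
bdi_congested() - would sit in the direct IO submission paths before
any bios are built:

#include <linux/backing-dev.h>
#include <linux/blkdev.h>
#include <linux/errno.h>
#include <linux/fs.h>

/* hypothetical helper - callers would be the direct IO submission paths */
static int dio_nowait_congestion_check(struct kiocb *iocb,
				       struct block_device *bdev)
{
	struct backing_dev_info *bdi = bdev_get_queue(bdev)->backing_dev_info;

	if (!(iocb->ki_flags & IOCB_NOWAIT))
		return 0;

	/* request queue is backed up - submission is likely to block */
	if (bdi_write_congested(bdi))
		return -EAGAIN;

	return 0;
}

That works the same way for mq and non-mq devices, keeps the "don't
block" decision at submission time, and doesn't need any new error
plumbing threaded through the bio.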