Re: EIO with io_uring O_DIRECT writes on ext4

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 7/23/19 2:07 AM, Stefan Hajnoczi wrote:
> Hi,
> io_uring O_DIRECT writes can fail with EIO on ext4.  Please see the
> function graph trace from Linux 5.3.0-rc1 below for details.  It was
> produced with the following qemu-io command (using Aarushi's QEMU
> patches from https://github.com/rooshm/qemu/commits/io_uring):
> 
>    $ qemu-io --cache=none --aio=io_uring --format=qcow2 -c 'writev -P 185 131072 65536' tests/qemu-iotests/scratch/test.qcow2
> 
> This issue is specific to ext4.  XFS and the underlying LVM logical
> volume both work.
> 
> The storage configuration is an LVM logical volume (device-mapper linear
> target), on top of LUKS, on top of a SATA disk.  The logical volume's
> request_queue does not have mq_ops and this causes
> generic_make_request_checks() to fail:
> 
>    if ((bio->bi_opf & REQ_NOWAIT) && !queue_is_mq(q))
>            goto not_supported;
> 
> I guess this could be worked around by deferring the request to the
> io_uring work queue to avoid REQ_NOWAIT.  But XFS handles this fine so
> how can io_uring.c detect this case cleanly or is there a bug in ext4?

I actually think it's XFS that's broken here, it's not passing down
the IOCB_NOWAIT -> IOMAP_NOWAIT -> REQ_NOWAIT. This means we lose that
important request bit, and we just block instead of triggering the
not_supported case.

Outside of that, that case needs similar treatment to what I did for
the EAGAIN case here:

http://git.kernel.dk/cgit/linux-block/commit/?h=for-linus&id=893a1c97205a3ece0cbb3f571a3b972080f3b4c7

It was a big mistake to pass back these values in an async fashion,
and it also means that some accounting in other drivers are broken
as we can get completions without the bio actually being submitted.
This was fixed for just the EAGAIN case here:

http://git.kernel.dk/cgit/linux-block/commit/?h=for-linus&id=c9b3007feca018d3f7061f5d5a14cb00766ffe9b

but that's still broken for EOPNOTSUPP.

tldr is that we should pass these errors back sync, and it was a
mistake to EVER try and do that through ->bi_end_io(). That behavior
was introduced by:

http://git.kernel.dk/cgit/linux-block/commit/?h=for-linus&id=03a07c92a9ed9938d828ca7f1d11b8bc63a7bb89

I'll add a patch on top of my for-linus branch that makes us handle
the EOPNOTSUPP case similarly. You are right that in those cases we
should just punt to the async worker internally.

-- 
Jens Axboe




[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux