Re: Zoned storage and BLK_STS_RESOURCE

On 2024/12/16 20:15, Christoph Hellwig wrote:
> On Mon, Dec 16, 2024 at 11:24:24AM -0800, Bart Van Assche wrote:
>>
>> Hi Damien,
>>
>> If 'qd=1' is changed into 'qd=2' in tests/zbd/012 then this test fails
>> against all kernel versions I tried, including kernel version 6.9. Do
>> you agree that this test should pass?
> 
> That test case is not very well documented and you're not explaining
> how it fails.
> 
> As far as I can tell the test uses fio to write to a SCSI debug device
> using the zbd randwrite mode and the io_uring I/O engine of fio.

A note on io_uring: if writes are submitted from multiple jobs to multiple
queues, you will see unaligned write errors, while the same test with libaio
works just fine. The reason is that IO submission in the fio io_uring engine
only adds write requests to the io rings; the kernel ring handling submits them
later. By that time the ordering information is lost, and if the rings are
processed in the wrong order, you get unaligned write errors.
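
To illustrate why a reordering turns into an unaligned write error, here is a
toy model of a sequential-write-required zone. It is only a sketch: the Zone
class, dispatch() and the 4 KiB write size are made up for illustration and do
not correspond to any kernel or fio code.

```python
class UnalignedWriteError(Exception):
    """Raised when a write does not start at the zone write pointer."""


class Zone:
    """Toy model of a sequential-write-required zone (hypothetical)."""

    def __init__(self, start, size):
        self.start = start
        self.size = size
        self.wp = start  # write pointer: the only offset a write may start at

    def write(self, offset, length):
        if offset != self.wp:
            raise UnalignedWriteError(
                f"write at {offset}, write pointer at {self.wp}")
        if offset + length > self.start + self.size:
            raise ValueError("write crosses the end of the zone")
        self.wp += length


def dispatch(zone, writes):
    """Apply writes in the given order and return how many succeeded
    before the first unaligned write error."""
    done = 0
    try:
        for offset, length in writes:
            zone.write(offset, length)
            done += 1
    except UnalignedWriteError:
        pass
    return done


# Three sequential 4 KiB writes, as the application prepared them.
writes = [(0, 4096), (4096, 4096), (8192, 4096)]

# Dispatched in the order they were prepared: all three succeed.
in_order = dispatch(Zone(0, 1 << 20), writes)

# Same writes, but the first two reach the device swapped.
reordered = dispatch(Zone(0, 1 << 20), [writes[1], writes[0], writes[2]])

print(in_order, reordered)
```

The offsets are identical in both runs; only the dispatch order differs, and
that alone is enough to fail the very first write of the reordered run.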

io_uring is thus unsafe for writes to zoned block devices. Doing something
about it has been on my to-do list for a while, but I have been too busy to get
to it yet. The best solution is of course zone append. If the user wants to use
regular writes, then it had better tightly control its own write issuing to
keep QD=1 per zone, as relying on zone write plugging will not be enough.
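
As a rough sketch of what "QD=1 per zone" can look like on the application
side: keep one lock and one next-write offset per zone, and hold the lock until
the write has completed. ZoneWriter and the do_write callback are hypothetical
names; do_write is assumed to return only once the write has completed, and
zone-full handling is left out.

```python
import threading


class ZoneWriter:
    """Allow at most one in-flight write per zone (hypothetical helper)."""

    def __init__(self, zone_size):
        self.zone_size = zone_size
        self._guard = threading.Lock()
        self._zones = {}  # zone number -> [per-zone lock, next write offset]

    def _state(self, zone_no):
        with self._guard:
            if zone_no not in self._zones:
                self._zones[zone_no] = [threading.Lock(),
                                        zone_no * self.zone_size]
            return self._zones[zone_no]

    def write(self, zone_no, data, do_write):
        state = self._state(zone_no)
        with state[0]:
            # The offset is chosen and the write is issued under the same
            # lock, so writes to one zone are issued strictly in order.
            offset = state[1]
            do_write(offset, data)  # assumed to block until completion
            state[1] = offset + len(data)
            return offset


issued = []


def fake_write(offset, data):
    # Stand-in for a synchronous write; records the issue order only.
    issued.append(offset)


writer = ZoneWriter(zone_size=1 << 20)


def job():
    for _ in range(8):
        writer.write(0, b"x" * 4096, fake_write)


# Four jobs writing to the same zone concurrently.
threads = [threading.Thread(target=job) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(issued == [i * 4096 for i in range(32)])
```

Writes to different zones still run in parallel here; only writes to the same
zone are serialized, which is the part that zone write plugging cannot fix
once the ordering is already lost at submission.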

> We've never guaranteed ordering of multiple outstanding asynchronous user
> writes on zoned block devices, so from that point of view a "failure" due
> to write pointer violations when changing the test to use QD=2 is
> entirely expected.

Not for libaio, since the io_submit() call goes down to submit_bio(). So if the
issuing application does the right synchronization (which fio does), libaio is
safe: we are guaranteed that the writes are placed in order in the zone write
plugs. As explained above, that is not the case with io_uring.


-- 
Damien Le Moal
Western Digital Research



