Re: Zoned storage and BLK_STS_RESOURCE

Damien Le Moal <dlemoal@xxxxxxxxxx> · Mon, 16 Dec 2024 12:54:22 -0800

On 2024/12/16 12:42, Bart Van Assche wrote:
> 
> On 12/16/24 12:23 PM, Damien Le Moal wrote:
>> On 2024/12/16 11:24, Bart Van Assche wrote:
>>> If 'qd=1' is changed into 'qd=2' in tests/zbd/012 then this test fails
>>> against all kernel versions I tried, including kernel version 6.9. Do
>>> you agree that this test should pass? If you agree with this, do you
>>> agree that the only solution is to postpone error handling of zoned
>>> writes until all pending zoned writes have completed and only to
>>> resubmit failed writes after all pending writes have completed?
>>
>> Well, of course: if one write fails, the target zone write pointer will not
>> advance as it should have, so all writes for the same zone after the failed one
>> will be unaligned and fail. Is that what you are talking about ?
>>
>> With the fixes applied to rc3, the automatic error recovery in the block layer
>> is gone. So it is up to the user (FS, DM or application) to do the right thing.
> 
> Hi Damien,
> 
> For non-zoned storage the BLK_STS_RESOURCE status is not reported to the
> I/O submitter (filesystem). The BLK_STS_RESOURCE status causes the block
> layer to retry a request. For zoned storage if the block driver reports
> the BLK_STS_RESOURCE status and if QD > 1 then the submitter
> (filesystem) has to retry the I/O. Isn't that inconsistent? Solving this
> inconsistency is one reason why I would like to postpone handling of
> zoned write errors until all pending I/O has either completed or failed.

As I said, if one write does not work, whatever the reason, all other writes
behind it for the same zone will also not work. So yes, handling of errors in
the end needs to be done after all writes come back to the issuer. Nothing new
here. I do not see the issue. And I am not sure where you want to go with this.
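
To make the cascade concrete, here is a minimal sketch in Python. It is a toy
model of a sequential-write-required zone, not kernel code: the Zone class and
its inject_error parameter are made up for illustration. The only assumption is
that a write is accepted only when its offset equals the zone write pointer.

```python
# Toy model of a sequential-write-required zone (hypothetical, not kernel code).
class Zone:
    def __init__(self, start, size):
        self.start = start
        self.size = size
        self.wp = start  # write pointer: the only offset a write may target

    def write(self, offset, length, inject_error=False):
        """Return True if the write succeeds, False otherwise."""
        if inject_error:
            return False  # stands in for any failure, whatever the reason
        if offset != self.wp or offset + length > self.start + self.size:
            return False  # unaligned (or out of zone) write is rejected
        self.wp += length
        return True


zone = Zone(start=0, size=64)

# Three writes queued back to back (QD > 1); offsets were chosen at
# submission time, assuming every earlier write would succeed.
results = [
    zone.write(0, 8, inject_error=True),  # fails: write pointer stays at 0
    zone.write(8, 8),                     # unaligned: wp is 0, not 8
    zone.write(16, 8),                    # unaligned for the same reason
]
print(results, zone.wp)
```

Once the first write fails, the write pointer never moves, so every write queued
behind it for that zone is rejected too.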

> Another reason is that this behavior change is an essential step towards
> supporting write pipelining. If multiple zoned writes are outstanding,
> and the block driver postpones execution of any of these writes (unit
> attention, BLK_STS_RESOURCE, ...) then any zoned writes must only be
> resubmitted after all pending zoned writes have either completed or failed.

Yes. But I am still confused. Where is the problem?
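
For reference, a rough sketch of what "resubmit only after all pending writes
have completed or failed" could look like on the issuer side. This is again a
toy Python model under the same write-pointer assumption. The names run_batch
and write_with_deferred_recovery are hypothetical and do not correspond to any
kernel or blktests interface.

```python
def run_batch(wp, writes, failing):
    """Run queued writes in order against write pointer wp.

    writes:  list of (offset, length) tuples, in submission order
    failing: set of indices that hit a simulated transient error
    Returns (new_wp, failed_writes).
    """
    failed = []
    for i, (offset, length) in enumerate(writes):
        if i in failing or offset != wp:
            failed.append((offset, length))  # error or unaligned write
        else:
            wp += length
    return wp, failed


def write_with_deferred_recovery(wp, writes, failing, max_rounds=3):
    # First pass: let every pending write either complete or fail.
    wp, failed = run_batch(wp, writes, failing)
    rounds = 1
    # Only now, with nothing in flight, resubmit the failed writes in
    # offset order. Retries are assumed not to hit the transient error.
    while failed and rounds < max_rounds:
        wp, failed = run_batch(wp, sorted(failed), set())
        rounds += 1
    return wp, failed, rounds


wp, failed, rounds = write_with_deferred_recovery(
    0, [(0, 8), (8, 8), (16, 8)], failing={0}
)
print(wp, failed, rounds)
```

The point of the sketch is only the ordering: no write is retried while other
writes for the same zone are still outstanding.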

> 
> Thanks,
> 
> Bart.

-- 
Damien Le Moal
Western Digital Research