Re: [blktests] zbd/012: Test requeuing of zoned writes and queue freezing

Damien Le Moal <dlemoal@xxxxxxxxxx> · Thu, 28 Nov 2024 14:19:23 +0900

On 11/28/24 14:16, Christoph Hellwig wrote:
> On Thu, Nov 28, 2024 at 02:07:58PM +0900, Damien Le Moal wrote:
>> A bad sector that gets remapped when overwritten is probably the most common,
>> and maybe the only one. I need to check again, but I think that for this case,
>> the SCSI stack retries the remainder of a torn write, so we probably do not even
>> see it in practice, unless the sector/zone is really dead and cannot be
>> recovered. But in that case, no matter what we do, that zone would not be
>> writable anymore.
> 
> Yes, all retryable errors should be handled by the drivers.  NVMe makes
> this very clear with the DNR bit, while SCSI deals with this on a more
> ad-hoc basis by looking at the sense codes.  So by the time a write error
> bubbles up to the file systems I do not expect the device to ever
> recover from it.  Maybe with some kind of dynamic depop in the future
> where we drop just that zone, but otherwise we're very much done.
> 
>> Still trying to see if I can have some sort of synchronization between incoming
>> writes and zone wp update to avoid relying on the user doing a report zones.
>> That would ensure that emulated zone append always works like the real command.
> 
> I think we're much better off leaving that to the submitter, because
> it had better have a really good reason to resubmit a write to the zone.
> We'll just need to properly document the assumptions.

Sounds good. What do you think of adding an opportunistic "update zone wp"
whenever we execute a user report zones? It is very easy to do and should not
significantly slow down report zones itself, because we usually have very few
zone write plugs and the hash search is fast.
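A minimal sketch of that opportunistic update, assuming a simplified, hypothetical plug structure keyed by zone number in a chained hash table. The names `struct zone_wplug`, `wplug_lookup()`, `wplug_add()`, `wplug_free_all()` and `report_zone_update_wp()` are illustrative only and are not the block layer's actual API. For each zone returned by report zones, only zones that currently have a write plug are updated, so the extra cost is one hash lookup per reported zone.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for a zone write plug. */
struct zone_wplug {
	unsigned int zone_no;      /* hash key: zone number */
	uint64_t wp_offset;        /* cached write pointer offset in the zone (sectors) */
	struct zone_wplug *next;   /* hash chain link */
};

#define WPLUG_HASH_SIZE 64 /* hypothetical table size */

static struct zone_wplug *wplug_hash[WPLUG_HASH_SIZE];

/* Find the plug for a zone, or NULL if the zone has no plug. */
static struct zone_wplug *wplug_lookup(unsigned int zone_no)
{
	struct zone_wplug *p = wplug_hash[zone_no % WPLUG_HASH_SIZE];

	while (p && p->zone_no != zone_no)
		p = p->next;
	return p;
}

/* Create a plug for a zone that is being written (hypothetical helper). */
static struct zone_wplug *wplug_add(unsigned int zone_no, uint64_t wp_offset)
{
	unsigned int idx = zone_no % WPLUG_HASH_SIZE;
	struct zone_wplug *p = malloc(sizeof(*p));

	if (!p)
		return NULL;
	p->zone_no = zone_no;
	p->wp_offset = wp_offset;
	p->next = wplug_hash[idx];
	wplug_hash[idx] = p;
	return p;
}

/* Release every plug in the table. */
static void wplug_free_all(void)
{
	for (unsigned int i = 0; i < WPLUG_HASH_SIZE; i++) {
		while (wplug_hash[i]) {
			struct zone_wplug *p = wplug_hash[i];

			wplug_hash[i] = p->next;
			free(p);
		}
	}
}

/*
 * Hypothetical per-zone callback run while servicing a user report zones
 * request: if the reported zone has a plug, resync its cached write pointer
 * offset with the write pointer the device just returned. Zones without a
 * plug are left alone.
 */
static void report_zone_update_wp(unsigned int zone_no, uint64_t zone_start,
				  uint64_t wp)
{
	struct zone_wplug *p = wplug_lookup(zone_no);

	assert(wp >= zone_start);
	if (p)
		p->wp_offset = wp - zone_start;
}
```

For example, if a failed write leaves a plug caching offset 128 while the device reports the write pointer 96 sectors into the zone, the callback resets the cached offset to 96, so a later emulated zone append would be issued at the real write pointer. Locking against concurrent writes to the same zone, which real code would need, is left out of this sketch.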

-- 
Damien Le Moal
Western Digital Research