Re: [PATCH 2/5] scsi: Retry unaligned zoned writes

Damien Le Moal <damien.lemoal@xxxxxxxxxxxxxxxxxx> · Wed, 15 Jun 2022 10:09:13 +0900

On 6/15/22 08:56, Bart Van Assche wrote:
> On 6/14/22 16:29, Damien Le Moal wrote:
>> On 6/15/22 02:49, Bart Van Assche wrote:
>>>  From ZBC-2: "The device server terminates with CHECK CONDITION status, with
>>> the sense key set to ILLEGAL REQUEST, and the additional sense code set to
>>> UNALIGNED WRITE COMMAND a write command, other than an entire medium write
>>> same command, that specifies: a) the starting LBA in a sequential write
>>> required zone set to a value that is not equal to the write pointer for that
>>> sequential write required zone; or b) an ending LBA that is not equal to the
>>> last logical block within a physical block (see SBC-5)."
>>>
>>> I am not aware of any other conditions that may trigger the UNALIGNED
>>> WRITE COMMAND response.
>>>
>>> Retry unaligned writes in preparation for removing zone locking.
>>
>> Arg. No. No way. AHCI will totally break with that because most AHCI
>> adapters do not send commands to the drive in the order they are delivered
>> to the LLD. In more detail, the order in which tag bits in the AHCI ready
>> register are set does not determine the order of command delivery to the
>> disk. So if zone locking is removed, you constantly get unaligned write
>> errors.
> 
> The performance penalty of zone locking is not acceptable for our use 
> case. Does this mean that zone locking needs to be preserved for AHCI 
> but not for UFS?

I did mention that: if it is OK for a UFS device to not have zone write
locking, then sure, have mq-deadline not use it, and possibly do not even
set ELEVATOR_F_ZBD_SEQ_WRITE for the device queue. But AHCI and SAS HBAs
definitely still need it. So does NVMe, since all it takes to see an
unaligned write is for the writer context to be rescheduled to a
different CPU, or for multiple contexts to write to the same zone
simultaneously. Also note that the command requeue path uses a workqueue,
which also results in reordering, potentially with large delays. I
seriously doubt that any reasonable number of retries will prevent
unaligned write errors if there is a requeue.
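
To make the reordering failure concrete, here is a minimal C sketch of the
two ZBC-2 conditions quoted above. It is a hypothetical model, not kernel
code: struct zone_model, zbc_write_unaligned() and zone_model_write() are
made-up names. It assumes that physical block boundaries fall on multiples
of lbas_per_pblk (lowest aligned LBA of 0) and that nr_lbas is at least 1.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one sequential write required zone. */
struct zone_model {
    uint64_t wp;    /* current write pointer (LBA) */
};

/*
 * Returns true if a write of nr_lbas logical blocks starting at lba would
 * be rejected with UNALIGNED WRITE COMMAND under the two quoted ZBC-2
 * conditions. Assumes physical blocks start at LBAs that are multiples of
 * lbas_per_pblk and that nr_lbas >= 1.
 */
bool zbc_write_unaligned(const struct zone_model *z, uint64_t lba,
                         uint32_t nr_lbas, uint32_t lbas_per_pblk)
{
    uint64_t end_lba = lba + nr_lbas - 1;

    if (lba != z->wp)
        return true;    /* a) starting LBA is not the write pointer */
    if ((end_lba + 1) % lbas_per_pblk != 0)
        return true;    /* b) ending LBA is not last in its physical block */
    return false;
}

/* Deliver a write to the model; on success, advance the write pointer. */
bool zone_model_write(struct zone_model *z, uint64_t lba,
                      uint32_t nr_lbas, uint32_t lbas_per_pblk)
{
    if (zbc_write_unaligned(z, lba, nr_lbas, lbas_per_pblk))
        return false;   /* CHECK CONDITION, UNALIGNED WRITE COMMAND */
    z->wp += nr_lbas;
    return true;
}
```

If two back-to-back 8-block writes to an empty zone reach the device as LBA
8 first and LBA 0 second, the first one fails. In this toy model a retry of
the LBA 8 write succeeds once the LBA 0 write has landed, but nothing bounds
how long that takes when the command went through the requeue workqueue.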

Another solution would be to try to hold the zone write lock for a shorter
interval. All we need is to guarantee in-order delivery to the device. We
do not care about completion order. So, in theory, all we need is to have
the LLD unlock the zone after it issues a write to the device. That is
very tricky to do, though, as it could be very racy. And it is not always
possible either. E.g., for AHCI, "command delivered to the device"
essentially boils down to "command tag marked as ready in the ready
register". But then you need to wait for that bit to be cleared before
setting the bit of the next write command in sequence (the bit being
cleared means that the drive got the command). With the current push
dispatch model, that is not easily possible. We would need to go back to
the legacy command pull model.
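
The "hold the zone write lock only until delivery" idea for AHCI can be
sketched as a single-threaded C model. Everything here is hypothetical:
struct ahci_model stands in for the ready register (bit n set means tag n
is ready, and the HBA clears it once the drive has fetched the command),
and issue_zoned_write(), device_fetches() and poll_delivery() are
illustrative names, not libata functions. Tags are assumed to be in the
range 0..31. The model deliberately ignores the races mentioned above,
which is exactly where a real implementation gets tricky.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical AHCI-style ready register: bit n set means tag n is ready;
 * the HBA clears the bit once the drive has fetched the command. */
struct ahci_model {
    uint32_t ready;
};

/* Hypothetical per-zone state: the zone write lock is held from the moment
 * a write's tag is marked ready until the drive has taken that command. */
struct zone_lock_model {
    bool locked;
    int  inflight_tag;  /* tag awaiting delivery, or -1; assumed 0..31 */
};

/* Try to issue a write on 'tag' for this zone. Returns false (the caller
 * must retry later) if the previous write has not been delivered yet. */
bool issue_zoned_write(struct ahci_model *hba, struct zone_lock_model *zl,
                       int tag)
{
    if (zl->locked)
        return false;
    zl->locked = true;
    zl->inflight_tag = tag;
    hba->ready |= UINT32_C(1) << tag;
    return true;
}

/* Simulate the drive fetching the command: the HBA clears the tag's bit. */
void device_fetches(struct ahci_model *hba, int tag)
{
    hba->ready &= ~(UINT32_C(1) << tag);
}

/* Poll for delivery: unlock the zone once the in-flight tag's bit is
 * clear. Only delivery is required, not completion of the write. */
void poll_delivery(const struct ahci_model *hba, struct zone_lock_model *zl)
{
    if (zl->locked && !(hba->ready & (UINT32_C(1) << zl->inflight_tag))) {
        zl->locked = false;
        zl->inflight_tag = -1;
    }
}
```

The poll_delivery() step is the part that does not fit the push dispatch
model: the next write for the zone can only be issued after something
observes the previous tag's bit being cleared.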

Also note that for ATA & SAS, with recent drives, the performance penalty
of zone write locking is almost nil as long as the drive is running with
write-cache enabled. And even with write-cache disabled, recent drives are
almost as fast as with WCE.

> 
> Thanks,
> 
> Bart.
> 

-- 
Damien Le Moal
Western Digital Research