Re: [PATCH v6 3/7] scsi: core: Retry unaligned zoned writes

Bart Van Assche <bvanassche@xxxxxxx> · Tue, 8 Aug 2023 07:20:55 -0700

On 8/7/23 19:24, Martin K. Petersen wrote:
>> If zoned writes (REQ_OP_WRITE) for a sequential write required zone
>> have a starting LBA that differs from the write pointer, e.g. because
>> zoned writes have been reordered, then the storage device will respond
>> with an UNALIGNED WRITE COMMAND error. Send commands that failed with
>> an unaligned write error to the SCSI error handler if zone write
>> locking is disabled. Let the SCSI error handler sort SCSI commands per
>> LBA before resubmitting these.
>>
>> If zone write locking is disabled, increase the number of retries for
>> write commands sent to a sequential zone to the maximum number of
>> outstanding commands because in the worst case the number of times
>> reordered zoned writes have to be retried is (number of outstanding
>> writes per sequential zone) - 1.

> I am afraid that I find falling back to rely on the error handler pretty
> kludgy. It seems like there would be a more straightforward way to ensure
> that request ordering is preserved for devices that are known not to
> reorder internally.
>
> I probably missed the finer details of what was discussed while I was
> away. But why can't we address the specific corner cases that cause the
> unexpected reordering at the block layer? Sorting requests in the SCSI
> error handler after a reported failure just seems like papering over the
> fact that there's a problem elsewhere.

Hi Martin,

An important question is whether it is possible to preserve the write order
all the time. The software layers and hardware components that are involved
in this context are:
* The filesystem.
* The block layer.
* The I/O scheduler if an I/O scheduler is present.
* The SCSI core.
* The SCSI LLD.
* The storage controller (UFSHCI in this case).
* The link between storage controller and storage device.
* The storage device (UFS in this case).

The SCSI protocol allows SCSI devices, including UFS devices, to respond
with a unit attention or the SCSI BUSY status at any time. If multiple write
commands are pending and some of the pending SCSI commands are not executed
because of a unit attention or because of another reason, this causes
command reordering.

The link between the UFS controller and the UFS device has a low but non-zero
bit error rate (BER). If a SCSI command is lost on this link and has to be
resent, this can cause reordering.
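
To make the retry argument concrete, here is a toy Python model of a single
sequential write required zone. It is only a sketch and not kernel code: the
names Write, Zone, reversed_writes and run are made up for illustration, every
write is assumed to be one block long, and the writes are assumed to cover the
zone contiguously starting at LBA 0. The model rejects any write whose starting
LBA differs from the write pointer, which stands in for the UNALIGNED WRITE
COMMAND error, and resubmits the rejected writes until all have succeeded.

```python
from dataclasses import dataclass


@dataclass
class Write:
    lba: int          # starting LBA of the write
    nblocks: int = 1  # length in logical blocks (assumption: 1 block each)
    retries: int = 0  # how often this write had to be resubmitted


class Zone:
    """Toy sequential zone: a write is accepted only at the write pointer."""

    def __init__(self, start_lba=0):
        self.wp = start_lba

    def submit(self, w):
        if w.lba != self.wp:
            return False  # stands in for UNALIGNED WRITE COMMAND
        self.wp += w.nblocks
        return True


def reversed_writes(n):
    """Worst-case arrival order: n one-block writes, highest LBA first."""
    return [Write(lba) for lba in range(n - 1, -1, -1)]


def run(writes, sort_before_retry):
    """Submit writes in list order and resubmit failures until none remain.

    Returns the largest retry count observed for any single write.
    """
    zone = Zone()
    pending = list(writes)
    while pending:
        failed = [w for w in pending if not zone.submit(w)]
        if len(failed) == len(pending):
            raise ValueError("no progress: writes do not cover the zone contiguously")
        for w in failed:
            w.retries += 1
        if sort_before_retry:
            failed.sort(key=lambda w: w.lba)  # sort per LBA before resubmitting
        pending = failed
    return max(w.retries for w in writes)


if __name__ == "__main__":
    print(run(reversed_writes(4), sort_before_retry=False))
    print(run(reversed_writes(4), sort_before_retry=True))
```

In this model, four writes that arrive in reverse order need up to three
retries for a single write if they are resubmitted in the same order, which
matches the (number of outstanding writes per zone) - 1 worst case. If the
failed writes are sorted per LBA before resubmission, one retry suffices as
long as no further reordering happens.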

Although I agree that the code in this patch that sorts and resubmits requests
should be triggered infrequently, I don't think that such code can be avoided
entirely. You may have noticed that a significant effort has been undertaken
to eliminate certain causes of command reordering. See also:
* [PATCH v4 0/5] ufs: Do not requeue while ungating the clock
  (https://lore.kernel.org/linux-scsi/20230529202640.11883-1-bvanassche@xxxxxxx/)
* [PATCH v6 00/11] mq-deadline: Improve support for zoned block devices
  (https://lore.kernel.org/linux-block/20230517174230.897144-1-bvanassche@xxxxxxx/)
* less special casing for flush requests v2
  (https://lore.kernel.org/linux-block/20230519044050.107790-1-hch@xxxxxx/)

As you may be aware, performance matters for UFS devices, and the performance
of UFS devices increases gradually over time. To achieve good performance, the
code added by this patch must be triggered infrequently, so I have an interest
myself in making sure that it is triggered infrequently in current and also in
future kernels.

Since I think that it is not possible to avoid sorting and resubmitting
requests entirely, I propose to proceed with the approach of this patch
series.

Thanks,

Bart.