On 8/22/23 23:26, Hannes Reinecke wrote:
> On 8/22/23 21:16, Bart Van Assche wrote:
>> +/*
>> + * Comparison function that allows to sort SCSI commands by ULD driver.
>> + */
>> +static int scsi_cmp_uld(void *priv, const struct list_head *_a,
>> + const struct list_head *_b)
>> +{
>> + struct scsi_cmnd *a = list_entry(_a, typeof(*a), eh_entry);
>> + struct scsi_cmnd *b = list_entry(_b, typeof(*b), eh_entry);
>> +
>> + /* See also the comment above the list_sort() definition. */
>> + return scsi_cmd_to_driver(a) > scsi_cmd_to_driver(b);
> I have to agree with Christoph here.
> Comparing LBA numbers at the SCSI level is really the wrong place.
> SCSI commands might be anything, and quite some of these commands don't
> even have LBA numbers. So trying to order them will be pointless.
> The reordering mechanism really has to go into the block layer, with
> the driver failing the request and the block layer resubmitting in-order.
Hi Hannes,
Please take another look at patches 04/16 and 05/16. As one can see, no
LBA numbers are being compared in the SCSI core - comparing LBA numbers
happens in the sd (SCSI disk) driver. The code that you replied to
compares ULD pointers, a well-defined concept in the SCSI core.
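To illustrate what sorting by ULD pointer does, here is a minimal user-space
sketch. It is not the patch code: struct demo_driver, struct demo_cmd,
demo_cmp_uld() and demo_sort_by_uld() are hypothetical stand-ins, and qsort()
replaces list_sort(). Because qsort() needs a three-way result, the comparison
returns -1/0/1 instead of the boolean that the quoted patch returns.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins for struct scsi_driver and struct scsi_cmnd. */
struct demo_driver {
	const char *name;
};

struct demo_cmd {
	const struct demo_driver *drv;	/* upper-level driver owning the command */
	int tag;
};

/*
 * Order commands by the address of their driver. The resulting order is
 * arbitrary; the only goal is that commands of one driver become adjacent.
 * No LBA or other command payload is inspected.
 */
static int demo_cmp_uld(const void *pa, const void *pb)
{
	const struct demo_cmd *a = pa, *b = pb;
	uintptr_t da = (uintptr_t)a->drv, db = (uintptr_t)b->drv;

	return (da > db) - (da < db);
}

/* Group an array of commands by driver. */
static void demo_sort_by_uld(struct demo_cmd *cmds, size_t n)
{
	assert(cmds != NULL || n == 0);
	qsort(cmds, n, sizeof(*cmds), demo_cmp_uld);
}
```

After demo_sort_by_uld() returns, each driver's commands form one contiguous
run, so a driver-specific step (such as ordering by LBA in a disk driver) can
then be applied per run.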
Your request to move the functionality from patches 04/16 and 05/16 into
the block layer would involve the following:
* Report the unaligned write errors (because a write did not happen at the
write pointer) to the block layer (BLK_STS_WP_MISMATCH?).
* Introduce a mechanism in the block layer for postponing error handling
until all outstanding commands have failed. The approach from the SCSI
core (tracking the number of failed and the number of busy commands
and only waking up the error handler after these counters are equal)
would be unacceptable because of the runtime overhead this mechanism
would introduce in the block layer hot path. Additionally, I strongly
doubt that it is possible to introduce any mechanism for postponing
error handling in the block layer without introducing additional
overhead in the hot path.
* Christoph's opinion is that NVMe software should use zone append
(REQ_OP_ZONE_APPEND) instead of regular writes (REQ_OP_WRITE) when
writing to a zoned namespace. So the SCSI subsystem would be the only
user of the new mechanism introduced in the block layer. We chose
REQ_OP_WRITE for zoned UFS devices because the SCSI standard does not
support a zone append command, and introducing a zone append command
in the SCSI standards is not something that can be realized in time
for the first generation of zoned UFS devices.
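For reference, the counter-based approach mentioned in the second item can be
sketched as follows. This is a simplified user-space model under my own
assumptions, not kernel code: struct demo_host and the demo_*() helpers are
hypothetical, and locking is left out. It shows why every issue and every
completion has to touch shared counters, which is the hot-path cost referred
to above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-host bookkeeping; not the kernel's struct Scsi_Host. */
struct demo_host {
	unsigned int busy;	/* commands issued and not yet finished */
	unsigned int failed;	/* failed commands waiting for error handling */
	bool eh_woken;		/* set once the error handler may run */
};

/* Wake the error handler only when every outstanding command has failed. */
static void demo_wake_eh_if_idle(struct demo_host *h)
{
	assert(h->failed <= h->busy);
	if (h->failed > 0 && h->failed == h->busy)
		h->eh_woken = true;
}

/* Hot path: each submitted command updates the busy counter. */
static void demo_issue(struct demo_host *h)
{
	h->busy++;
}

/* Hot path: each successful completion updates the counter and re-checks. */
static void demo_complete_ok(struct demo_host *h)
{
	assert(h->busy > h->failed);
	h->busy--;
	demo_wake_eh_if_idle(h);
}

/* A failed command stays counted as busy until the error handler runs. */
static void demo_fail(struct demo_host *h)
{
	h->failed++;
	demo_wake_eh_if_idle(h);
}
```

With three commands in flight, the error handler is woken only after one
command completes and the other two have failed, that is, once failed equals
busy.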
Because I assume that both Jens and Christoph disagree strongly with your
request, I have no plans to move the code for sorting zoned writes into
the block layer core.
Jens and Christoph, please correct me if I misunderstood something.
Thanks,
Bart.