Re: [PATCH v2 4/5] scsi: Retry unaligned zoned writes

Damien Le Moal <dlemoal@xxxxxxxxxx> · Tue, 18 Jul 2023 15:47:15 +0900

On 7/11/23 03:01, Bart Van Assche wrote:
> From ZBC-2: "The device server terminates with CHECK CONDITION status, with
> the sense key set to ILLEGAL REQUEST, and the additional sense code set to
> UNALIGNED WRITE COMMAND a write command, other than an entire medium write
> same command, that specifies: a) the starting LBA in a sequential write
> required zone set to a value that is not equal to the write pointer for that
> sequential write required zone; or b) an ending LBA that is not equal to the
> last logical block within a physical block (see SBC-5)."
> 
> I am not aware of any other conditions that may trigger the UNALIGNED
> WRITE COMMAND response.

Trying to write less than 4KB on a 4Kn drive. But that should not ever happen
for kernel issued commands.

> 
> Send commands that failed with an unaligned write error to the SCSI error
> handler. Let the SCSI error handler sort SCSI commands per LBA before
> resubmitting these.
> 
> Increase the number of retries for write commands sent to a sequential
> zone to the maximum number of outstanding commands.

I think I mentioned this before. When we started btrfs work, we did something
similar (but at the IO scheduler level) to try to avoid adding a big lock in
btrfs to serialize (and thus order) writes. What we discovered is that it was
extremely easy to fall into a situation were the maximum number of possible
outstanding request is already issued, but they all are behind a "hole" and
indefinitely delayed because the missing request cannot be issued due to the max
nr request limit being reached. No forward progress and deadlock.

I do not see how your change addresses this problem. The same will happen with
this and I do not have any suggestion how to solve this. For btrfs, we ended up
using cone append emulation for scsi to avoid the big lock and avoid the FS from
having to order writes. That solution guarantees forward progress. Delaying
already issued writes that are not sequential has no such guarantees.

-- 
Damien Le Moal
Western Digital Research