On 1/22/25 6:57 AM, Bart Van Assche wrote: > On 1/17/25 2:47 PM, Damien Le Moal wrote: >> Also, patch 14 puts back a lot of the code that was recently removed. Not nice, >> and its commit message is also far too short given the size of the patch. Please >> spend more time explaining what the patches do and how they do it to facilitate >> review. > > Regarding the following part of patch 07/14: > > - disk_zone_wplug_abort(zwplug); > - zwplug->flags |= BLK_ZONE_WPLUG_NEED_WP_UPDATE; > + if (!disk->queue->limits.driver_preserves_write_order) > + zwplug->flags |= BLK_ZONE_WPLUG_NEED_WP_UPDATE; > + zwplug->flags |= BLK_ZONE_WPLUG_ERROR; > > How about setting BLK_ZONE_WPLUG_ERROR only if > disk->queue->limits.driver_preserves_write_order has been set such that > the restored error handling code is only used if the storage controller > preserves the write order? This should be sufficient to preserve the > existing behavior of the blk-zoned.c code for SMR disks. Your patch changes BLK_ZONE_WPLUG_NEED_WP_UPDATE into BLK_ZONE_WPLUG_ERROR for regular (no write order preserving) zoned devices. I do not see why this is needed. The main difference between this default behavior and what you are trying to do is that BLK_ZONE_WPLUG_NEED_WP_UPDATE is to be expected with your changes while it is NOT expected to ever happen for a regular coned device. > The reason I restored the error handling code is that the code in > blk-zoned.c does not support error handling inside block drivers if QD > 1 > and because supporting the SCSI error handler is essential for UFS > devices. What error handling ? The only error handling we need is to tell tell the user that there was an error and track that the user actually does something about it. That tracking is done with the BLK_ZONE_WPLUG_NEED_WP_UPDATE flag. That's it, nothing more. I think that conflating/handling the expected (even if infrequent) write requeue events with zoned UFS devices with actual hard device write errors (e.g. the write failed because the device is toast or you hit a bad sector, or you have a bad cable or whatever other hardware reason for the write to fail) is making a mess of everything. Treating the requeues like hard errors makes things complicated, but also wrong because even for zoned UFS devices, we may still get hard write errors, and treating these in the same way as requeue event is wrong in my opinion. For hard errors, the write must not be retried and the error propagated immediately to the issuer (e.g. the FS). For a requeue event, even though that is signaled initially at the bottom of the stack by a device unit attention error, there is no reason for us to treat these events as failed requests in the zone write plugging code. We need to have a special treatment for these, e.g. a "ZONE_WRITE_NEED_RETRY" request flag (and that flag can bhe set in the scsi mid-layer before calling into the block layer requeue. When the zone write plugging sees this flag, it can: 1) Stop issuing write BIOs since we know they will fail 2) Wait for all requests already issued and in-flight to come back 3) Restoore the zone write plug write pointer position to the lowest sector of all requests that were requeued 4) Re-issue the requeued writes (after somehow sorting them) 5) Re-start issuing BIOs waiting in the write plug And any write that is failed due to a hard error will still set the zone write plug with BLK_ZONE_WPLUG_NEED_WP_UPDATE after aborting all BIOs that are pluggeg in the zone write plug. Now, I think that the biggest difficulty of this work is to make sure that a write that fails with an unaligned write error due to a write requeue event before it can be clearly differentiated from the same failure without the requeue event before it. For the former, we can recover. For the latter, it is a user bug and we must NOT hide it. Can this be done cleanly ? I am not entirely sure about it because I am still not 100% clear about the conditions that result in a zoned UFS device failing a write with unit attention. As far as I can tell, the first write issued when the device is sleeping will be rejected with that error and must be requeued. But others write behind this failed write that are already in-flight will endup failing with an unaligned write error, no ? Or will they be failed with a unit attention too ? This is all unclear to me. -- Damien Le Moal Western Digital Research