Re: [RFC PATCH v3 00/19] scsi: scsi_error: Introduce new error handle mechanism

Bart Van Assche <bvanassche@xxxxxxx> · Fri, 14 Mar 2025 08:55:25 -0700

On 3/14/25 2:01 AM, Hannes Reinecke wrote:
3. The current EH framework is designed around 'struct scsi_cmnd'.
Which means that the command _initiating_ the error handling can
only be returned once the _entire_ error handling (with all
escalations) is finished. And more often than not, the application
is waiting on that command to be completed before the next I/O
is sent. And that really limits the effectiveness of any improved
error handler; the application ultimately has to wait for a
host reset before it can continue.

But anyway.
We already have a mechanism for asynchronous command aborts;
have you checked if you can adapt it for LUN reset, too?
That would be the easiest solution, I guess ...

Hmm ... does this mean submitting a LUN reset while new SCSI commands
can be submitted concurrently from another thread? I don't think that's
safe.

Additionally, how could a LUN reset help if a SCSI abort doesn't help?
If a SCSI abort doesn't help, it probably means that the host controller
locked up, e.g. due to a firmware bug. How to recover from this without
resetting the host controller?
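
To make the escalation order under discussion concrete, here is a minimal
user-space sketch. It is not kernel code: the names eh_step, eh_model and
eh_escalate() are hypothetical, the ladder is simplified to abort, LUN
reset, target reset, bus reset and host reset, and the failure model
(every step milder than 'first_working' fails) is an assumption made
purely for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Escalation steps, mildest first (simplified; names are hypothetical). */
enum eh_step {
	EH_ABORT,
	EH_LUN_RESET,
	EH_TARGET_RESET,
	EH_BUS_RESET,
	EH_HOST_RESET,
	EH_NR_STEPS,	/* also used as "nothing recovered the device" */
};

/* Assumed failure model: every step milder than 'first_working' fails. */
struct eh_model {
	enum eh_step first_working;
};

/*
 * Walk the ladder until one step succeeds. Returns how many steps were
 * tried before the initiating command could be completed and stores the
 * successful step in *recovered_by (EH_NR_STEPS if none succeeded).
 */
int eh_escalate(const struct eh_model *m, enum eh_step *recovered_by)
{
	int tried = 0;

	assert(m != NULL && recovered_by != NULL);

	for (int s = EH_ABORT; s < EH_NR_STEPS; s++) {
		tried++;
		if (s >= (int)m->first_working) {
			*recovered_by = (enum eh_step)s;
			return tried;
		}
	}

	*recovered_by = EH_NR_STEPS;
	return tried;
}
```

With first_working set to EH_HOST_RESET, which models a locked-up host
controller, the LUN reset step is tried and fails just like the abort,
and the initiating command completes only after all five steps have run.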

Thanks,

Bart.