Re: [RFC PATCH v2 00/18] scsi: scsi_error: Introduce new error handle mechanism

Wenchao Hao <haowenchao2@xxxxxxxxxx> · Wed, 27 Sep 2023 17:41:44 +0800

On 2023/9/27 15:59, Hannes Reinecke wrote:
On 9/26/23 14:57, Wenchao Hao wrote:
On 2023/9/26 1:54, Mike Christie wrote:
On 9/25/23 10:07 AM, Wenchao Hao wrote:
On 2023/9/25 22:55, Christoph Hellwig wrote:
Before we add another new error handling mechanism we need to fix the
old one first.  Hannes' work on not passing the scsi_cmnd to the various
reset handlers hasn't made a lot of progress in the last five years and
we'll need to urgently fix that first before adding even more
complexity.

I observed Hannes's patches posted about one year ago, it has not been
applied yet. I don't know if he is still working on it.

My patches do not depend much on that work, I think the conflict can be
solved fast between two changes.

I think we want to figure out Hannes's patches first.

For a new EH design we will want to be able to do multiple TMFs in parallel
on the same host/target right?

It's not necessary to do multiple TMFs in parallel, it's ok to make sure
each TMFs do not affect each other.

For example, we have two devices: 0:0:0:0 and 0:0:0:1

Both of them request device reset, they do not happened in parallel, but
would in serial. If 0:0:0:0 is performing device reset in progress, 0:0:0:1
just wait 0:0:0:0 to finish.

Well, not quite. Any higher-order TMFs are serialized by virtue of SCSI-EH, but command aborts (which also devolve down to TMFs on certain drivers) do run in parallel, and there we will be requiring multiple TMFs.

It's best that multiple  TMFs can run in parallel, again, looking forwarding
to your changes.

Cheers,

Hannes