On 9/26/23 14:57, Wenchao Hao wrote:
On 2023/9/26 1:54, Mike Christie wrote:
On 9/25/23 10:07 AM, Wenchao Hao wrote:
On 2023/9/25 22:55, Christoph Hellwig wrote:
Before we add another new error handling mechanism we need to fix the
old one first. Hannes' work on not passing the scsi_cmnd to the various
reset handlers hasn't made a lot of progress in the last five years and
we'll need to urgently fix that first before adding even more
complexity.
I saw that Hannes's patches were posted about a year ago, but they have
not been applied yet. I don't know if he is still working on them.
My patches do not depend much on that work; I think any conflicts between
the two changes can be resolved quickly.
I think we want to figure out Hannes's patches first.
For a new EH design we will want to be able to do multiple TMFs in parallel
on the same host/target, right?
It's not necessary to do multiple TMFs in parallel; it's enough to make sure
the TMFs do not affect each other.
For example, we have two devices: 0:0:0:0 and 0:0:0:1.
Both of them request a device reset. The resets do not happen in parallel,
but in series: if 0:0:0:0 has a device reset in progress, 0:0:0:1
just waits for 0:0:0:0 to finish.
Well, not quite. Any higher-order TMFs are serialized by virtue of
SCSI-EH, but command aborts (which also devolve down to TMFs on certain
drivers) do run in parallel, and there we will be requiring multiple TMFs.
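
To make the distinction concrete, here is a minimal user-space sketch. It
is a toy model, not kernel code; the struct and every function name in it
are hypothetical and exist only for illustration. It assumes one
"reset in progress" flag per host for the higher-order TMFs, and a plain
counter for aborts, which are allowed to overlap.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of one SCSI host. Hypothetical; not a kernel structure. */
struct toy_host {
	bool reset_in_progress;        /* a device reset or higher is running */
	unsigned int aborts_in_flight; /* abort TMFs currently outstanding */
};

/*
 * Higher-order TMF (e.g. device reset): at most one at a time.
 * Returns false if another reset is running and the caller has to wait.
 */
static bool toy_reset_begin(struct toy_host *h)
{
	if (h->reset_in_progress)
		return false;
	h->reset_in_progress = true;
	return true;
}

static void toy_reset_end(struct toy_host *h)
{
	h->reset_in_progress = false;
}

/*
 * Command abort: may overlap with other aborts, so this only counts.
 * Returns the number of aborts outstanding after this one started.
 */
static unsigned int toy_abort_begin(struct toy_host *h)
{
	return ++h->aborts_in_flight;
}

static void toy_abort_end(struct toy_host *h)
{
	assert(h->aborts_in_flight > 0);
	h->aborts_in_flight--;
}
```

With this model, a second toy_reset_begin() on the same host fails until
toy_reset_end() is called, while two toy_abort_begin() calls in a row both
succeed and leave two aborts outstanding. That second case is where a
driver would need to be able to issue multiple TMFs at once.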
Cheers,
Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
hare@xxxxxxx +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew
Myers, Andrew McDonald, Martje Boudien Moerman