On 3/14/25 02:29, JiangJianJun wrote:
> It's unbearable for systems with large-scale SCSI devices sharing HBAs to
> block I/O to all devices while handling erroneous commands; we need a new
> error handling mechanism to address this issue.
> I asked about this issue a year ago; the discussion link can be found in
> the references. Hannes kindly explained why we have to block the SCSI host
> and then perform error recovery. I think blocking the SCSI host is not
> necessary for all drivers; we could first try a smaller-scope recovery
> (LUN-based, for example) to avoid blocking the SCSI host.
Technically, yes.
There are, however, some issues which would need to be addressed if
someone were to design a new error handler.
1. The 'LUN Reset' TMF (as it's currently being used) is badly scoped;
it will reset the LUN itself, affecting all ports to that LUN.
So in a multipathed/multiported environment all initiators will be
affected, even if they haven't experienced an error.
Is that what we want?
Shouldn't we rather use the 'Reset IT Nexus' TMF here?
And, of course, the 'Target Reset' TMF has been dropped from SAM,
so I really don't see the point in spending time here ...
2. Irrespective of the EH granularity, any error handling requires
that all activity on that level has to be stopped. If you need to
issue a LUN reset, you need to stop I/O for that LUN.
3. The current EH framework is designed around 'struct scsi_cmnd'.
This means that the command _initiating_ the error handling can
only be returned once the _entire_ error handling (with all
escalations) is finished. And more often than not, the application
is waiting for that command to complete before the next I/O
is sent. That really limits the effectiveness of any improved
error handler; the application ultimately has to wait for a
host reset before it can continue.
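Points 2 and 3 can be illustrated with a toy model. This is plain Python, not kernel code; `Scope`, `Dev`, `devices_to_quiesce()`, `completion_time()` and the step costs are all hypothetical. It shows how many devices must be quiesced for each recovery scope, and that the command which triggered EH is only completed after every escalation step that was tried has run.

```python
from dataclasses import dataclass
from enum import IntEnum


class Scope(IntEnum):
    """Hypothetical recovery scopes, ordered from narrowest to widest."""
    LUN = 0
    TARGET = 1
    HOST = 2


@dataclass(frozen=True)
class Dev:
    """Hypothetical device address: one LUN on one target behind a shared HBA."""
    target: int
    lun: int


def devices_to_quiesce(devs, failed, scope):
    """Return the devices whose I/O must be stopped before recovering at `scope`."""
    if scope is Scope.LUN:
        return [d for d in devs if d == failed]
    if scope is Scope.TARGET:
        return [d for d in devs if d.target == failed.target]
    return list(devs)  # Scope.HOST: every device on the HBA


def completion_time(steps):
    """Escalate through (name, cost, succeeds) steps in order.

    The command that triggered error handling is only returned once a step
    succeeds (or the ladder is exhausted), so its latency is the sum of the
    costs of every step that was tried.
    """
    elapsed = 0
    for _name, cost, succeeds in steps:
        elapsed += cost
        if succeeds:
            break
    return elapsed


if __name__ == "__main__":
    devs = [Dev(t, l) for t in range(2) for l in range(4)]
    failed = Dev(0, 1)
    for scope in Scope:
        print(scope.name, len(devices_to_quiesce(devs, failed, scope)))
    # Hypothetical costs in seconds; only the host reset clears the error.
    ladder = [("abort", 1, False), ("lun reset", 5, False),
              ("target reset", 10, False), ("host reset", 30, True)]
    print("initiating command completes after", completion_time(ladder), "s")
```

With two targets of four LUNs each, a LUN-scoped recovery quiesces one device, a target-scoped one four, and a host-scoped one all eight; yet if only the host reset succeeds, the initiating command still waits for the whole ladder (46 s with these made-up costs).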
But anyway.
We already have a mechanism for asynchronous command aborts;
have you checked whether you can adapt it for LUN reset, too?
That would be the easiest solution, I guess ...
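The policy such an adaptation would presumably follow can be sketched as another toy model: quiesce only the affected LUN, attempt the LUN reset, and fall back to the existing host-wide error handler only if the reset fails. This is plain Python, not kernel code; `Host`, `Lun`, `try_lun_reset()` and `host_eh()` are hypothetical names, and the model tracks which LUNs are blocked rather than the real concurrency of a work queue.

```python
from dataclasses import dataclass, field


@dataclass
class Lun:
    """Hypothetical per-LUN state in the toy model."""
    name: str
    blocked: bool = False


@dataclass
class Host:
    """Hypothetical host with several LUNs and a host-wide EH fallback."""
    luns: dict
    host_eh_runs: int = 0
    log: list = field(default_factory=list)

    def running(self):
        """Names of LUNs that can still accept I/O."""
        return sorted(n for n, l in self.luns.items() if not l.blocked)

    def try_lun_reset(self, name, reset_ok):
        """Quiesce one LUN, try a LUN reset, and only escalate on failure."""
        lun = self.luns[name]
        lun.blocked = True                   # stop I/O on this LUN only
        self.log.append(("quiesced", self.running()))
        if reset_ok:
            lun.blocked = False              # resume; the command can be retried
            return "recovered"
        self.host_eh()                       # fall back to host-wide EH
        return "escalated"

    def host_eh(self):
        """Classic EH: block every LUN on the host, recover, then unblock."""
        for l in self.luns.values():
            l.blocked = True
        self.host_eh_runs += 1
        self.log.append(("host eh", self.running()))
        for l in self.luns.values():
            l.blocked = False


if __name__ == "__main__":
    host = Host({n: Lun(n) for n in ("lun0", "lun1", "lun2")})
    print(host.try_lun_reset("lun1", reset_ok=True), host.log[-1])
    print(host.try_lun_reset("lun2", reset_ok=False), host.log[-1])
```

In the successful case only `lun1` is blocked while `lun0` and `lun2` keep running; the host-wide stop happens only when the narrower reset fails.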
Cheers,
Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
hare@xxxxxxx +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich