Re: [RFC PATCH v2 00/18] scsi: scsi_error: Introduce new error handle mechanism

Mike Christie <michael.christie@xxxxxxxxxx> · Mon, 25 Sep 2023 11:52:26 -0500

On 9/14/23 1:20 AM, Wenchao Hao wrote:
> On 2023/9/1 17:41, Wenchao Hao wrote:
>> For systems where a large number of SCSI devices share HBAs, blocking
>> all devices' I/O while handling failed commands is unacceptable. We need
>> a new error handling mechanism to address this issue.
>>
>> I asked about this issue a year ago; the discussion link can be found in
>> the reference. Hannes kindly explained why we have to block the SCSI host
>> before performing error recovery. I think blocking the SCSI host is not
>> necessary for every driver, and a smaller-scope recovery (LUN based, for
>> example) can be tried first to avoid blocking the SCSI host.
>>
>> The new error handling mechanism introduced in this patchset has been
>> developed and tested with our self-developed hardware over the past
>> year. We now want this mechanism to be usable by more drivers.
>>
>> Drivers can decide whether to use the new error handling mechanism, and
>> how to handle failed commands, when a scsi_device is scanned. The new
>> mechanism makes SCSI error handling more flexible.
>>
>> After blocking the host's I/O, the SCSI error recovery strategy mainly
>> consists of the following steps:
>>
>> - LUN reset
>> - Target reset
>> - Bus reset
>> - Host reset
>>
> 
> Mike gave some suggestions and I found a bug in the fallback logic. I will
> address these and resend in the next few days.

Please wait to resend. I'm still reviewing the patches. When I commented
last time I just did a quick look over to get an idea for the design and
what your goals were.