Re: [RFC PATCH v2 00/18] scsi: scsi_error: Introduce new error handle mechanism

Mike Christie <michael.christie@xxxxxxxxxx> · Mon, 25 Sep 2023 12:54:48 -0500

On 9/25/23 10:07 AM, Wenchao Hao wrote:
> On 2023/9/25 22:55, Christoph Hellwig wrote:
>> Before we add another new error handling mechanism we need to fix the
>> old one first.  Hannes' work on not passing the scsi_cmnd to the various
>> reset handlers hasn't made a lot of progress in the last five years and
>> we'll need to urgently fix that first before adding even more
>> complexity.
>>
> I observed Hannes's patches posted about one year ago, it has not been
> applied yet. I don't know if he is still working on it.
> 
> My patches do not depend much on that work, I think the conflict can be
> solved fast between two changes.

I think we want to figure out Hannes's patches first.

For a new EH design we will want to be able to do multiple TMFs in parallel
on the same host/target, right?

The problem is that we need to be able to make forward progress in the EH
path and not fail just because we can't allocate memory for a TMF-related
struct. To accomplish this now, drivers will use mempools, preallocate
TMF-related structs/mem/tags with their scsi_cmnd-related structs,
preallocate per host/target/device structs, or ignore what I wrote above and
just fail.
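
To make the mempool option concrete, here is a rough userspace sketch of the
idea: keep a small reserve that is filled at setup time, try the normal
allocator first, and fall back to the reserve when that fails. This is not
kernel code. The names (tmf_req, tmf_pool, TMF_RESERVE, simulate_oom) are
made up for illustration, and malloc() stands in for the real allocator.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical TMF bookkeeping struct; real drivers carry HW-specific state. */
struct tmf_req {
	int lun;
	int type;
};

#define TMF_RESERVE 2	/* elements guaranteed to be available to EH */

struct tmf_pool {
	struct tmf_req *reserve[TMF_RESERVE];
	size_t nr_free;		/* reserve elements currently available */
	bool simulate_oom;	/* illustration only: make the allocator "fail" */
};

/* Fill the reserve at setup time, when failing is still acceptable. */
static int tmf_pool_init(struct tmf_pool *p)
{
	p->nr_free = 0;
	p->simulate_oom = false;
	for (size_t i = 0; i < TMF_RESERVE; i++) {
		p->reserve[i] = malloc(sizeof(struct tmf_req));
		if (!p->reserve[i]) {
			while (p->nr_free)
				free(p->reserve[--p->nr_free]);
			return -1;
		}
		p->nr_free++;
	}
	return 0;
}

/* Try the normal allocator first; fall back to the reserve under pressure. */
static struct tmf_req *tmf_alloc(struct tmf_pool *p)
{
	struct tmf_req *req = p->simulate_oom ? NULL : malloc(sizeof(*req));

	if (!req && p->nr_free)
		req = p->reserve[--p->nr_free];
	return req;	/* NULL only when the reserve is exhausted too */
}

/* Refill the reserve before handing memory back to the allocator. */
static void tmf_free(struct tmf_pool *p, struct tmf_req *req)
{
	assert(req);
	if (p->nr_free < TMF_RESERVE)
		p->reserve[p->nr_free++] = req;
	else
		free(req);
}

/* Release whatever is still sitting in the reserve. */
static void tmf_pool_destroy(struct tmf_pool *p)
{
	while (p->nr_free)
		free(p->reserve[--p->nr_free]);
}
```

With the allocator failing, the first TMF_RESERVE calls to tmf_alloc() still
succeed, and later callers have to wait for a tmf_free(). That limit is why
the size of the reserve matters once we want multiple TMFs in flight.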

Hannes's patches fix up the EH callouts so they don't pass in a scsi_cmnd
when it's not needed. That seems nice because, once that is done, we can
begin to standardize how your new EH handles preallocation of the driver
resources needed to perform TMFs. It could be a per device/target/host
callout that lets drivers preallocate, with scsi-ml then calling into the
drivers with that data. It doesn't have to be exactly like that or anything
close. It would be nice if drivers didn't have to think about this type of
thing and scsi-ml just handled the resource management for us when there are
multiple TMFs in progress.
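
One possible shape for that, as a userspace sketch only: the driver declares
how much per-device TMF data it needs, the midlayer side allocates it at
device setup, and the EH path just passes it to the driver. Everything here
(eh_template, tmf_priv_size, eh_device, the demo_* driver) is hypothetical
and not an existing scsi-ml interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical template: the driver only declares what it needs. */
struct eh_template {
	size_t tmf_priv_size;				/* per-device TMF data size */
	int (*lun_reset)(void *tmf_priv, int lun);	/* gets preallocated data */
};

/* Hypothetical midlayer-side device; it owns the preallocated TMF data. */
struct eh_device {
	const struct eh_template *tmpl;
	int lun;
	void *tmf_priv;
};

/* Device setup: the only place where allocation is allowed to fail. */
static struct eh_device *eh_device_alloc(const struct eh_template *t, int lun)
{
	struct eh_device *d = calloc(1, sizeof(*d));

	if (!d)
		return NULL;
	/* Avoid a zero-sized request, which calloc() may answer with NULL. */
	d->tmf_priv = calloc(1, t->tmf_priv_size ? t->tmf_priv_size : 1);
	if (!d->tmf_priv) {
		free(d);
		return NULL;
	}
	d->tmpl = t;
	d->lun = lun;
	return d;
}

/* EH path: no allocation here, so it cannot fail for lack of memory. */
static int eh_lun_reset(struct eh_device *d)
{
	assert(d && d->tmf_priv);
	memset(d->tmf_priv, 0, d->tmpl->tmf_priv_size);
	return d->tmpl->lun_reset(d->tmf_priv, d->lun);
}

static void eh_device_free(struct eh_device *d)
{
	free(d->tmf_priv);
	free(d);
}

/* Example driver side: fills in its own TMF struct and "issues" it. */
struct demo_tmf {
	int lun;
	int issued;
};

static int demo_lun_reset(void *tmf_priv, int lun)
{
	struct demo_tmf *tmf = tmf_priv;

	tmf->lun = lun;
	tmf->issued = 1;
	return 0;
}

static const struct eh_template demo_template = {
	.tmf_priv_size = sizeof(struct demo_tmf),
	.lun_reset = demo_lun_reset,
};
```

Because each device has its own tmf_priv, resets on two devices of the same
host don't compete for a shared TMF struct. Whether the data should live per
device, per target, or per host is exactly the part that would need
discussion.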