On 9/25/23 10:07 AM, Wenchao Hao wrote:
> On 2023/9/25 22:55, Christoph Hellwig wrote:
>> Before we add another new error handling mechanism we need to fix the
>> old one first. Hannes' work on not passing the scsi_cmnd to the various
>> reset handlers hasn't made a lot of progress in the last five years and
>> we'll need to urgently fix that first before adding even more
>> complexity.
>>
> I observed Hannes's patches posted about one year ago, it has not been
> applied yet. I don't know if he is still working on it.
>
> My patches do not depend much on that work, I think the conflict can be
> solved fast between two changes.

I think we want to figure out Hannes's patches first.

For a new EH design we will want to be able to do multiple TMFs in
parallel on the same host/target, right?

The problem is that we need to be able to make forward progress in the
EH path and not fail just because we can't allocate memory for a TMF
related struct. To accomplish this now, drivers will use mempools,
preallocate TMF related structs/mem/tags with their scsi_cmnd related
structs, preallocate per host/target/device related structs, or ignore
what I wrote above and just fail.

Hannes's patches fix up the eh callouts so they don't pass in a
scsi_cmnd when it's not needed. That seems nice because after that, for
your new EH we can begin to standardize on how to handle preallocation
of the driver resources needed to perform TMFs. It could be a per
device/target/host callout to allow drivers to preallocate, then scsi-ml
calls into the drivers with that data. It doesn't have to be exactly
like that or anything close. It would be nice for drivers to not have to
think about this type of thing and for scsi-ml to just handle the
resource management for us when there are multiple TMFs in progress.
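
To make the preallocation idea concrete, here is a rough userspace sketch of a per host/target/device pool of TMF contexts that is filled at setup time, so the EH path never has to call the allocator. Every name in it (tmf_ctx, tmf_pool, tmf_pool_init/get/put/destroy) is made up for illustration. None of this is existing scsi-ml or driver API, and a real version would need locking and would live behind whatever callout we end up agreeing on.

```c
/* Userspace sketch only: all names here are hypothetical, not scsi-ml API. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Driver-private state needed to issue one TMF (abort, LUN reset, ...). */
struct tmf_ctx {
	bool in_use;
	unsigned int tag;	/* tag reserved for this TMF at setup time */
};

/* One pool per host/target/device, sized for the max TMFs in flight. */
struct tmf_pool {
	struct tmf_ctx *ctx;
	size_t nr;
};

/* Setup path: allocation is allowed to fail here, before any error occurs. */
static int tmf_pool_init(struct tmf_pool *pool, size_t nr,
			 unsigned int first_tag)
{
	pool->ctx = calloc(nr, sizeof(*pool->ctx));
	if (!pool->ctx)
		return -1;
	pool->nr = nr;
	for (size_t i = 0; i < nr; i++)
		pool->ctx[i].tag = first_tag + (unsigned int)i;
	return 0;
}

/* EH path: never calls the allocator; NULL means all contexts are busy. */
static struct tmf_ctx *tmf_pool_get(struct tmf_pool *pool)
{
	for (size_t i = 0; i < pool->nr; i++) {
		if (!pool->ctx[i].in_use) {
			pool->ctx[i].in_use = true;
			return &pool->ctx[i];
		}
	}
	return NULL;
}

/* TMF completion: hand the context back so another TMF can proceed. */
static void tmf_pool_put(struct tmf_pool *pool, struct tmf_ctx *ctx)
{
	assert(ctx >= pool->ctx && ctx < pool->ctx + pool->nr);
	assert(ctx->in_use);
	ctx->in_use = false;
}

/* Teardown path: release the preallocated contexts. */
static void tmf_pool_destroy(struct tmf_pool *pool)
{
	free(pool->ctx);
	pool->ctx = NULL;
	pool->nr = 0;
}
```

With a pool of two contexts, two TMFs can be in flight at once and a third tmf_pool_get() returns NULL until one of them completes and is put back. In the kernel the same guarantee is what mempool_create()/mempool_alloc() give a driver today. The point of the sketch is only that the sizing and lifetime could be owned by scsi-ml instead of being reinvented per driver.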