On 11/12/22 7:59 AM, Bodo Stroesser wrote:
> Hello Mike, Maurizio,
>
> Even if we couldn't yet find a method to fix handling of aborted
> TMRs in the core or in all fabric drivers, I still think that keeping
> the parallel handling of TMRs would be fine.
>
> Tcmu offers a TMR notification mechanism to make userspace aware
> of ABORT or RESET_LUN. So userspace can try to break cmd handling
> and thus speed up TMR response. If we serialize TMR handling, then
> the notifications are also serialized and thus lose some of their
> power.
>
> But maybe I have a new (?) idea of how to fix handling of aborted
> TMRs in fabric drivers:
> 1) Modify core to not call target_put_sess_cmd, no matter whether
>    SCF_ACK_REF is set.
> 2) Modify fabric drivers to handle an aborted TMR just like a
>    normal TMR response. This means, e.g. qla2xxx would send a
>    normal response for the Abort. This exactly is what happens
>    when serializing TMRs, because in that case despite of the
>    RESET_LUN the core always calls queue_tm_rsp callback instead
>    of aborted_task callback.
>
> So to initiators we would show the 'old' behavior, while internally
> keeping the parallel processing of TMRs.
>
> If fabric driver maintainers don't like that approach, they can
> change their drivers to correctly kill aborted TMRs.
>
> What do you think?
>

I'm fine with doing it in parallel. However, the issue is we have real
users hitting it now and we have to fix all the drivers because it's a
regression. So if your idea is going to take a while, then we should
revert now and then do your idea whenever it's ready.