Re: tcm_loop and aborted TMRs

michael.christie@xxxxxxxxxx · Sat, 12 Nov 2022 15:46:56 -0600

On 11/12/22 7:59 AM, Bodo Stroesser wrote:
> Hello Mike, Maurizio,
> 
> Even if we couldn't yet find a method to fix handling of aborted
> TMRs in the core or in all fabric drivers, I still think that keeping
> the parallel handling of TMRs would be fine.
> 
> Tcmu offers a TMR notification mechanism to make userspace aware
> of ABORT or RESET_LUN. So userspace can try to break cmd handling
> and thus speed up TMR response. If we serialize TMR handling, then
> the notifications are also serialized and thus lose some of their
> power.
> 
> But maybe I have a new (?) idea of how to fix handling of aborted
> TMRs in fabric drivers:
> 1) Modify core to not call target_put_sess_cmd, no matter whether
>    SCF_ACK_KREF is set.
> 2) Modify fabric drivers to handle an aborted TMR just like a
>    normal TMR response. This means, e.g. qla2xxx would send a
>    normal response for the Abort. This is exactly what happens
>    when serializing TMRs, because in that case, despite the
>    RESET_LUN the core always calls queue_tm_rsp callback instead
>    of aborted_task callback.
> 
> So to initiators we would show the 'old' behavior, while internally
> keeping the parallel processing of TMRs.
> 
> If fabric driver maintainers don't like that approach, they can
> change their drivers to correctly kill aborted TMRs.
> 
> What do you think?
> 

I'm fine with doing it in parallel. However, the issue is we have real
users hitting it now and we have to fix all the drivers because it's a
regression. So if your idea is going to take a while, then we should revert
now and then do your idea whenever it's ready.
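
For reference, here is a rough userspace toy model of step 2 of the
proposal, only to make the intended initiator-visible behavior concrete.
It is a sketch under assumptions, not kernel code: every toy_* name is
hypothetical, only the callback names queue_tm_rsp and aborted_task come
from the discussion above, and reference counting (step 1) is not modeled
at all.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT the real target core types. */
enum toy_rsp {
	TOY_RSP_NONE,     /* nothing sent to the initiator yet */
	TOY_RSP_TM_SENT,  /* a normal TMR response was sent */
	TOY_RSP_KILLED    /* command was dropped without a response */
};

struct toy_cmd {
	bool is_tmr;       /* true for ABORT / RESET_LUN style requests */
	bool aborted;      /* true if something else aborted this cmd */
	enum toy_rsp rsp;  /* what the initiator ends up seeing */
};

struct toy_fabric_ops {
	void (*queue_tm_rsp)(struct toy_cmd *cmd);
	void (*aborted_task)(struct toy_cmd *cmd);
};

/* Fabric side: send a normal TMR response. */
void toy_queue_tm_rsp(struct toy_cmd *cmd)
{
	cmd->rsp = TOY_RSP_TM_SENT;
}

/*
 * Fabric side, modeling step 2: an aborted TMR is answered like a
 * normal TMR, while an aborted non-TMR command is still dropped.
 */
void toy_aborted_task(struct toy_cmd *cmd)
{
	if (cmd->is_tmr) {
		toy_queue_tm_rsp(cmd);
		return;
	}
	cmd->rsp = TOY_RSP_KILLED;
}

const struct toy_fabric_ops toy_ops = {
	.queue_tm_rsp = toy_queue_tm_rsp,
	.aborted_task = toy_aborted_task,
};

/*
 * Core side with parallel TMR handling: an aborted command goes to
 * aborted_task, a TMR that was not aborted goes to queue_tm_rsp.
 * The normal status path for regular I/O is left out of this model.
 */
enum toy_rsp toy_complete(const struct toy_fabric_ops *ops,
			  struct toy_cmd *cmd)
{
	assert(ops != NULL && cmd != NULL);

	if (cmd->aborted)
		ops->aborted_task(cmd);
	else if (cmd->is_tmr)
		ops->queue_tm_rsp(cmd);

	return cmd->rsp;
}
```

In this model an ABORT that was itself aborted by a RESET_LUN still ends
up with a normal TMR response, which matches what the initiator sees when
TMRs are serialized, while the core remains free to process TMRs in
parallel.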