Re: [Open-FCoE] [PATCH 4/4] libfc: Fix fc_fcp_cleanup_each_cmd()

Vasu Dev <vasu.dev@xxxxxxxxxxxxxxx> · Fri, 29 May 2015 14:16:53 -0700

On Fri, 2015-05-29 at 09:03 +0200, Bart Van Assche wrote:
> On 05/28/15 20:29, Vasu Dev wrote:
> > We would have needed schedule() in a HBA design if the current LUN RESET
> > call flow had some thing to finish during schedule() preemption time but
> > then that would have required wait on any such thing with a condition
> > after schedule() is over but currently schedule is just a pause as I
> > described below since all it does is releasing the CPU and then just
> > resumes back without waiting for anything to finish after schedule().
> 
> The above is not correct. The schedule() call in fc_seq_set_resp() waits 
> until fc_invoke_resp() invokes wake_up().
> 

Does it do for all in-flight pending frames of fsp/exch and not yet
handled by fc_invoke_resp() after LUN RESET ? 

It won't since wake_up() will be issued on first frame after LUN RESET
and subsequent frames  yet to hit fc_invoke_resp() will be curbed by
NULL ep->resp handler as I explained in my last response. So I'd rather
have all frames dropped by  ep->resp == NULL after LUN RESET as it was
before scheduled() & ep->resp_wq/ep->resp_active new logic for this
purpose, since  new logic around schedule()  in itself not serving this
purpose completely beside causing scheduling while atomic bug.

> Regarding LUN RESET behavior, I think the following quotes from SAM 5 
> are interesting. From paragraph "5.6 Aborting commands": "After a 
> command is aborted and TASK ABORTED status, if any, is returned, the 
> SCSI target device shall send no further requests or responses for that 
> command." From the paragraph "6.3.3 Logical unit reset": "When 
> responding to a logical unit reset condition, the logical unit shall 
> abort all commands as described in 5.6."
> 
> The only way to guarantee that no further responses are sent after a LUN 
> RESET has completed is to wait until all ongoing ep->resp() calls have 
> finished. 

Correct and this could be either by at least these two options discussed
so far:-

1. Using schedule() along with waiting for *all* in flight pending
frames instead getting out of wait on first pending frame after LUN
RESET.
2. Just use ep->resp == NULL which is set to NULL with LUN RESET to
discard all frames.

I'd go with #2 w/o having risk of atomic sleep bug like this patch is
fixing with #1 due to use fo schedule() beside currently #1 is not
really implemented to curb all frames as I explained above. Looking at
ep->resp_active++ may give impression as if #1 is fully implemented but
fact is that LUN RESET's pending fsp wait will be over on very first
frame.

> I think that is what patch 4/4 posted on May 26 realizes. Or 
> did I perhaps overlook something ?
> 

We are discussion these above two options to fix scheduling while atomic
bug here, your original 4/4 fixes the issue by dropping
fsp->scsi_pkt_lock around schedule(), which is ok for now so may be just
documents notes on need to drop here and possibly document #2 fix w/o
schedule() as TODO.

It would be helpful to me if you could point to any other existing HBA
user of schedule() for such case due to another reason you mentioned
before as:-

"The layers above libfc need a way to wait until ongoing invocations of
their own response handler have finished. This is why the schedule()
call has been added in fc_seq_set_resp(). ", whether that is for
SCSI/TCM/SCST.

Thanks Bart for quite useful discussion and this patch series. This
discussion should be helpful to fix any more issues due to
unlocking/locking considerations around added schedule() for
ep->resp_wq/ep->resp_active.

Thanks
Vasu

--
To unsubscribe from this list: send the line "unsubscribe target-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html