Re: [Open-FCoE] [PATCH 4/4] libfc: Fix fc_fcp_cleanup_each_cmd()

On Thu, 2015-05-28 at 08:35 +0200, Bart Van Assche wrote:
> On 05/27/15 20:08, vasu.dev@xxxxxxxxxxxxxxx wrote:
> > On Wed, 2015-05-27 at 15:14 +0200, Bart Van Assche wrote:
> >> On 05/27/15 03:13, vasu.dev@xxxxxxxxxxxxxxx wrote:
> >>> ...but the question is why we even added the schedule() that is now
> >>> causing this side effect, in this commit:
> >>>
> >>> commit 7030fd626129ec4d616784516a462d317c251d39
> >>> Author: Bart Van Assche<bvanassche@xxxxxxx>
> >>> Date:   Sat Aug 17 20:34:43 2013 +0000
> >>>
> >>> Maybe we should remove the schedule() there, which again drops and
> >>> re-grabs ep->ex_lock around the schedule. Do we really need the
> >>> schedule() added by the above patch?
> >>
> >> The layers above libfc need a way to wait until ongoing invocations of
> >> their own response handler have finished. This is why the schedule() call
> >> has been added in fc_seq_set_resp().
> >
> > Bart, is that from LIO, SCST, or SCSI?
> 
> Hello Vasu,
> 
> Both FCoE target drivers need this functionality. When an FCoE initiator 
> submits e.g. a LUN RESET while a SCSI WRITE is in progress, an FCoE 
> target driver needs a way to let the WRITE command finish such that 
> data frames associated with the WRITE that arrive after the WRITE 
> command has been aborted do not cause a kernel crash. Both FCoE target 
> drivers use fc_seq_set_resp() to let 'ep->arg' point at the SCSI command 
> associated with an FC exchange. SCSI abort and LUN reset processing 
> happens concurrently with regular command processing in an FCoE SCSI 
> target driver. This means that an fc_seq_set_resp(sp, NULL, NULL) call 
> is needed to ensure that ep->arg does not become a dangling pointer, and 
> the context from which that call occurs must wait until any ongoing 
> ep->resp calls have finished.
> 

In LUN RESET processing, fc_seq_set_resp() sets ep->resp to NULL, while
fsp and exch locking and their states serialize these flows against
in-flight WRITE frames. So once the LUN RESET is done, the exch layer
block won't call the ep->resp handler anymore, because it is NULL; that
is why we check whether ep->resp is NULL before invoking the resp
handler above the exch layer block, to prevent any more callbacks after
the LUN RESET is done.

We wouldn't have bothered to set ep->resp to NULL otherwise, and then
to require the additional NULL check when calling the ep->resp handler.

We would have needed schedule() in an HBA design if the current LUN
RESET call flow had something to finish during the schedule() preemption
window, but that would also have required waiting on such work with a
condition after schedule() returns. Currently the schedule() is just a
pause, as I described below: all it does is release the CPU and then
resume, without waiting for anything to finish.


I doubt any other target front end uses schedule() to synchronize a
RESET against its affected in-flight commands, for either TCM or SCST,
for the reasons I mentioned below; if there is any such usage, it should
use a wait condition to resume from schedule().


> > It is too restrictive for any underlying HBA not to be able to call
> > into these functions while a spin lock is held or bottom halves are
> > disabled. Especially here, the schedule() is hardly doing anything
> > useful other than yielding and then resuming to grab the lock again,
> > so the caller's flow is just paused by schedule() instead of doing
> > anything useful during the schedule() preemption window.
> >
> >
> > static void fc_seq_set_resp(struct fc_seq *sp,
> >                             void (*resp)(struct fc_seq *, struct fc_frame *,
> >                                          void *),
> >                             void *arg)
> > {
> >         struct fc_exch *ep = fc_seq_exch(sp);
> >         DEFINE_WAIT(wait);
> >
> >         spin_lock_bh(&ep->ex_lock);
> >         while (ep->resp_active && ep->resp_task != current) {
> >                 prepare_to_wait(&ep->resp_wq, &wait, TASK_UNINTERRUPTIBLE);
> >                 spin_unlock_bh(&ep->ex_lock);
> >
> >                 schedule();
> >
> >                 spin_lock_bh(&ep->ex_lock);
> >         }
> >         finish_wait(&ep->resp_wq, &wait);
> >         ep->resp = resp;
> >         ep->arg = arg;
> >         spin_unlock_bh(&ep->ex_lock);
> > }
> >
> > Maybe removing the schedule() is the proper fix here; instead, I
> > would have gone with the revised patch below, but that seems to have
> > issues as well, see details below.
> 
> Would removing schedule() from that function cause it to busy-wait or 
> did I perhaps misunderstand something ?
> 

I can't think of any busy waiting, as I described above in detail.

> >> [ ... ]
> >> diff --git a/drivers/scsi/libfc/fc_exch.c b/drivers/scsi/libfc/fc_exch.c
> >> index 30f9ef0..4a1fe55 100644
> >> --- a/drivers/scsi/libfc/fc_exch.c
> >> +++ b/drivers/scsi/libfc/fc_exch.c
> >> @@ -709,8 +709,8 @@ static int fc_seq_exch_abort(const struct fc_seq *req_sp,
> >>    * Since fc_exch_done() invokes fc_seq_set_resp() it is guaranteed that that
> >>    * ep->resp() won't be invoked after fc_exch_done() has returned.
> >>    *
> >> - * The response handler itself may invoke fc_exch_done(), which will clear the
> >> - * ep->resp pointer.
> >> + * The response handler itself may invoke fc_exch_done(), which will set
> >> + * FC_EX_DONE.
> >>    *
> >>    * Return value:
> >>    * Returns true if and only if ep->resp has been invoked.
> >> @@ -730,7 +730,8 @@ static bool fc_invoke_resp(struct fc_exch *ep, struct fc_seq *sp,
> >>   	arg = ep->arg;
> >>   	spin_unlock_bh(&ep->ex_lock);
> >>
> >> -	if (resp) {
> >> +	if (resp && (!(ep->state & (FC_EX_RST_CLEANUP | FC_EX_DONE)) ||
> >> +		     IS_ERR(fp))) {
> >>   		resp(sp, fp, arg);
> >
> > This will prevent any cleanup during an exchange reset, which is
> > meant to reset the layers above the exch block in the libfc stack.
> 
> Can you clarify this further? Isn't exchange cleanup always done by 
> passing an error pointer to the 'fp' argument? The above modification 
> still causes the function that 'resp' points at to be called if 
> IS_ERR(fp) is true.
> 

Again, as I described above in detail, we set resp to NULL, and the same
is used here, with the additional check preventing any more calls once
resp is NULL.


> >>   		res = true;
> >>   	}
> >> @@ -939,7 +940,6 @@ static void fc_exch_done(struct fc_seq *sp)
> >>   	rc = fc_exch_done_locked(ep);
> >>   	spin_unlock_bh(&ep->ex_lock);
> >>
> >> -	fc_seq_set_resp(sp, NULL, ep->arg);
> >
> > We set resp to NULL here to prevent any subsequent callback, and
> > with this change that won't happen anymore.
> 
> Are you sure that it is essential to reset 'resp' to prevent subsequent 
> callbacks ? Since fc_exch_done() sets FC_EX_DONE, since all ep->resp 
> calls happen through fc_invoke_resp() and since this patch modifies 
> fc_invoke_resp() such that it tests the FC_EX_DONE flag before invoking 
> ep->resp, isn't that sufficient to prevent that new invocations of 
> ep->resp will occur after fc_exch_done() has returned ?
> 

Maybe, but in any case, once invoking ep->resp is prevented, whether
through the resp NULL check or through the exch/fsp states, there is no
need for the schedule() once the LUN reset is done.

Thanks,
Vasu




