RE: [PATCH v2][RFC] scsi_transport_fc: Implement I_T nexus reset

Vijay Mohan Guvva <vmohan@xxxxxxxxxxx> · Mon, 11 Mar 2013 12:32:03 -0600

> -----Original Message-----
> From: linux-scsi-owner@xxxxxxxxxxxxxxx [mailto:linux-scsi-
> owner@xxxxxxxxxxxxxxx] On Behalf Of James Smart
> Sent: Monday, March 11, 2013 11:04 AM
> To: Hannes Reinecke
> Cc: Jeremy Linton; Mike Christie; linux-scsi@xxxxxxxxxxxxxxx; Andrew
> Vasquez; Chad Dupuis; Robert Elliot; Smart, James
> Subject: Re: [PATCH v2][RFC] scsi_transport_fc: Implement I_T nexus reset
> 
> 
> On 3/11/2013 1:05 PM, Hannes Reinecke wrote:
> > On 03/07/2013 09:35 PM, Jeremy Linton wrote:
> >> On 3/7/2013 2:20 PM, Mike Christie wrote:
> >>> On 03/07/2013 02:13 PM, Jeremy Linton wrote:
> >>>>     For lpfc, you never get to the code. Or rather, when I was
> >>>> testing it, I couldn't find any way to propagate an error beyond
> >>>> the initial lpfc_reset_flush_io_context() call in
> >>>> lpfc_device_reset_handler().
> >>>>
> >>>>     That call pretty much always returns success independent of
> >>>> the remote device because the firmware acks the context clear
> >>>> aborts, resulting in the outstanding iocb count being zero
> >>>> (independent of both the mid layer status and the actual device
> >>>> state).
> >>>>
> >>>
> >>> Your lpfc patch fixes that right?
> >>
> >>     Yes. It allows the device reset to fail if the device doesn't
> >> respond to the task mgmt request, or rejects it, etc.
> >>
> >>     It doesn't unjam the commands that get aborted by the
> >> flush_io_context() call. Those have to rely on their timeouts.
> >> That is another patch...
> >>
> >>
> >
> > It's actually worse than that.
> > lpfc_terminate_rport_io() calls lpfc_sli_abort_iocb(), which has this:
> >
> >
> >         if (lpfc_is_link_up(phba))
> >             abtsiocb->iocb.ulpCommand = CMD_ABORT_XRI_CN;
> >         else
> >             abtsiocb->iocb.ulpCommand = CMD_CLOSE_XRI_CN;
> >
> >         /* Setup callback routine and issue the command. */
> >         abtsiocb->iocb_cmpl = lpfc_sli_abort_fcp_cmpl;
> >         ret_val = lpfc_sli_issue_iocb(phba, pring->ringno,
> >                           abtsiocb, 0);
> >         if (ret_val == IOCB_ERROR) {
> >             lpfc_sli_release_iocbq(phba, abtsiocb);
> >             errcnt++;
> >             continue;
> >         }
> >
> >
> > I.e., we're calling into firmware and waiting for an async event telling
> > us that the command has (ideally) been aborted.
> > What I would like is some kind of synchronous call here, which would
> > guarantee us that we won't run into use-after-free issues.
> >
> > Also, 'lpfc_is_link_up' is clearly deficient here, as the link itself
> > is most likely up; it's the I_T nexus that is not.
> >
> > James, is it safe to use 'CMD_CLOSE_XRI_CN' even when the link is up?
> 
> No, it's not safe.  The ABORT, which sends an ABTS, is mandated so that the
> other end and ourselves maintain proper (unique) exchange id
> state.   CLOSE sends no link traffic - but can only be used if the login
> is broken (e.g. there's a different mechanism that communicated
> termination of exchange states).   I don't believe I can trust the logic
> in the OS about frames lying in wait in the fabric (maybe sent earlier,
> delayed at a switch, delivered after the OS thinks the nexus is gone), so the
> driver needs to terminate them properly.
> 
> 
> >
> > Which makes me wonder, how _exactly_ is I_T nexus reset supposed to
> > work? After all, we're trying to tell the target port that we cannot
> > talk to it anymore, right?
> > Which has some hurdles, conceptually ...
> > So from my POV I_T nexus reset can only be implemented on the
> > _initiator_ side, disregarding any target implementation.
> > (which would be pointless anyway).
> >
> > Hmm. Probably have to ask T10 for clarification. Robert, any insights?
> 
> 
> The I_T nexus reset should be an FC transport implicit logout call to the LLDD.
> I.e., this becomes a transport-specific action on what it means to
> break the I_T nexus, which, for FC, is to terminate the login.   This
> logout call allows the driver to do all the implicit work to kill exchange
> contexts and allows it to adjust the state of the target in
> its FC discovery engine.  The question is: should the driver re-login?
> Typically, this would be driven by an RSCN, which I'm guessing, for this
> scenario, would not be occurring. If you knew it would, you could let
> the driver respond to the RSCN and re-login later.   If there's no RSCN,
> then I would assume we put a heartbeat into the transport to retry login (on a
> WWPN/WWNN basis, remembered from the I_T nexus reset) with the LLDD
> (a new interface as well; call it "establish I_T_nexus").
> 
> In lpfc's case - the Logout would allow the driver to take the CLOSE_XRI case,
> giving you the speed/asynchronicity you desire. Reuse of SCSI job structures
> still can't occur until the driver returns them via the completion routines (as
> DMA related to them must be cancelled within the card by the ABORT/CLOSE
> commands - even if we know there shouldn't be something to DMA).
> 
> -- james s
> 
> 
> >
> > Cheers,
> >
> > Hannes
> >
> >

Adding BROCADE BFA FC SCSI DRIVER maintainer Anil to the CC.

Thanks,
Vijay
