Re: [Open-FCoE] [v2 PATCH 4/5] bnx2fc: Broadcom FCoE Offload driver submission - part 2

"Bhanu Gollapudi" <bprakash@xxxxxxxxxxxx> · Wed, 2 Feb 2011 23:04:54 -0800

On Wed, 2011-02-02 at 20:47 -0800, Mike Christie wrote: 
> On 02/02/2011 10:05 PM, Mike Christie wrote:
> > On 02/02/2011 09:42 PM, Bhanu Gollapudi wrote:
> >>>
> >>> Actually you do not have to wait for the scsi eh to run, right. It
> >>> looks like bnx2fc would log out the port, which ends up calling
> >>> fc_remote_port_delete and that would cause the fc timed out function
> >>> to return BLK_EH_RESET_TIMER to prevent the scsi eh from running. Is
> >>> that right? That type of eh strategy behavior seems like something
> >>> you want to sync up with libfc or the fc class so all drivers do
> >>> something similar.
> >>
> >> As per FCP-4, if the ABTS times out, we will have to explicitly LOGO the
> >
> > What section is that in?
> >
> 
> Ok read it (12.5.1, right).
> 
> >> target and log back in. If we rely on the 60 sec eh_abort_handler, and
> >> the ABTS times out, SCSI error handling will go down the LUN RESET, TGT
> >> reset path, which is generic error handling rather than
> >> transport-specific error handling.
> >
> > If that is right, then it seems the other FC drivers are doing it
> > wrong, and you hit that problem if someone sets the scsi cmd timer lower
> > than BNX2FC_IO_TIMEOUT. Even if that is right, it does not seem right
> > to hack around the issue in the driver.
> 
> So if your reading of 12.5.1 is right then libfc is wrong and it seems 
> other drivers (if they are not doing some magic in firmware) are wrong too.
> 
> My confidence in my FCP skills is very shaken right now :) I am not 
> sure what I was thinking when I read it and reviewed libfc. I think 
> you need to discuss this with the fcoe list people and James Smart and 
> Andrew Vasquez.
> 
> I think some of them disagree with the other aborting commands (or maybe 
> just disagree about some of the details), so that should be discussed too.
> 
> But if you are right then you cannot work around this in a driver 
> specific way. You need to change libfc and the fc class in a way that 
> the error strategy is correct. For example from fc_timed_out you could 
> kick off the abort. I was slightly off on the other comment about libfc 
> not doing an abort from their internal timeout handler. They do an abort 
> still, but if that fails they let the scsi eh run eventually. I thought 
> they were going to clean that up too when they removed their internal 
> timer value in the "libfc: use rport timeout values for fcp recovery" patch.
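
For anyone following along, here is a rough user-space sketch of the
interaction Mike describes above: once the port has been logged out and
the rport is blocked, the timed-out callback asks for the timer to be
reset, so the scsi eh does not run. This is only a model, not kernel
code. Every name in it (model_rport, model_abts_timed_out,
model_cmd_timed_out, the enum values) is a hypothetical stand-in, and
the assumption that logging out the port leaves the rport blocked is
simplified from the fc_remote_port_delete path mentioned in the thread.

```c
/* Hypothetical stand-ins for the block layer / fc class values.
 * These are NOT the kernel's definitions. */
enum model_eh_return {
    MODEL_EH_NOT_HANDLED,   /* let the scsi eh run (abort, LUN reset, ...) */
    MODEL_EH_RESET_TIMER    /* re-arm the command timer, scsi eh stays idle */
};

enum model_port_state {
    MODEL_PORT_ONLINE,
    MODEL_PORT_BLOCKED      /* assumed state after the port is logged out */
};

struct model_rport {
    enum model_port_state state;
};

/* Models the driver reacting to an ABTS timeout by logging out the
 * port, which (by assumption) leaves the rport blocked. */
static void model_abts_timed_out(struct model_rport *rport)
{
    rport->state = MODEL_PORT_BLOCKED;
}

/* Models the fc class timed-out callback: while the rport is blocked
 * the command timer is reset instead of escalating to the scsi eh. */
static enum model_eh_return model_cmd_timed_out(const struct model_rport *rport)
{
    if (rport->state == MODEL_PORT_BLOCKED)
        return MODEL_EH_RESET_TIMER;

    return MODEL_EH_NOT_HANDLED;
}
```

With an online rport model_cmd_timed_out() returns MODEL_EH_NOT_HANDLED,
so the generic scsi eh path would run. After model_abts_timed_out() it
returns MODEL_EH_RESET_TIMER, which is the transport-level behaviour
under discussion.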

James, Robert, Andrew,

Can you please shed some light on this?  

Thanks,
Bhanu
