On Wed, 2011-02-02 at 20:47 -0800, Mike Christie wrote:
> On 02/02/2011 10:05 PM, Mike Christie wrote:
> > On 02/02/2011 09:42 PM, Bhanu Gollapudi wrote:
> >>>
> >>> Actually you do not have to wait for the scsi eh to run, right? It
> >>> looks like bnx2fc would log out the port, which ends up calling
> >>> fc_remote_port_delete and that would cause the fc timed out function
> >>> to return BLK_EH_RESET_TIMER to prevent the scsi eh from running. Is
> >>> that right? That type of eh strategy behavior seems like something
> >>> you want to sync up with libfc or the fc class so all drivers do
> >>> something similar.
> >>
> >> As per FCP-4, if the ABTS times out, we will have to explicitly LOGO the
> >
> > What section is that in?
> >
> > Ok read it (12.5.1, right).
> >
> >> target and relogin back. If we rely on 60 sec eh_abort_handler, and if
> >> ABTS times out, SCSI error handling will go to LUN RESET, TGT reset
> >> path, which is generic error handling rather than transport specific
> >> error handling.
> >
> > If that is right, then it seems the other FC drivers are doing it wrong,
> > and you hit that problem if someone sets the scsi cmd timer lower
> > than BNX2FC_IO_TIMEOUT. If that is right, it just does not seem right
> > to hack around the issue in the driver too.
>
> So if your reading of 12.5.1 is right then libfc is wrong and it seems
> other drivers (if they are not doing some magic in firmware) are wrong too.
>
> My confidence in my FCP skills is very shaken right now :) I am not
> sure what I was thinking when I read it and reviewed libfc. I think
> you need to discuss this with the fcoe list people and James Smart and
> Andrew Vasquez.
>
> I think some of them disagree with the other aborting commands (or maybe
> just disagree about some of the details), so that should be discussed too.
>
> But if you are right then you cannot work around this in a driver
> specific way. You need to change libfc and the fc class in a way that
> the error strategy is correct. For example from fc_timed_out you could
> kick off the abort. I was slightly off on the other comment about libfc
> not doing an abort from their internal timeout handler. They do an abort
> still, but if that fails they let the scsi eh run eventually. I thought
> they were going to clean that up too when they removed their internal
> timer value in the "libfc: use rport timeout values for fcp recovery" patch.

James, Robert, Andrew,

Can you please shed some light on this?

Thanks,
Bhanu
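
For reference, the two recovery strategies being compared above can be sketched as a small user-space model in C. This is only an illustration under stated assumptions: every type, enum, and function name below (model_rport, model_timed_out, model_abts_done, and so on) is a hypothetical stand-in, not kernel code and not what bnx2fc, libfc, or the fc class actually implement. It only encodes the behavior as described in this thread: a blocked rport makes the timed-out handler restart the timer, and an ABTS timeout either triggers LOGO plus relogin (transport-specific) or falls through to the generic SCSI eh escalation.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for illustration; not the kernel's real types. */
enum eh_verdict { EH_RESET_TIMER, EH_RUN_SCSI_EH };
enum rport_state { RPORT_ONLINE, RPORT_BLOCKED };
enum recovery_step {
	STEP_NONE,                /* ABTS completed, nothing more to do */
	STEP_LOGO_AND_RELOGIN,    /* transport-specific recovery */
	STEP_SCSI_EH_ESCALATION   /* generic path: LUN reset, target reset */
};

struct model_rport {
	enum rport_state state;
};

/*
 * Models the check described in the thread: while the remote port is
 * blocked, restart the command timer so the SCSI eh does not run.
 */
enum eh_verdict model_timed_out(const struct model_rport *rport)
{
	if (rport->state == RPORT_BLOCKED)
		return EH_RESET_TIMER;
	return EH_RUN_SCSI_EH;
}

/*
 * Models what happens once the ABTS outcome is known.
 * transport_recovery == true:  log out and relogin on ABTS timeout;
 *                              the logout blocks the rport in this model.
 * transport_recovery == false: leave recovery to the generic SCSI eh.
 */
enum recovery_step model_abts_done(struct model_rport *rport,
				   bool abts_timed_out,
				   bool transport_recovery)
{
	if (!abts_timed_out)
		return STEP_NONE;

	if (transport_recovery) {
		rport->state = RPORT_BLOCKED;
		return STEP_LOGO_AND_RELOGIN;
	}

	return STEP_SCSI_EH_ESCALATION;
}
```

In this model, once the transport-specific path has blocked the rport, a later call to model_timed_out() returns EH_RESET_TIMER, which corresponds to the point that the scsi eh would not need to run in that case.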