On Fri, 2013-07-12 at 13:54 +0800, Ren Mingxin wrote:
> Hi, Ewan:
>
> I'm wondering how do you test, with a special hardware or self-made
> module? Would you mind pasting your test method() and result?

Hi Rex-

This was tested in a SAN environment with an EMC Symmetrix and Brocade
FC switches. The error was injected with the following commands:

    portcfg rscnsupr <port> --enable
    portdisable <port>

where <port> is the FC port of the Symmetrix target. Multipath is used,
and the test records how long I/O from userspace takes to complete after
error handling stops and the I/O is retried on another path.

What happens is that the target never responds to anything the HBA
sends, so commands and TMFs just time out. The HBA doesn't see a link
down (since it is the target's port that was disabled, not the HBA's)
and doesn't get an RSCN. When the HBA is finally reset, however, it
can't log in to the target port, so further I/O gets an immediate error.

Unfortunately, not all SAN environments exhibit the failing behavior --
in some cases the HBA appears to detect the problem regardless of the
switch portcfg setting. But this has been verified to solve the problem
of seemingly endless EH activity in testing at a large customer site.

Also, to be clear, we tested with the "Limit overall SCSI EH runtime"
patchset but not the "New EH command timeout handler". I think the
changes to issue the abort in the timeout handler are a good idea,
though, because as far as I can see there really is no need to wait for
all activity on the host to cease before issuing the abort.

-Ewan

>
> Thanks,
> Ren
>
> > Acked-by: Ewan D. Milne <emilne@xxxxxxxxxx>

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html