Hi, Ewan:
On 07/12/2013 09:30 PM, Ewan Milne wrote:
On Fri, 2013-07-12 at 13:54 +0800, Ren Mingxin wrote:
I'm wondering how you tested: with special hardware or a self-made
module? Would you mind pasting your test method and results?
This was tested in a SAN environment with an EMC Symmetrix and
Brocade FC switches. The error was injected by the following
commands:
portcfg rscnsupr <port> --enable
portdisable <port>
Where <port> is the FC port of the Symmetrix target.
Multipath is used and the test records how long I/O from userspace
takes to complete after the error handling stops and the I/O is
retried on another path.
What happens is that the target never responds to anything the
HBA sends, so commands and TMFs just time out. The HBA doesn't
see link down (since it is the target port) and doesn't get an
RSCN. When the HBA is finally reset, however, it can't log in
to the target port and so further I/O gets an immediate error.
Unfortunately, not all SAN environments will exhibit the failing
behavior -- it appears as if in some cases the HBA detects the
problem regardless of the switch portcfg setting. But this has
been verified to solve the problem of seemingly endless EH
activity in testing at a large customer site.
Thanks for the detailed explanation. I've been able to
reproduce only with this patchset.
Also, to be clear, we tested with the "Limit overall SCSI EH
runtime" patchset but not the "New EH command timeout handler".
I think the changes to issue the abort in the timeout handler
are a good idea, though, because there really is no need to
wait for all activity on the host to cease before issuing the
abort as far as I can see.
Hmm, agree with you. It is much better to issue aborts without
waiting, which can shorten the timeout handling time.
Acked-by: Ewan D. Milne <emilne@xxxxxxxxxx>
Hi, Hannes:
I noticed that the dd time dropped from 6m+ to 2m+ when
'eh_deadline' was set to 30s, but it was 6m+ (nearly the same as
with the default, 'eh_deadline' = 0) when 'eh_deadline' was set to
10s. I haven't been able to dig further, but I guess there is some
restriction on the values this 'eh_deadline' interface accepts.
Maybe it should not be less than some timeout, otherwise the
'eh_deadline' setting will not take effect?
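
For anyone who wants to repeat the comparison, a measurement of this kind could be scripted roughly as below. This is only a sketch, not the exact commands used: it assumes the patchset exposes the deadline (in seconds) as /sys/class/scsi_host/<host>/eh_deadline, and the helper names set_eh_deadline and time_command are made up for illustration.

```python
import subprocess
import time
from pathlib import Path


def set_eh_deadline(host, seconds, sysfs_root="/sys/class/scsi_host"):
    """Write an EH deadline (seconds) to the host's sysfs attribute.

    The attribute path is an assumption about where the patchset
    exposes 'eh_deadline'; writing to it normally requires root.
    """
    attr = Path(sysfs_root) / host / "eh_deadline"
    attr.write_text("%d\n" % seconds)
    return attr


def time_command(argv):
    """Run a command and return the wall-clock seconds until it exits."""
    start = time.monotonic()
    # check=False: a failing dd is still a valid data point for timing.
    subprocess.run(argv, check=False)
    return time.monotonic() - start
```

For example, calling set_eh_deadline("host2", 30) and then time_command(["dd", "if=/dev/mapper/mpatha", "of=/dev/null", "bs=1M", "count=1024", "iflag=direct"]) after the port has been disabled would give one data point; the host name, device and dd arguments here are placeholders.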
Thanks,
Ren