Re: [PATCHv3 0/9] New EH command timeout handler

Ren Mingxin <renmx@xxxxxxxxxxxxxx> · Fri, 12 Jul 2013 18:00:57 +0800

Hi, Hannes:

On 07/12/2013 02:09 PM, Hannes Reinecke wrote:
On 07/12/2013 06:14 AM, Ren Mingxin wrote:
On 07/01/2013 10:24 PM, Hannes Reinecke wrote:
With the original SCSI EH I got:
# time dd if=/dev/zero of=/dev/dm-2 bs=4k count=4k oflag=direct
4096+0 records in
4096+0 records out
16777216 bytes (17 MB) copied, 142.652 s, 118 kB/s

real    2m22.657s
user    0m0.013s
sys    0m0.145s

With this patchset I got:
# time dd if=/dev/zero of=/dev/dm-2 bs=4k count=4k oflag=direct
4096+0 records in
4096+0 records out
16777216 bytes (17 MB) copied, 52.1579 s, 322 kB/s

real    0m52.163s
user    0m0.012s
sys    0m0.145s

Test was to disable RSCN on the target port, disable the
target port, and then start the 'dd' command as indicated.
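
As a quick sanity check on the two dd runs above, the reported rates and the relative improvement can be recomputed from the summary lines. This is only a sketch: the helper names (parse_dd_summary, throughput_kb_per_s) are made up for illustration, and it assumes the GNU dd convention that 1 kB is 1000 bytes.

```python
import re

# Summary lines copied verbatim from the two dd runs quoted above.
RUNS = {
    "original SCSI EH": "16777216 bytes (17 MB) copied, 142.652 s, 118 kB/s",
    "with patchset": "16777216 bytes (17 MB) copied, 52.1579 s, 322 kB/s",
}


def parse_dd_summary(line):
    """Return (bytes_copied, seconds) from a dd summary line (hypothetical helper)."""
    m = re.match(r"(\d+) bytes .* copied, ([\d.]+) s,", line)
    if m is None:
        raise ValueError("not a dd summary line: %r" % line)
    return int(m.group(1)), float(m.group(2))


def throughput_kb_per_s(line):
    """Throughput in kB/s, assuming dd's convention of 1 kB = 1000 bytes."""
    nbytes, seconds = parse_dd_summary(line)
    return nbytes / seconds / 1000.0


def speedup(slow_line, fast_line):
    """Ratio of elapsed times; both runs copy the same number of bytes."""
    _, slow = parse_dd_summary(slow_line)
    _, fast = parse_dd_summary(fast_line)
    return slow / fast


if __name__ == "__main__":
    for name, line in RUNS.items():
        print("%-18s %.1f kB/s" % (name, throughput_kb_per_s(line)))
    ratio = speedup(RUNS["original SCSI EH"], RUNS["with patchset"])
    print("elapsed-time ratio: %.2fx" % ratio)
```

Both recomputed rates round to the figures dd printed, and the elapsed time drops by a factor of roughly 2.7 with the patchset.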

Do you mean disabling RSCN/port is enough? I'm afraid I couldn't
reproduce the problem with your steps. With and without your
patchset the 'dd' result is the same: 27s. Please let me know what
I missed or got wrong:

1) I made a dm-multipath target 'dm-0' whose grouping policy was
    failover;
2) Disable RSCN/port via Brocade FC switch:
    SW300:root>  portcfg rscnsupr 15 --enable; portDisable 15
3) Start the 'dd' command:
    # time dd if=/dev/zero of=/dev/dm-0 bs=4k count=4k oflag=direct
    dd: writing `/dev/sde': Input/output error
    1+0 records in
    0+0 records out
    0 bytes (0 B) copied, 27.8588 s, 0.0 kB/s

    real    0m27.860s
    user    0m0.001s
    sys     0m0.000s

You are aware that you have to disable RSCNs on the _target_ port,
right?
Disabling RSCNs on the _initiator_ ports is a well-tested case, and
the one which actually makes sense (and is even implemented in
QLogic switches).
Disabling RSCNs for the _target_ port, OTOH, has a very questionable
nature (hence QLogic switches don't even allow you to do this).

You're right. By disabling RSCNs on the target port, I've reproduced this
problem. Thank you so much. But I've run into the bug I mentioned
before. I'll test again with your new patchset once you send it.

Thanks,
Ren

[ .. ]

Another question:

I also tried to produce timeouts by modifying Yasui's module (please
see APPENDIX A):
http://www.spinics.net/lists/linux-scsi/msg35091.html

But I hit a bug with this patchset by following these steps (there was
no such bug without your patchset):

# grep lpfc_template /proc/kallsyms
ffffffffa00f9240 d lpfc_template    [lpfc]
# multipath -ll
...
mpathb (36000b5d0006a0000006a14e7000c0000) dm-1 FUJITSU,ETERNUS_DX400
size=50G features='1 queue_if_no_path' hwhandler='0' wp=rw
|-+- policy='round-robin 0' prio=130 status=active
| `- 2:0:0:1 sdf 8:80  active ready running
`-+- policy='round-robin 0' prio=130 status=enabled
   `- 3:0:0:1 sdh 8:112 active ready running
# insmod scsi_tmo_mod.ko param=0xffffffffa00f9240,2:0:0:1; time dd if=/dev/zero of=/dev/dm-1 bs=4k count=4k oflag=direct
4096+0 records in
4096+0 records out
16777216 bytes (17 MB) copied, 151.194 s, 111 kB/s

real    2m31.195s
user    0m0.004s
sys    0m0.111s

Please see the logs in APPENDIX B. Do you think this bug is unrelated to
your patchset?

Hmm. No, sadly not.

'cancel_work_sync' cannot be called from an interrupt context;
guess I'll need to convert it to delayed work.

Thanks for testing; will be updating the patchset.
