On 07/28/2016 06:33 AM, Mike Snitzer wrote:
> On Wed, Jul 27 2016 at 7:05pm -0400,
> Bart Van Assche <bart.vanassche@xxxxxxxxxxx> wrote:
>
>> Thanks again for having made this patch available. I will test it as
>> soon as I have the time. BTW, in the meantime I ran a few tests with
>> DM_MQ_DEFAULT=n since until now I ran all tests with
>> DM_MQ_DEFAULT=y. The result of these tests is as follows:
>> * v4.6.0, v4.6.5 and v4.7.0 with DM_MQ_DEFAULT=y: first simulated
>>   path removal triggers I/O errors.
>> * v4.6.4, v4.6.5 and v4.7.0 with DM_MQ_DEFAULT=n: test passes more
>>   than 100 iterations.
> I think this may point to an SRP issue then. Is the synthetic "cable
> pull" (by writing to /sys/class/srp_remote_ports/port-*/delete)
> representative of what actually happens if a cable is physically pulled?
> Or is your synthetic method hitting the device way harder than would
> happen with an actual production fault?
>
> Again, there hasn't been any report of failures (EIO or otherwise) with
> extensive scsi-mq and dm-mq testing on a larger FC testbed.
Hello Mike,
Sorry but I disagree that the ib_srp driver would be causing the EIO
errors because:
* All tests, including the tests that pass, were run with
CONFIG_SCSI_MQ_DEFAULT=y in the kernel config. The same code paths
were triggered in the ib_srp driver by all the tests
(CONFIG_DM_MQ_DEFAULT=y and CONFIG_DM_MQ_DEFAULT=n).
* In my previous e-mails I have shown that the EIO error code is
generated by the dm-mpath driver after all (SRP) paths have gone. So
how could the ib_srp driver be involved?
There is an important difference between the SCSI FC drivers and ib_srp:
after dev_loss_tmo expires FC drivers call scsi_remove_target() while
the SRP transport layer triggers a call of scsi_remove_host().
Both writing into /sys/class/srp_remote_ports/*/delete and pulling a
cable make the ib_srp driver call scsi_remove_host(). The only
difference is the timing. With the former method it is more likely that
the time between submitting I/O and calling scsi_remove_host() is small.
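For reference, the synthetic path removal described above can be sketched
roughly as follows. This is only an illustration in Python, not the code from
my test scripts: the function name simulate_path_removal and its sysfs_root
parameter are made up for this example, and writing "1" is an assumption (as
far as I know the write itself is what triggers the removal, not the value).

```python
import glob
import os


def simulate_path_removal(sysfs_root="/sys/class/srp_remote_ports"):
    """Write to every port-*/delete attribute below sysfs_root.

    Returns the sorted list of attribute paths that were written.
    Hypothetical helper; must run as root against a real sysfs tree.
    """
    written = []
    pattern = os.path.join(sysfs_root, "port-*", "delete")
    for attr in sorted(glob.glob(pattern)):
        try:
            with open(attr, "w") as f:
                # Assumption: the value is not interpreted by the kernel.
                f.write("1\n")
        except OSError:
            # The remote port may already have disappeared; skip it.
            continue
        written.append(attr)
    return written
```

Calling simulate_path_removal() shortly after submitting I/O gives the small
window between I/O submission and scsi_remove_host() mentioned above.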
>> I have not yet run any tests with kernel v4.5.x because in the test
>> I ran the ib_srp and ib_srpt drivers are loaded on the same system
>> and because I need five v4.7 LIO patches to make this test pass but
>> unfortunately these patches do not apply cleanly on the v4.5.x code
>> base.
>>
>> Please let me know if you need more information.
> Can the target core be made to use SRP in loopback (local test machine)
> mode? The mptest harness currently defaults to using tcmloop. Would be
> great if I could somehow exercise the SRP code without needing a
> full-blown IB setup.
>
> But if there isn't a way to achieve that test coverage I can
> probably/hopefully get access to a subset of a larger IB/SRP testbed.
All InfiniBand HCAs that I have encountered so far support loopback as
long as at least one HCA port is up (either connected to a switch or
connected to another HCA port and opensm is running against one of these
two ports).
The scripts I used to test the ib_srp driver are available at
https://github.com/bvanassche/srp-test.
Thanks,
Bart.