On Thu, Jul 28 2016 at 11:23am -0400,
Bart Van Assche <bart.vanassche@xxxxxxxxxxx> wrote:

> On 07/28/2016 06:33 AM, Mike Snitzer wrote:
> >On Wed, Jul 27 2016 at 7:05pm -0400,
> >Bart Van Assche <bart.vanassche@xxxxxxxxxxx> wrote:
> >>Thanks again for having made this patch available. I will test it as
> >>soon as I have the time. BTW, in the meantime I ran a few tests with
> >>DM_MQ_DEFAULT=n since until now I ran all tests with
> >>DM_MQ_DEFAULT=y. The result of these tests is as follows:
> >>* v4.6.0, v4.6.5 and v4.7.0 with DM_MQ_DEFAULT=y: first simulated
> >>path removal triggers I/O errors.
> >>* v4.6.4, v4.6.5 and v4.7.0 with DM_MQ_DEFAULT=n: test passes more
> >>than 100 iterations.
> >
> >I think this may point to an SRP issue then. Is the synthetic "cable
> >pull" (by writing to /sys/class/srp_remote_ports/port-*/delete)
> >representative of what actually happens if a cable is physically pulled?
> >
> >Or is your synthetic method hitting the device way harder than would
> >happen with an actual production fault?
> >
> >Again, there hasn't been any report of failures (EIO or otherwise) with
> >extensive scsi-mq and dm-mq testing on a larger FC testbed.
>
> Hello Mike,
>
> Sorry but I disagree that the ib_srp driver would be causing the EIO
> errors because:
> * All tests, including the tests that pass, were run with
>   CONFIG_SCSI_MQ_DEFAULT=y in the kernel config. The same code paths
>   were triggered in the ib_srp driver by all the tests
>   (CONFIG_DM_MQ_DEFAULT=y and CONFIG_DM_MQ_DEFAULT=n).
> * In my previous e-mails I have shown that the EIO error code is
>   generated by the dm-mpath driver after all (SRP) paths have gone. So
>   how could the ib_srp driver be involved?
>
> There is an important difference between the SCSI FC drivers and
> ib_srp: after dev_loss_tmo expires FC drivers call
> scsi_remove_target() while the SRP transport layer triggers a call
> of scsi_remove_host().
>
> Both writing into /sys/class/srp_remote_ports/*/delete and pulling a
> cable make the ib_srp driver call scsi_remove_host(). The only
> difference is the timing. With the former method it is more likely
> that the time between submitting I/O and calling scsi_remove_host()
> is small.

The reality is that I just need a testbed to reproduce this. This back
and forth isn't really helping us converge on _why_ must_push_back() is
returning false for your case. I need to know exactly what is causing
that method to return false.

As is, it is hard to see why using the blk-mq vs. the .request_fn
interface for the DM mpath device would cause must_push_back() to
return false rather than true.