Re: dm-mq and end_clone_request()

On Thu, Jul 28 2016 at 11:23am -0400,
Bart Van Assche <bart.vanassche@xxxxxxxxxxx> wrote:

> On 07/28/2016 06:33 AM, Mike Snitzer wrote:
> >On Wed, Jul 27 2016 at  7:05pm -0400,
> >Bart Van Assche <bart.vanassche@xxxxxxxxxxx> wrote:
> >>Thanks again for having made this patch available. I will test it as
> >>soon as I have the time. BTW, in the meantime I ran a few tests with
> >>DM_MQ_DEFAULT=n since until now I ran all tests with
> >>DM_MQ_DEFAULT=y. The result of these tests is as follows:
> >>* v4.6.0, v4.6.5 and v4.7.0 with DM_MQ_DEFAULT=y: first simulated
> >>path removal triggers I/O errors.
> >>* v4.6.4, v4.6.5 and v4.7.0 with DM_MQ_DEFAULT=n: test passes more
> >>than 100 iterations.
> >
> >I think this may point to an SRP issue then.  Is the synthetic "cable
> >pull" (by writing to /sys/class/srp_remote_ports/port-*/delete)
> >representative of what actually happens if a cable is physically pulled?
> >
> >Or is your synthetic method hitting the device way harder than would
> >happen with an actual production fault?
> >
> >Again, there hasn't been any report of failures (EIO or otherwise) with
> >extensive scsi-mq and dm-mq testing on a larger FC testbed.
> 
> Hello Mike,
> 
> Sorry but I disagree that the ib_srp driver would be causing the EIO
> errors because:
> * All tests, including the tests that pass, were run with
>   CONFIG_SCSI_MQ_DEFAULT=y in the kernel config. The same code paths
>   were triggered in the ib_srp driver by all the tests
>   (CONFIG_DM_MQ_DEFAULT=y and CONFIG_DM_MQ_DEFAULT=n).
> * In my previous e-mails I have shown that the EIO error code is
>   generated by the dm-mpath driver after all (SRP) paths have gone. So
>   how could the ib_srp driver be involved?
> 
> There is an important difference between the SCSI FC drivers and
> ib_srp: after dev_loss_tmo expires FC drivers call
> scsi_remove_target() while the SRP transport layer triggers a call
> of scsi_remove_host().
> 
> Both writing into /sys/class/srp_remote_ports/*/delete and pulling a
> cable make the ib_srp driver call scsi_remove_host(). The only
> difference is the timing. With the former method it is more likely
> that the time between submitting I/O and calling scsi_remove_host()
> is small.

The reality is that I need a testbed to reproduce this.  The back and
forth isn't helping us converge on _why_ must_push_back() returns false
in your case; I need to know exactly what is causing that.

As it stands, it is hard to see why using blk-mq rather than the
.request_fn interface for the DM mpath device would make
must_push_back() return false instead of true.
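
For readers following along, the decision being debated can be sketched
as a small user-space model.  This is only an illustration under stated
assumptions: the struct, its field names, and the model_* functions are
hypothetical, loosely based on a reading of how dm-mpath decides between
requeueing and failing a request once no path is left.  It is not the
kernel source and it does not explain the behavior reported above.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the multipath target state. */
struct mpath_model {
	bool queue_if_no_path;        /* current queue_if_no_path setting */
	bool saved_queue_if_no_path;  /* setting saved across a suspend */
	bool noflush_suspending;      /* a noflush suspend is in progress */
	unsigned int nr_valid_paths;  /* usable paths remaining */
};

/* Assumed outcomes of mapping one request. */
enum map_result { MAP_REMAPPED, MAP_REQUEUE, MAP_FAILED_EIO };

/*
 * Model of the push-back test: requeue if queueing is enabled, or if
 * queueing was changed during a noflush suspend.
 */
static bool model_must_push_back(const struct mpath_model *m)
{
	return m->queue_if_no_path ||
	       (m->queue_if_no_path != m->saved_queue_if_no_path &&
		m->noflush_suspending);
}

/*
 * Model of the map step: with no usable path, the request is either
 * pushed back for a later retry or failed with EIO.
 */
static enum map_result model_map_request(const struct mpath_model *m)
{
	if (m->nr_valid_paths > 0)
		return MAP_REMAPPED;
	return model_must_push_back(m) ? MAP_REQUEUE : MAP_FAILED_EIO;
}
```

In this model, EIO is only reachable when all paths are gone and the
push-back test evaluates to false, which is why the state feeding that
test at the moment of the failure is the piece of information needed.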

--
dm-devel mailing list
dm-devel@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/dm-devel


