On Thu, Aug 04 2016 at 6:09am -0400,
Hannes Reinecke <hare@xxxxxxx> wrote:

> On 08/04/2016 11:53 AM, Hannes Reinecke wrote:
> > On 08/03/2016 06:55 PM, Bart Van Assche wrote:
> >> On 08/02/2016 05:40 PM, Mike Snitzer wrote:
> >>> But I asked you to run the v4.7 kernel patches I
> >>> pointed to _without_ any of your debug patches.
> >>
> >> I need several patches to fix bugs that are not related to the device
> >> mapper, e.g. "sched: Avoid that __wait_on_bit_lock() hangs"
> >> (https://lkml.org/lkml/2016/8/3/289).
> >>
> > Hmm. Can you test with this patch?
> >
> > diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
> > index 7790a70..9daed03 100644
> > --- a/drivers/md/dm-mpath.c
> > +++ b/drivers/md/dm-mpath.c
> > @@ -439,8 +439,7 @@ static int must_push_back(struct multipath *m)
> >  {
> >  	return (test_bit(MPATHF_QUEUE_IF_NO_PATH, &m->flags) ||
> >  		((test_bit(MPATHF_QUEUE_IF_NO_PATH, &m->flags) !=
> > -		  test_bit(MPATHF_SAVED_QUEUE_IF_NO_PATH, &m->flags)) &&
> > -		 dm_noflush_suspending(m->ti)));
> > +		  test_bit(MPATHF_SAVED_QUEUE_IF_NO_PATH, &m->flags)));
> >  }
> >
> >  /*
> >
> > Reasoning:
> > The original check for dm_noflush_suspending() was for bio-based
> > drivers, which needed to queue I/O within the device-mapper core.
> > So during suspend this I/O would keep a reference to the device-mapper
> > core and the table couldn't be swapped.
> > For request-based multipathing, however, the I/O is _never_ held within
> > the device-mapper core but rather pushed back to the request queue.
> > IE even for pushback the I/O will never hold a reference to the
> > device-mapper core, and the tables can be swapped irrespective of the
> > 'dm_noflush_suspend()' setting.
> >
> > Or that's the idea, at least :-)
> >
> > Yes Mike, I know, it's not going to work with bio-based multipathing.
> > But this is just for figuring out where the real issue is.
> >
> And indeed.
> multipathd is calling DM_SUSPEND _without_ the noflush_suspending flag.
> (On the grounds that originally it needed to flush all I/O from the
> device-mapper core).
> Which will be causing I/O errors if any I/O is executed after
> ->presuspend has been called.

The only time multipathd doesn't use noflush is on resize. Otherwise
I'm pretty sure it _does_ use noflush all the time.

But the point is that the map method shouldn't be called while the
multipath device is suspended. I already provided fixes for this,
staged here:
https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/log/?h=dm-4.8

and relative to 4.7:
https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/log/?h=dm-4.7-mpath-fixes

With these patches, our testing on a real SRP hardware testbed (fast DDN
backend) doesn't see any I/O errors.

But I'll revisit must_push_back relative to dm_noflush_suspending();
specifically, the new must_push_back_rq() could be made to not check
dm_noflush_suspending().

--
dm-devel mailing list
dm-devel@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/dm-devel