On Wed, Jul 27, 2016 at 10:08:28AM -0400, Mike Snitzer wrote: > On Tue, Jul 26 2016 at 6:51pm -0400, > Bart Van Assche <bart.vanassche@xxxxxxxxxxx> wrote: > > > On 07/25/2016 06:15 PM, Mike Snitzer wrote: > > > Please try this patch to see if it fixes your issue, thanks. > > > > > > diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c > > > index 52baf8a..287caa7 100644 > > > --- a/drivers/md/dm-mpath.c > > > +++ b/drivers/md/dm-mpath.c > > > @@ -433,10 +433,17 @@ failed: > > > */ > > > static int must_push_back(struct multipath *m) > > > { > > > - return (test_bit(MPATHF_QUEUE_IF_NO_PATH, &m->flags) || > > > - ((test_bit(MPATHF_QUEUE_IF_NO_PATH, &m->flags) != > > > - test_bit(MPATHF_SAVED_QUEUE_IF_NO_PATH, &m->flags)) && > > > - dm_noflush_suspending(m->ti))); > > > + bool r; > > > + unsigned long flags; > > > + > > > + spin_lock_irqsave(&m->lock, flags); > > > + r = (test_bit(MPATHF_QUEUE_IF_NO_PATH, &m->flags) || > > > + ((test_bit(MPATHF_QUEUE_IF_NO_PATH, &m->flags) != > > > + test_bit(MPATHF_SAVED_QUEUE_IF_NO_PATH, &m->flags)) && > > > + dm_noflush_suspending(m->ti))); > > > + spin_unlock_irqrestore(&m->lock, flags); > > > + > > > + return r; > > > } > > > > > > /* > > > > Hello Mike, > > > > Thank you for having made this patch available. Unfortunately even > > with this patch applied I still see fio reporting I/O errors and the > > following text still appears in the system log immediately before the > > I/O errors are reported (due to debug statements I added in the device > > mapper; mpath 254:0 and mpathbe refer to the same dm device): > > > > Jul 26 15:40:37 ion-dev-ib-ini kernel: mpath 254:0: queue_if_no_path 0 -> 1 > > Jul 26 15:40:37 ion-dev-ib-ini kernel: executing DM ioctl DEV_SUSPEND on mpathbe > > Jul 26 15:40:37 ion-dev-ib-ini kernel: mpath 254:0: queue_if_no_path 1 -> 0 > > This is all as expected. Only question I have: is > dm_noflush_suspending() false? -- I assume so given must_push_back() is > returning false. > > I'm struggling to appreciate why must_push_back() is only true if > noflush is used. Regardless of which type, if there are no paths and > queue_if_no_path was configured (implied by current != saved) then we > shouldn't be returning -EIO back up the stack. > > > BTW, I have not yet been able to determine which user space code > > triggers the DEV_SUSPEND ioctl. A condlog() call I had added just > > above dm_simplecmd_flush(DM_DEVICE_SUSPEND, mapname, 0) call in > > multipathd did not produce any output. if you look in drivers/md/dm-ioctl.c at do_resume(), device mapper internally does a suspend when you call resume with a new table loaded. That's when these suspends are happening. In the userspace tools, this happens in the DM_DEVICE_RESUME calls after dm_addmap_reload(), which loads the new table. These all happen in the domap() function. > I need to dig into the multipath-tools userspace code more to be able to > answer. But I've cc'd Ben Marzinski explicitly to get his insight. > > Curious if multipath-tools _always_ use the noflush variant of suspend? > If not then we're setting ourselves up to return -EIO when we shouldn't. There is only one time when we don't use noflush. That's when we resize the table, and that's because we could otherwise have IOs that are past the end of the device. It's been a known issue for a while now that you cannot resize a multipath device with no working paths. Or (to say it in a way that doesn't assume that people are stupid) if you lose all of your paths while resizing a multipath device, IOs will fail. I've never heard of anyone pushing back on that limitation. -Ben > Mike > > -- > dm-devel mailing list > dm-devel@xxxxxxxxxx > https://www.redhat.com/mailman/listinfo/dm-devel -- dm-devel mailing list dm-devel@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/dm-devel