Re: v3.15 dm-mpath regression: cable pull test causes I/O hang

On 07/03/2014 03:56 PM, Bart Van Assche wrote:
On 07/03/14 00:02, Mike Snitzer wrote:
On Fri, Jun 27 2014 at  9:33am -0400,
Mike Snitzer <snitzer@xxxxxxxxxx> wrote:

On Fri, Jun 27 2014 at  9:02am -0400,
Bart Van Assche <bvanassche@xxxxxxx> wrote:

Hello,

While running a cable pull simulation test with dm_multipath on top of
the SRP initiator driver I noticed that after a few iterations I/O locks
up instead of dm_multipath processing the path failure properly (see also
below for a call trace). At least kernel versions 3.15 and 3.16-rc2 are
vulnerable. This issue does not occur with kernel 3.14. I have tried to
bisect this but gave up when I noticed that I/O locked up completely with
a kernel built from git commit ID e809917735ebf1b9a56c24e877ce0d320baee2ec
(dm mpath: push back requests instead of queueing). But with the bisect I
have been able to narrow down this issue to one of the patches in "Merge
tag 'dm-3.15-changes' of
git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm".
Does anyone have a suggestion for how to analyze this further or how to
fix this?

I still don't have a _known_ fix for your issue but I reviewed commit
e809917735ebf1b9a56c24e877ce0d320baee2ec more closely and identified what
looks to be a regression in the logic for multipath_busy: it now calls
!pg_ready() instead of directly checking pg_init_in_progress.  I think
this is needed (Hannes, what do you think?):

diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
index 3f6fd9d..561ead6 100644
--- a/drivers/md/dm-mpath.c
+++ b/drivers/md/dm-mpath.c
@@ -373,7 +373,7 @@ static int __must_push_back(struct multipath *m)
  		 dm_noflush_suspending(m->ti)));
  }

-#define pg_ready(m) (!(m)->queue_io && !(m)->pg_init_required)
+#define pg_ready(m) (!(m)->queue_io && !(m)->pg_init_required && !(m)->pg_init_in_progress)

  /*
   * Map cloned requests
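
For readers following the thread, here is a minimal user-space sketch of
what the one-line change above does to the pg_ready() predicate. It is not
kernel code: struct mpath_flags is a hypothetical stand-in for struct
multipath, the field types are assumptions, and pg_ready_old(),
pg_ready_new() and pg_ready_differs() are names invented for this
illustration. Only the two macro bodies are taken from the diff.

```c
#include <assert.h>

/*
 * Hypothetical stand-in for the three fields of struct multipath that
 * pg_ready() reads. The types are assumed for illustration only.
 */
struct mpath_flags {
	unsigned queue_io;            /* nonzero: I/O must be queued */
	unsigned pg_init_required;    /* nonzero: a pg_init still has to be sent */
	unsigned pg_init_in_progress; /* nonzero: a pg_init is outstanding */
};

/* Definition removed by the diff ("-" line). */
#define pg_ready_old(m) (!(m)->queue_io && !(m)->pg_init_required)

/* Definition added by the diff ("+" line). */
#define pg_ready_new(m) \
	(!(m)->queue_io && !(m)->pg_init_required && !(m)->pg_init_in_progress)

/* Returns 1 when the two definitions disagree for the given state. */
static int pg_ready_differs(const struct mpath_flags *m)
{
	return pg_ready_old(m) != pg_ready_new(m);
}
```

Under these assumptions the two definitions differ in exactly one state:
queue_io and pg_init_required are both clear while pg_init_in_progress is
nonzero. There the old macro reports the path group as ready, so a caller
testing !pg_ready() would not report busy, whereas the proposed macro
reports not ready. As the reply that follows shows, this change alone did
not cure the hang.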

Hello Mike,

Sorry but even with this patch applied and additionally with commit IDs
86d56134f1b6 ("kobject: Make support for uevent_helper optional") and
bcccff93af35 ("kobject: don't block for each kobject_uevent") reverted
my multipath test still hangs after a few iterations. I also reran the
same test with kernel 3.14.3 and it is still running after 30 iterations.

Hmm. Would've been too easy.
Sigh.

Cheers,

Hannes

--
Dr. Hannes Reinecke		      zSeries & Storage
hare@xxxxxxx			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)

--
dm-devel mailing list
dm-devel@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/dm-devel