Re: mdadm --stop goes off and never comes back?

Neil Brown <neilb@xxxxxxx> · Sat, 22 Dec 2007 22:58:57 +1100

On Wednesday December 19, jnelson-linux-raid@xxxxxxxxxxx wrote:
> On 12/19/07, Jon Nelson <jnelson-linux-raid@xxxxxxxxxxx> wrote:
> > On 12/19/07, Neil Brown <neilb@xxxxxxx> wrote:
> > > On Tuesday December 18, jnelson-linux-raid@xxxxxxxxxxx wrote:
> > > >
> > > > I tried to stop the array:
> > > >
> > > > mdadm --stop /dev/md2
> > > >
> > > > and mdadm never came back. It's off in the kernel somewhere. :-(

Looking at your stack traces, "mdadm -S" is holding an md lock and
trying to get a sysfs lock as part of tearing down the array, while
'hald' is trying to read some attribute in
   /sys/block/md....
and is holding the sysfs lock and trying to get the md lock.
A classic AB-BA deadlock.
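For anyone unfamiliar with the pattern, here is a minimal Python model of it. It assumes two ordinary mutexes can stand in for the md lock and the sysfs lock; `simulate`, `md_lock` and `sysfs_lock` are hypothetical names, and this is not kernel code. The timeouts exist only so the demo terminates. The real waits have none, which is why mdadm never came back. Setting `hald_backs_off` models killing the interruptible hald: once it drops the sysfs lock, the teardown can finish.

```python
import threading


def simulate(hald_backs_off: bool) -> list[str]:
    """Toy model of the AB-BA deadlock; returns which parties finished.

    Plain mutexes stand in for the md lock and the sysfs lock. This only
    illustrates the lock ordering; it is not the kernel's actual code.
    """
    md_lock = threading.Lock()
    sysfs_lock = threading.Lock()
    holding_first = threading.Barrier(2)  # both now hold their first lock
    both_stuck = threading.Barrier(2)     # both failed to get their second lock
    finished: list[str] = []

    def mdadm_stop() -> None:
        # Order A -> B: md lock first, then the sysfs lock for teardown.
        with md_lock:
            holding_first.wait()
            # The kernel wait has no timeout; one is used so the demo ends.
            if sysfs_lock.acquire(timeout=1.0):
                finished.append("mdadm")
                sysfs_lock.release()
            else:
                both_stuck.wait()  # keep md_lock until hald has also failed

    def hald_read() -> None:
        # Order B -> A: sysfs lock first, then the md lock to read the attribute.
        with sysfs_lock:
            holding_first.wait()
            # Giving up quickly stands in for killing the interruptible hald.
            wait = 0.1 if hald_backs_off else 1.0
            if md_lock.acquire(timeout=wait):
                finished.append("hald")
                md_lock.release()
            elif not hald_backs_off:
                both_stuck.wait()  # keep sysfs_lock until mdadm has also failed

    threads = [threading.Thread(target=f) for f in (mdadm_stop, hald_read)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return finished


print(simulate(hald_backs_off=False))  # prints [] (each waits on the other's lock)
print(simulate(hald_backs_off=True))   # prints ['mdadm'] (teardown completes)
```

The usual cure is to make every path take the two locks in the same order, which is roughly what the mainline sysfs changes achieve by not holding the sysfs lock across the attribute read.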

> 
> NOTE: kernel is stock openSUSE 10.3 kernel, x86_64, 2.6.22.13-0.3-default.
> 

This is fixed in mainline by some substantial changes to sysfs.
I don't imagine they are likely to be backported to openSUSE, but
you could try filing a Bugzilla report if you like.

The 'hald' process is waiting interruptibly, so killing it would
release the deadlock.
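Before killing anything, one hedged way to check which party can be woken is to read each process's state letter from /proc/<pid>/stat. Per proc(5), S means interruptible sleep and D means uninterruptible sleep. The helpers `parse_stat` and `find_procs` are hypothetical, written only for this note, and assume a Linux /proc.

```python
import os


def parse_stat(line: str) -> tuple[int, str, str]:
    """Return (pid, comm, state) from one /proc/<pid>/stat line.

    comm sits in parentheses and may contain spaces or ')', so split on the
    first " (" and the last ")" rather than on whitespace.
    """
    pid_text, rest = line.split(" (", 1)
    comm, after = rest.rsplit(")", 1)
    return int(pid_text), comm, after.split()[0]


def find_procs(names: set[str]) -> list[tuple[int, str, str]]:
    """Scan /proc (Linux only) for processes whose comm is in names."""
    found = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            with open(f"/proc/{entry}/stat") as f:
                pid, comm, state = parse_stat(f.read())
        except OSError:
            continue  # the process exited while we were scanning
        if comm in names:
            found.append((pid, comm, state))
    return found
```

On an affected machine, `find_procs({"mdadm", "hald"})` would be expected to show hald in state S, which an ordinary `kill` can wake. mdadm may well show D, and no signal will wake it, which is why hald is the one to kill.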

I suspect you have to be fairly unlucky to lose the race, but it is
obviously quite possible.

I don't think there is anything I can do on the md side to avoid the
bug.

NeilBrown