Re: [BUG] md hang at schedule in md_write_start

NeilBrown <neilb@xxxxxxx> · Thu, 12 Sep 2013 08:59:33 +1000

On Wed, 11 Sep 2013 09:40:08 +0200 Jack Wang <xjtuwjp@xxxxxxxxx> wrote:

> On 09/11/2013 01:54 AM, NeilBrown wrote:
> > On Tue, 10 Sep 2013 13:09:05 +0200 Jack Wang <jinpu.wang@xxxxxxxxxxxxxxxx>
> > wrote:
> > 
> >> snip
> >>
> >> Hi Neil,
> >>
> >> I noticed you sent out a pull request for the md update, which includes a
> >> fix for this bug.
> >>
> >> I think we'd better include the fix in the stable tree, at least from 3.4
> >> onward. What do you think?
> > 
> > I don't think it is a situation that is at all likely to occur in normal usage,
> > so it doesn't seem justified for -stable.
> > 
> > Do you disagree?  Did you ever experience the deadlock in normal usage or
> > only in artificial situations?
> 
> Yes, we do see this BUG in our production environment, so I think it would
> be good to include it in the stable tree.
> 

I was hoping you would explain how....

Maybe I'm misunderstanding, but as I see it the deadlock can only occur if
you run "mdadm --stop" while some other process has the block device open
and is writing to it.  That seems like a dumb thing to do, and my suggestion
would be to not do it.
Is there a good reason why you try to stop the array while it is being
written to?
Would it make sense for the process to open the block device with O_EXCL?
This would encourage exclusive access, and would also prevent the deadlock
from happening.
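
A minimal sketch of the O_EXCL suggestion, for illustration only.  The helper
name open_exclusive() is hypothetical and not taken from any existing code.
It relies on the Linux behaviour described in open(2): O_EXCL without O_CREAT
is meaningful when the path names a block device, and the open fails with
EBUSY if the device is already in use by the system.  The assumption here is
that a writer holding the array open this way would cause a concurrent
"mdadm --stop" to be refused rather than left to race with the writes.

```c
#include <errno.h>
#include <fcntl.h>

/*
 * Hypothetical helper: open a block device (e.g. an md array) for
 * writing while requesting exclusive access.
 *
 * Returns a file descriptor on success, or a negative errno value on
 * failure.  -EBUSY means the device is already in use by the system.
 */
int open_exclusive(const char *path)
{
	int fd = open(path, O_RDWR | O_EXCL);

	if (fd < 0)
		return -errno;
	return fd;
}
```

The writing process would call open_exclusive() on its array device before
issuing any writes, treat -EBUSY as "someone else owns the array", and
close() the descriptor only once it has finished writing.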

NeilBrown