On Sat, Dec 03 2016, Marc Smith wrote:

> Finally, I got it! Why is it when I want it to break, it doesn't. =)

welcome to my world :-)

>
> I will say, using the modified mdadm that prevents the synthesized
> CHANGE event, it seems to not induce the problem as regularly.
>
> Below are the kernel logs after stopping an array:

Thank you so much for persisting with this.
The logs you provide make it clear that two separate processes
(494 and 31178) increment the ->active count by opening the device,
but never decrement that count by closing the device.

It seems too unlikely that either process would be holding the file
descriptor open indefinitely, so something must be going wrong either
as part of 'open', or as part of 'close'.

Now that I know where to look, the bug is obvious. Why didn't I see
that before?

The open request is failing, almost certainly because MD_CLOSING is
set, but the ->active count isn't being decremented on failure.

This patch should fix it. Please test and report results.

Thanks,
NeilBrown

Fixes: af8d8e6f0315 ("md: changes for MD_STILL_CLOSED flag" v4.9-rc1)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 2089d46b0eb8..a8e07eb2ca5f 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -7087,11 +7087,14 @@ static int md_open(struct block_device *bdev, fmode_t mode)
 	}
 	BUG_ON(mddev != bdev->bd_disk->private_data);
 
-	if ((err = mutex_lock_interruptible(&mddev->open_mutex)))
+	if ((err = mutex_lock_interruptible(&mddev->open_mutex))) {
+		mddev_put(mddev);
 		goto out;
+	}
 
 	if (test_bit(MD_CLOSING, &mddev->flags)) {
 		mutex_unlock(&mddev->open_mutex);
+		mddev_put(mddev);
 		return -ENODEV;
 	}
 
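
For anyone following along, the failure mode can be shown with a small
user-space sketch. This is not kernel code and not part of the patch:
struct fake_mddev, fake_get(), fake_put(), open_buggy(), open_fixed() and
leaked_refs() are all hypothetical names made up for illustration, and the
'closing' field merely stands in for the MD_CLOSING flag. It only assumes
that open takes a reference before it checks the flag.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct mddev: only what the sketch needs. */
struct fake_mddev {
	int active;	/* reference count, playing the role of ->active */
	bool closing;	/* plays the role of the MD_CLOSING flag */
};

static void fake_get(struct fake_mddev *m)
{
	m->active++;
}

static void fake_put(struct fake_mddev *m)
{
	assert(m->active > 0);
	m->active--;
}

/* Buggy open: takes a reference, then fails without dropping it. */
static int open_buggy(struct fake_mddev *m)
{
	fake_get(m);
	if (m->closing)
		return -ENODEV;	/* the reference taken above is leaked */
	return 0;
}

/* Fixed open: the failure path drops the reference it took. */
static int open_fixed(struct fake_mddev *m)
{
	fake_get(m);
	if (m->closing) {
		fake_put(m);
		return -ENODEV;
	}
	return 0;
}

/* Count the references left behind by one failed open attempt. */
static int leaked_refs(int (*open_fn)(struct fake_mddev *))
{
	struct fake_mddev m = { .active = 0, .closing = true };
	int err = open_fn(&m);

	assert(err == -ENODEV);
	return m.active;
}
```

Calling leaked_refs(open_buggy) leaves one reference behind for each failed
open, which in this toy model corresponds to a count that is incremented
and never decremented; leaked_refs(open_fixed) leaves none.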