Re: MD Remnants After --stop

On Sun, Dec 4, 2016 at 7:41 PM, NeilBrown <neilb@xxxxxxxx> wrote:
> On Sat, Dec 03 2016, Marc Smith wrote:
>
>> Finally, I got it! Why is it when I want it to break, it doesn't. =)
>
> welcome to my world :-)
>
>
>>
>> I will say, using the modified mdadm that prevents the synthesized
>> CHANGE event, it seems to not induce the problem as regularly.
>>
>> Below are the kernel logs after stopping an array:
>
> Thank you so much for persisting with this.
> The logs you provide make it clear that two separate processes (494 and
> 31178) increment the ->active count by opening the device, but never
> decrement that count by closing the device.
> It seems too unlikely that either process would be holding the
> file descriptor open indefinitely, so something must be going wrong
> either as part of 'open', or as part of 'close'.
>
> Now that I know where to look, the bug is obvious.  Why didn't I see
> that before?
>
> The open request is failing, almost certainly because MD_CLOSING is set,
> but the ->active count isn't being decremented on failure.
> This patch should fix it.
>
> Please test and report results.
>
> Thanks,
> NeilBrown
>
> Fixes: af8d8e6f0315 ("md: changes for MD_STILL_CLOSED flag" v4.9-rc1)
>
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index 2089d46b0eb8..a8e07eb2ca5f 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -7087,11 +7087,14 @@ static int md_open(struct block_device *bdev, fmode_t mode)
>         }
>         BUG_ON(mddev != bdev->bd_disk->private_data);
>
> -       if ((err = mutex_lock_interruptible(&mddev->open_mutex)))
> +       if ((err = mutex_lock_interruptible(&mddev->open_mutex))) {
> +               mddev_put(mddev);
>                 goto out;
> +       }
>
>         if (test_bit(MD_CLOSING, &mddev->flags)) {
>                 mutex_unlock(&mddev->open_mutex);
> +               mddev_put(mddev);
>                 return -ENODEV;
>         }
>

That did the trick! I ran 'mdadm --stop' eight separate times on two
nodes, and every time the array was removed completely from /dev and
/sys/block, just as expected. =)
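
For anyone reading this in the archive later, the leak the patch closes can
be modelled in user space. This is only a rough sketch of the pattern, not
the kernel code: struct fake_mddev, fake_get(), fake_put(), fake_open_buggy(),
fake_open_fixed() and fake_release() are all hypothetical names, and a plain
bool stands in for the MD_CLOSING flag.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct mddev; only what the sketch needs. */
struct fake_mddev {
    int  active;   /* reference count, playing the role of ->active */
    bool closing;  /* plays the role of the MD_CLOSING flag */
};

/* Take a reference (assumed analogue of the lookup done at open time). */
static void fake_get(struct fake_mddev *m)
{
    m->active++;
}

/* Drop a reference (assumed analogue of mddev_put()). */
static void fake_put(struct fake_mddev *m)
{
    assert(m->active > 0);
    m->active--;
}

/* Pre-patch shape: the failure path returns while still holding the ref. */
static int fake_open_buggy(struct fake_mddev *m)
{
    fake_get(m);
    if (m->closing)
        return -ENODEV;   /* reference is leaked here */
    return 0;
}

/* Post-patch shape: every failure path drops the ref before returning. */
static int fake_open_fixed(struct fake_mddev *m)
{
    fake_get(m);
    if (m->closing) {
        fake_put(m);
        return -ENODEV;
    }
    return 0;
}

/* A successful open is balanced later by the matching close. */
static void fake_release(struct fake_mddev *m)
{
    fake_put(m);
}
```

With closing set, fake_open_buggy() fails but leaves active at 1, so the
count can never return to zero; fake_open_fixed() fails and leaves it at 0.
That matches what the logs showed: opens that failed but still counted.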

Thanks for your effort and time on this. I think I read in the other
thread about this patch that it may make it into 4.9?


--Marc