Re: MD Remnants After --stop

Hey,

Just wanted to jump in as I reported a very similar problem recently on:

https://github.com/neilbrown/mdadm/issues/29

Although it’s much better with mdadm 3.4 in my case, I still occasionally see remnant md devices after --stop ...

Please let me know what you think.

Best,

Stephan

> On Dec 1, 2016, at 2:35 PM, NeilBrown <neilb@xxxxxxxx> wrote:
> 
> On Fri, Dec 02 2016, Marc Smith wrote:
> 
>> On Wed, Nov 30, 2016 at 9:52 PM, NeilBrown <neilb@xxxxxxxx> wrote:
>>> On Mon, Nov 28 2016, Marc Smith wrote:
>>> 
>>>> 
>>>> # find /sys/block/md127/md
>>>> /sys/block/md127/md
>>>> /sys/block/md127/md/reshape_position
>>>> /sys/block/md127/md/layout
>>>> /sys/block/md127/md/raid_disks
>>>> /sys/block/md127/md/bitmap
>>>> /sys/block/md127/md/bitmap/chunksize
>>> 
>>> This tells me that:
>>>        sysfs_remove_group(&mddev->kobj, &md_bitmap_group);
>>> hasn't been run, so mddev_delayed_delete() hasn't run.
>>> That suggests the final mddev_put() hasn't run, i.e. mddev->active is > 0
>>> 
>>> Everything else suggests that the array has been stopped and cleaned and
>>> should be gone...
>>> 
>>> This seems to suggest that there is an unbalanced mddev_get() without a
>>> matching mddev_put().  I cannot find it though.
>>> 
>>> If I could reproduce it, I would try to see what is happening by:
>>> 
>>> - putting
>>>   printk("mddev->active = %d\n", atomic_read(&mddev->active));
>>> in the top of mddev_put().  That shouldn't be *too* noisy.
>>> 
>>> - putting
>>>   printk("rd=%d empty=%d ctime=%d hold=%d\n", mddev->raid_disks,
>>>        list_empty(&mddev->disks), mddev->ctime, mddev->hold_active);
>>> 
>>>  in mddev_put() just before those values are tested.
>>> 
>>> - putting
>>>     printk("queue_work\n");
>>>   just before the 'queue_work()' call in mddev_put.
>>> 
>>> - putting
>>>     printk("mddev_delayed_delete\n");
>>>   in mddev_delayed_delete()
>>> 
>>> Then see what gets printed when you stop the array.
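Taken together, the instrumented mddev_put() would look something like the sketch below (placement approximate, simplified, and to be adapted to whatever the function looks like in your tree; the casts are only there to keep the %d formats well defined):

    static void mddev_put(struct mddev *mddev)
    {
            printk("mddev->active = %d\n", atomic_read(&mddev->active));
            if (!atomic_dec_and_lock(&mddev->active, &all_mddevs_lock))
                    return;
            printk("rd=%d empty=%d ctime=%d hold=%d\n", mddev->raid_disks,
                   (int)list_empty(&mddev->disks), (int)mddev->ctime,
                   mddev->hold_active);
            if (!mddev->raid_disks && list_empty(&mddev->disks) &&
                mddev->ctime == 0 && !mddev->hold_active) {
                    /* ... existing teardown, ending with ... */
                    printk("queue_work\n");
                    queue_work(md_misc_wq, &mddev->del_work);
            }
            spin_unlock(&all_mddevs_lock);
    }

plus a one-line printk("mddev_delayed_delete\n"); at the top of mddev_delayed_delete().
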
>> 
>> I made those modifications to md.c and here is the kernel log when stopping:
>> 
>> --snip--
>> [ 3937.233487] mddev->active = 2
>> [ 3937.233503] mddev->active = 2
>> [ 3937.233509] mddev->active = 2
>> [ 3937.233516] mddev->active = 1
>> [ 3937.233516] rd=2 empty=0 ctime=1480617270 hold=0
> 
> At this point, mdadm has opened the /dev/md127 device, accessed a few
> attributes via sysfs just to check on the status, and then closed it
> again.
> The array is still active, but we know that no other process has it
> open.
> 
> 
>> [ 3937.233679] udevd[492]: inotify event: 8 for /dev/md127
>> [ 3937.241489] md127: detected capacity change from 73340747776 to 0
>> [ 3937.241493] md: md127 stopped.
> 
> Now mdadm has opened the array again and issued the STOP_ARRAY ioctl.
> Still nothing else has the array open.
> 
>> [ 3937.241665] udevd[492]: device /dev/md127 closed, synthesising 'change'
>> [ 3937.241726] udevd[492]: seq 3631 queued, 'change' 'block'
>> [ 3937.241829] udevd[492]: seq 3631 forked new worker [4991]
>> [ 3937.241989] udevd[4991]: seq 3631 running
>> [ 3937.242002] dlm: dc18e34c-b136-1964-1c34-4509a7c60a19: leaving the
>> lockspace group...
>> [ 3937.242039] udevd[4991]: removing watch on '/dev/md127'
>> [ 3937.242068] mddev->active = 3
> 
> But somehow the ->active count got up to 3.
> mdadm probably still has it open, but two other things do too.
> If you have "mdadm --monitor" running in the background (which is good)
> it will temporarily increase, then decrease the count.
> udevd opens the device temporarily too.
> So this isn't necessarily a problem.
> 
>> [ 3937.242069] udevd[492]: seq 3632 queued, 'offline' 'dlm'
>> [ 3937.242080] mddev->active = 3
>> [ 3937.242104] udevd[4991]: IMPORT 'probe-bcache -o udev /dev/md127'
>> /usr/lib/udev/rules.d/69-bcache.rules:16
>> [ 3937.242161] udevd[492]: seq 3632 forked new worker [4992]
>> [ 3937.242259] udevd[4993]: starting 'probe-bcache -o udev /dev/md127'
>> [ 3937.242753] dlm: dc18e34c-b136-1964-1c34-4509a7c60a19: group event done 0 0
>> [ 3937.242847] dlm: dc18e34c-b136-1964-1c34-4509a7c60a19:
>> release_lockspace final free
>> [ 3937.242861] md: unbind<dm-1>
>> [ 3937.256606] md: export_rdev(dm-1)
>> [ 3937.256612] md: unbind<dm-0>
>> [ 3937.263601] md: export_rdev(dm-0)
>> [ 3937.263688] mddev->active = 4
>> [ 3937.263751] mddev->active = 3
> 
> But here, the active count only drops down to 2 (it is decremented
> after it is printed).  Assuming there really were no more messages like
> this, there are two active references to the md device, and we don't
> know what they are.
> 
>> 
>> I didn't use my modified mdadm which stops the synthesized CHANGE from
>> occurring, but if needed, I can re-run the test using that.
> 
> It would be good to use the modified mdadm, if only to reduce the
> noise.  It won't change the end result, but might make it easier to see
> what is happening.
> Also please add
>   WARN_ON(1);
> 
> in the start of mddev_get() and mddev_put().
> That will provide a stack trace whenever either of these is called, so
> we can see who takes a reference, and who doesn't release it.
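Concretely, adding the warnings at the top of both functions would look something like this sketch (WARN_ON(1) dumps a full stack trace to the kernel log every time it fires, so expect the output to be noisy):

    static inline struct mddev *mddev_get(struct mddev *mddev)
    {
            WARN_ON(1);                     /* who is taking a reference? */
            atomic_inc(&mddev->active);
            return mddev;
    }

    static void mddev_put(struct mddev *mddev)
    {
            WARN_ON(1);                     /* who is dropping one? */
            /* ... rest of mddev_put() as before ... */
    }

Matching the get and put stack traces against each other should then show which callers took the two references that were never released.
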
> 
> Thanks,
> NeilBrown
