On Wed, Nov 30, 2016 at 9:52 PM, NeilBrown <neilb@xxxxxxxx> wrote:
> On Mon, Nov 28 2016, Marc Smith wrote:
>
>>
>> # find /sys/block/md127/md
>> /sys/block/md127/md
>> /sys/block/md127/md/reshape_position
>> /sys/block/md127/md/layout
>> /sys/block/md127/md/raid_disks
>> /sys/block/md127/md/bitmap
>> /sys/block/md127/md/bitmap/chunksize
>
> This tells me that:
>     sysfs_remove_group(&mddev->kobj, &md_bitmap_group);
> hasn't been run, so mddev_delayed_delete() hasn't run.
> That suggests the final mddev_put() hasn't run, i.e. mddev->active is > 0.
>
> Everything else suggests that the array has been stopped and cleaned and
> should be gone...
>
> This seems to suggest that there is an unbalanced mddev_get() without a
> matching mddev_put(). I cannot find it though.
>
> If I could reproduce it, I would try to see what is happening by:
>
>  - putting
>        printk("mddev->active = %d\n", atomic_read(&mddev->active));
>    at the top of mddev_put(). That shouldn't be *too* noisy.
>
>  - putting
>        printk("rd=%d empty=%d ctime=%d hold=%d\n", mddev->raid_disks,
>               list_empty(&mddev->disks), mddev->ctime, mddev->hold_active);
>    in mddev_put() just before those values are tested.
>
>  - putting
>        printk("queue_work\n");
>    just before the 'queue_work()' call in mddev_put().
>
>  - putting
>        printk("mddev_delayed_delete\n");
>    in mddev_delayed_delete().
>
> Then see what gets printed when you stop the array.

I made those modifications to md.c and here is the kernel log when
stopping the array:

--snip--
[ 3937.233487] mddev->active = 2
[ 3937.233503] mddev->active = 2
[ 3937.233509] mddev->active = 2
[ 3937.233516] mddev->active = 1
[ 3937.233516] rd=2 empty=0 ctime=1480617270 hold=0
[ 3937.233679] udevd[492]: inotify event: 8 for /dev/md127
[ 3937.241489] md127: detected capacity change from 73340747776 to 0
[ 3937.241493] md: md127 stopped.
[ 3937.241665] udevd[492]: device /dev/md127 closed, synthesising 'change'
[ 3937.241726] udevd[492]: seq 3631 queued, 'change' 'block'
[ 3937.241829] udevd[492]: seq 3631 forked new worker [4991]
[ 3937.241989] udevd[4991]: seq 3631 running
[ 3937.242002] dlm: dc18e34c-b136-1964-1c34-4509a7c60a19: leaving the lockspace group...
[ 3937.242039] udevd[4991]: removing watch on '/dev/md127'
[ 3937.242068] mddev->active = 3
[ 3937.242069] udevd[492]: seq 3632 queued, 'offline' 'dlm'
[ 3937.242080] mddev->active = 3
[ 3937.242104] udevd[4991]: IMPORT 'probe-bcache -o udev /dev/md127' /usr/lib/udev/rules.d/69-bcache.rules:16
[ 3937.242161] udevd[492]: seq 3632 forked new worker [4992]
[ 3937.242259] udevd[4993]: starting 'probe-bcache -o udev /dev/md127'
[ 3937.242753] dlm: dc18e34c-b136-1964-1c34-4509a7c60a19: group event done 0 0
[ 3937.242847] dlm: dc18e34c-b136-1964-1c34-4509a7c60a19: release_lockspace final free
[ 3937.242861] md: unbind<dm-1>
[ 3937.256606] md: export_rdev(dm-1)
[ 3937.256612] md: unbind<dm-0>
[ 3937.263601] md: export_rdev(dm-0)
[ 3937.263688] mddev->active = 4
[ 3937.263751] mddev->active = 3
--snip--

I didn't use my modified mdadm that suppresses the synthesized CHANGE
event, but I can re-run the test with it if needed.

I don't see any of the "queue_work" or "mddev_delayed_delete" messages
anywhere in the kernel logs.
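Since the count climbs back up and never seems to reach zero again, maybe
it would also help to log the caller of every get/put so the unbalanced
pair can be matched up. Something like this (just an untested sketch on my
end; it assumes mddev_get() is still the plain atomic_inc() wrapper in
md.c):

--snip--
static inline struct mddev *mddev_get(struct mddev *mddev)
{
        atomic_inc(&mddev->active);
        /* %pS resolves the return address to a symbol name, so each
         * get can be matched against a later put in the log. */
        printk("mddev_get: active=%d caller=%pS\n",
               atomic_read(&mddev->active),
               __builtin_return_address(0));
        return mddev;
}
--snip--

plus a matching caller printk at the top of mddev_put(). Happy to run with
that as well if it would be useful.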
Here is how those monitoring lines are set in md.c:

--snip--
static void mddev_put(struct mddev *mddev)
{
        struct bio_set *bs = NULL;

        printk("mddev->active = %d\n", atomic_read(&mddev->active));
        if (!atomic_dec_and_lock(&mddev->active, &all_mddevs_lock))
                return;
        printk("rd=%d empty=%d ctime=%d hold=%d\n", mddev->raid_disks,
               list_empty(&mddev->disks), mddev->ctime, mddev->hold_active);
        if (!mddev->raid_disks && list_empty(&mddev->disks) &&
            mddev->ctime == 0 && !mddev->hold_active) {
                /* Array is not configured at all, and not held active,
                 * so destroy it */
                list_del_init(&mddev->all_mddevs);
                bs = mddev->bio_set;
                mddev->bio_set = NULL;
                if (mddev->gendisk) {
                        /* We did a probe so need to clean up. Call
                         * queue_work inside the spinlock so that
                         * flush_workqueue() after mddev_find will
                         * succeed in waiting for the work to be done.
                         */
                        INIT_WORK(&mddev->del_work, mddev_delayed_delete);
                        printk("queue_work\n");
                        queue_work(md_misc_wq, &mddev->del_work);
                } else
                        kfree(mddev);
        }
        spin_unlock(&all_mddevs_lock);
        if (bs)
                bioset_free(bs);
}
--snip--

--snip--
static void mddev_delayed_delete(struct work_struct *ws)
{
        struct mddev *mddev = container_of(ws, struct mddev, del_work);

        printk("mddev_delayed_delete\n");
        sysfs_remove_group(&mddev->kobj, &md_bitmap_group);
        kobject_del(&mddev->kobj);
        kobject_put(&mddev->kobj);
}
--snip--

Let me know if the printk() lines weren't placed in the proper spots
and I'll fix and re-run the test.

Thanks for your time.

--Marc

>
> NeilBrown
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html