On Fri, 13 Jul 2012 13:58:00 +0800 majianpeng <majianpeng@xxxxxxxxx> wrote: > I found some kernel message below: > ============================================================================= > [ 432.213819] BUG raid5-md0 (Not tainted): Objects remaining on kmem_cache_close() > [ 432.213820] > ----------------------------------------------------------------------------- > [ 432.213820] > [ 432.213823] INFO: Slab 0xffffea00029cae00 objects=24 used=1 > fp=0xffff8800a72bb910 flags=0x100000000004080 > [ 432.213825] Pid: 6207, comm: mdadm Not tainted 3.5.0-rc6+ #56 > [ 432.213827] Call Trace: > [ 432.213833] [<ffffffff8111b4a1>] slab_err+0x71/0x80 > [ 432.213836] [<ffffffff810e4998>] ? __free_pages+0x18/0x30 > [ 432.213839] [<ffffffff8111f45a>] ? kmem_cache_destroy+0x16a/0x3a0 > [ 432.213841] [<ffffffff8111f47d>] kmem_cache_destroy+0x18d/0x3a0 > [ 432.213845] [<ffffffffa007ef8d>] free_conf+0x2d/0xf0 [raid456] > [ 432.213848] [<ffffffffa007f8b1>] stop+0x41/0x60 [raid456] This is certainly something to be concerned about... > > By the following steps, it can reappear. > 1:create raid5 > 2:dd if=/dev/zero of=/dev/md0 bs=1M > 3:when exec "mdadm -S /dev/md0", press "CTRL+C" in dd-command screen. Thanks - this worked first time. Thought the 'bug' I got was different. [ 526.720419] BUG: unable to handle kernel paging request at ffff8801363fbe30 [ 526.720877] IP: [<ffffffff8183cb34>] raid5_end_write_request+0x1f4/0x2f0 I think it is the same root cause though. > > In commit 271f5a9b8f8ae0db95de72779d115c9d0b9d3cc5 > Author: NeilBrown <neilb@xxxxxxx> > Remove invalidate_partition call from do_md_stop > > Because the deadlock, Neil remove the flush data.But in this pathch,Neil > only thoungt the filesystem,but not consider the raw block deivces. Please don't try to guess what I am thinking ;-) Raw block device access is in many ways just like filesystem access. /dev/md0 provides access to a special filesystem that provides exactly one file which is 1-1 mapped to the block device. e.g It uses the same page cache as other filesystems. do_md_stop will fail with -EBUSY if any program has the block device open. When the last program closes /dev/md0, __blkdev_put will call sync_blockdev which will flush out any writes. So when do_md_stop starts disassembling the array there should be no writes. But there are. The problem happens if some other program (dd in this case) closes /dev/md0 between the time that mdadm opens it and the time that mdadm issues the STOP_ARRAY ioctl. In this case dd isn't the last program with /dev/md0 open, so the sync_blockdev in __blkdev_put isn't called. So if mdadm stopped the array by effectively doing echo clear > /sys/block/md0/md/array_state (without holding /dev/md0 open) this race could not happen. Either the 'echo' would, or the last close of /dev/md0 would have happened and the block device will have been flushed. So the problem is that the ioctl holds the block device open preventing the last flush. So the fix should be in md_ioctl. in md_ioctl we have a pointer to the bdev, so we don't need to try to find one with bdget_disk. Unfortunately we need to call sync_blockdev *after* taking ->open_mutex otherwise the race still exists, and md_ioctl can't currently take open_mutex. We need to fix md_set_readonly too - it has the same problem. Maybe I'll pass the bdev in to these two functions and get them to call sync_blockdev after taking the lock and checking the count. But it's late today - I'll sort this out tomorrow. Thanks for finding this! NeilBrown > > Using the latest kernel and readding "invalidate_partition" function. > By the patch suggested, i create os in md0(softraid5).But the deadlock > did not appear. > But maybe my test not correct,so i don't using "invalidate_partition" to > flush data. I used sync_block to flush data. > > Signed-off-by: Jianpeng Ma <majianpeng@xxxxxxxxx> > --- > drivers/md/md.c | 6 ++++++ > drivers/md/raid5.c | 1 + > 2 files changed, 7 insertions(+), 0 deletions(-) > > diff --git a/drivers/md/md.c b/drivers/md/md.c > index a4c219e..5716569 100644 > --- a/drivers/md/md.c > +++ b/drivers/md/md.c > @@ -5395,10 +5395,16 @@ static int do_md_stop(struct mddev * mddev, int mode, int is_open) > } > > if (mddev->pers) { > + struct block_device *bdev; > if (mddev->ro) > set_disk_ro(disk, 0); > > __md_stop_writes(mddev); > + /*flush data*/ > + bdev = bdget_disk(disk, 0); > + sync_blockdev(bdev); > + bdput(bdev); > + > md_stop(mddev); > mddev->queue->merge_bvec_fn = NULL; > mddev->queue->backing_dev_info.congested_fn = NULL; > diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c > index 04348d7..922e26e 100644 > --- a/drivers/md/raid5.c > +++ b/drivers/md/raid5.c > @@ -1682,6 +1682,7 @@ static void shrink_stripes(struct r5conf *conf) > while (drop_one_stripe(conf)) > ; > > + BUG_ON(atomic_read(&conf->active_stripes)); > if (conf->slab_cache) > kmem_cache_destroy(conf->slab_cache); > conf->slab_cache = NULL;
Attachment:
signature.asc
Description: PGP signature