How does md guarantee that no active stripe_head is left unfreed when md stops?

Hi all,

I'm developing against the v3.10 md code. Recently I hit a problem where a
read IO often returns from the physical disk after md has already been stopped.
I reviewed the code and found that when md stops, it unregisters raid5d and
calls shrink_stripes(), which frees only the *inactive* stripes.
How can it be sure that no active stripes are still linked on handle_list?
I know that the md device is opened with O_EXCL before the stop, but that does
not prevent others from opening it and sending IO to it.
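
To make the concern above concrete, here is a minimal user-space sketch in C. It is only a toy model of the behaviour described in this mail, not the kernel code: struct toy_conf, struct toy_stripe and all toy_* functions are hypothetical names, and the lists are plain singly linked lists with no locking. Under those assumptions, it shows that a shrink routine which walks only the inactive list leaves any stripe on the handle list untouched and still counted as active.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy model only: these are NOT the kernel's r5conf / stripe_head types. */
struct toy_stripe {
    struct toy_stripe *next;
};

struct toy_conf {
    struct toy_stripe *inactive_list; /* idle stripes, safe to free */
    struct toy_stripe *handle_list;   /* stripes with work or IO pending */
    int active_stripes;               /* stripes currently off inactive_list */
};

/* Allocate one stripe and park it on the inactive list. Returns 1 on success. */
int toy_grow_one_stripe(struct toy_conf *conf)
{
    struct toy_stripe *sh;

    assert(conf != NULL);
    sh = calloc(1, sizeof(*sh));
    if (!sh)
        return 0;
    sh->next = conf->inactive_list;
    conf->inactive_list = sh;
    return 1;
}

/* Move one stripe from the inactive list to the handle list,
 * as if a read had just been issued against it. */
struct toy_stripe *toy_get_active_stripe(struct toy_conf *conf)
{
    struct toy_stripe *sh;

    assert(conf != NULL);
    sh = conf->inactive_list;
    if (!sh)
        return NULL;
    conf->inactive_list = sh->next;
    sh->next = conf->handle_list;
    conf->handle_list = sh;
    conf->active_stripes++;
    return sh;
}

/* Free every stripe on the inactive list and nothing else.
 * Returns the number of stripes that are still active afterwards. */
int toy_shrink_stripes(struct toy_conf *conf)
{
    assert(conf != NULL);
    while (conf->inactive_list) {
        struct toy_stripe *sh = conf->inactive_list;

        conf->inactive_list = sh->next;
        free(sh);
    }
    return conf->active_stripes;
}
```

In this model, with two stripes allocated and one of them made active, toy_shrink_stripes() frees one stripe and returns 1. The stripe on handle_list stays allocated while everything around it is torn down, which is the situation reported by the active_stripes=1 debug print further down.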

In my case, an OOPS often happens as follows:
I keep calling mdadm --stop to stop an md device, but lsof shows that it is
held open by systemd-udevd, so it is reported as still in use.
About 30s later, the udev worker times out and is killed with SIGKILL:
systemd-udevd: worker [19335]/devices/virtual/block/md41 timeout; kill it.

Then the md stop process is able to continue and gets past free_conf(), but
one active stripe is left behind.

kernel: shrink_stripes:conf(ffff880004affc00)->md(ffff8802d95b4000,md41)active_stripes=1     <== this is my debug print.
kernel: md41: detected capacity change from 3409128980480 to 0
mdadm: stopped /dev/md41

After md is stopped, a read IO returns from the underlying device and triggers
the OOPS.

[190830.867371] md: unbind<dm-64>
[190830.876345] md: export_rdev(dm-64)
[190831.201619] BUG: unable to handle kernel 
[190831.202875] paging request at 0000000000002050
[190831.204101] IP: [<ffffffffa089a349>] raid5_end_read_request+0xf9/0xdc0[raid456]

I found that this returned bio was caused by a user page read, which in turn
was caused by an fput leading to kill_bdev:
PID: 21345  TASK: ffff8803e5a916c0  CPU: 1   COMMAND: "mdadm"
#0 [ffff88016f777b88] __schedule at ffffffff815f513d
#1 [ffff88016f777bf0] io_schedule at ffffffff815f599d
#2 [ffff88016f777c08] sleep_on_page at ffffffff81155f1e
#3 [ffff88016f777c18] __wait_on_bit_lock at ffffffff815f38ab
#4 [ffff88016f777c58] __lock_page at ffffffff81156038
#5 [ffff88016f777cb0] truncate_inode_pages_range at ffffffff8116645e
#6 [ffff88016f777e00] truncate_inode_pages at ffffffff811664b5
#7 [ffff88016f777e10] kill_bdev at ffffffff811ffaef
#8 [ffff88016f777e28] __blkdev_put at ffffffff81201124
#9 [ffff88016f777e68] blkdev_put at ffffffff81201bae
#10 [ffff88016f777e98] blkdev_close at ffffffff81201d55
#11 [ffff88016f777ea8] __fput at ffffffff811c81b9
#12 [ffff88016f777ef0] ____fput at ffffffff811c847e
#13 [ffff88016f777f00] task_work_run at ffffffff81093b37
#14 [ffff88016f777f30] do_notify_resume at ffffffff81013b0c
#15 [ffff88016f777f50] int_signal at ffffffff8160049d

Can anyone answer my question? Thanks.

Vaughan

