Hi AceLan,

Thanks for running the experiments.

On Fri, Aug 25, 2023 at 9:32 PM AceLan Kao <acelan@xxxxxxxxx> wrote:
[...]
> >
> > Could you please run the follow two experiments?
> >
> > 1. Confirm 12a6caf273240a triggers this. Specifically:
> >    git checkout 12a6caf273240a => repros
> >    git checkout 12a6caf273240a~1 => cannot repro
> Yes, I'm pretty sure about this, that's my bisect result and I just
> confirmed it again.
> I also tried reverting 12a6caf273240a and the issue is gone.

The log doesn't match my guess. Specifically:

[  420.068142] systemd-shutdown[1]: Stopping MD /dev/md123 (9:123).
[  420.074718] md_open:md123 openers++ = 1 by systemd-shutdow
[  420.080787] systemd-shutdown[1]: Failed to sync MD block device /dev/md123, ignoring: Input/output error
[  420.090831] md: md123 stopped.
[  420.094465] systemd-shutdown[1]: Stopping MD /dev/md122 (9:122).
[  420.101045] systemd-shutdown[1]: Could not stop MD /dev/md122: Device or resource busy

For a successful stop on md123, we reach the pr_info() in md_open().
For a failed stop on md122, the kernel returns -EBUSY before that
pr_info() in md_open(). There have been some changes to md_open() in
the past few releases, so I am not quite sure we are looking at the
same code. Therefore, could you please help clarify:

1. Which base kernel are you using? From the log, you are using
   6.5-rc7-706a74159504. However, I think we cannot cleanly revert
   12a6caf273240a on top of 6.5-rc7-706a74159504. Did you manually fix
   some issues in the revert? If so, could you please share the revert
   commit?

2. If you are not using 6.5-rc7-706a74159504 as the base kernel, which
   one are you using?

Thanks,
Song

>
> >
> > 2. Try with the following change (add debug messages), which hopefully
> > shows which command is holding a reference on mddev->openers.
> >
> > Thanks,
> > Song
> >
> > diff --git i/drivers/md/md.c w/drivers/md/md.c
> > index 78be7811a89f..3e9b718b32c1 100644
> > --- i/drivers/md/md.c
> > +++ w/drivers/md/md.c
> > @@ -7574,11 +7574,15 @@ static int md_ioctl(struct block_device *bdev, blk_mode_t mode,
> >  		if (mddev->pers && atomic_read(&mddev->openers) > 1) {
> >  			mutex_unlock(&mddev->open_mutex);
> >  			err = -EBUSY;
> > +			pr_warn("%s return -EBUSY for %s with mddev->openers = %d\n",
> > +				__func__, mdname(mddev), atomic_read(&mddev->openers));
> >  			goto out;
> >  		}
> >  		if (test_and_set_bit(MD_CLOSING, &mddev->flags)) {
> >  			mutex_unlock(&mddev->open_mutex);
> >  			err = -EBUSY;
> > +			pr_warn("%s return -EBUSY for %s with MD_CLOSING bit set\n",
> > +				__func__, mdname(mddev));
> >  			goto out;
> >  		}
> >  		did_set_md_closing = true;
> > @@ -7789,6 +7793,8 @@ static int md_open(struct gendisk *disk, blk_mode_t mode)
> >  		goto out_unlock;
> >
> >  	atomic_inc(&mddev->openers);
> > +	pr_info("%s:%s openers++ = %d by %s\n", __func__, mdname(mddev),
> > +		atomic_read(&mddev->openers), current->comm);
> >  	mutex_unlock(&mddev->open_mutex);
> >
> >  	disk_check_media_change(disk);
> > @@ -7807,6 +7813,8 @@ static void md_release(struct gendisk *disk)
> >
> >  	BUG_ON(!mddev);
> >  	atomic_dec(&mddev->openers);
> > +	pr_info("%s:%s openers-- = %d by %s\n", __func__, mdname(mddev),
> > +		atomic_read(&mddev->openers), current->comm);
> >  	mddev_put(mddev);
> >  }
> It's pretty strange that I can't reproduce the issue after applied the patch.
>
> I tried to figure out which part affect the issue and found when I
> comment out the pr_info() In md_release(), the issue could be
> reproduced.
>
> --
> Chia-Lin Kao(AceLan)
> http://blog.acelan.idv.tw/
> E-Mail: acelan.kaoATcanonical.com (s/AT/@/)
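
A note on reading the debug output: the pr_info() lines that the quoted patch adds to md_open() and md_release() can be tallied per device and per command to see who still holds a reference on mddev->openers. The following is a minimal Python sketch, not something posted in this thread. The function name outstanding_openers() and the regular expression are assumptions derived from the two format strings in the patch ("%s:%s openers++ = %d by %s" and "%s:%s openers-- = %d by %s"). Note that current->comm is limited to 15 characters, which is why the log shows "systemd-shutdow", and that a release may be logged under a different command name than the matching open, so a negative balance is possible.

```python
import re
from collections import Counter, defaultdict

# Matches the two pr_info() formats from the debug patch, e.g.
#   "md_open:md123 openers++ = 1 by systemd-shutdow"
#   "md_release:md123 openers-- = 0 by systemd-shutdow"
# The pattern is an assumption based on those format strings.
PATTERN = re.compile(
    r"(?P<func>md_open|md_release):(?P<dev>\S+) "
    r"openers(?P<op>\+\+|--) = (?P<count>-?\d+) by (?P<comm>.+?)\s*$"
)


def outstanding_openers(lines):
    """Return {device: {comm: net opens}} for non-zero balances only.

    Each "openers++" line counts +1 and each "openers--" line counts -1
    for the (device, comm) pair that logged it. Lines that do not match
    the debug format are ignored.
    """
    balance = defaultdict(Counter)
    for line in lines:
        m = PATTERN.search(line)
        if m is None:
            continue
        delta = 1 if m.group("op") == "++" else -1
        balance[m.group("dev")][m.group("comm")] += delta

    result = {}
    for dev, comms in balance.items():
        nonzero = {comm: n for comm, n in comms.items() if n != 0}
        if nonzero:
            result[dev] = nonzero
    return result
```

It accepts any iterable of log lines, for example an open file holding saved dmesg output. A device that shows a positive balance for some command at the time STOP_ARRAY fails is a candidate for the extra opener.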