On Thu, Feb 8, 2024 at 4:49 PM Song Liu <song@xxxxxxxxxx> wrote:
>
> On Thu, Feb 8, 2024 at 12:44 AM Li Nan <linan666@xxxxxxxxxxxxxxx> wrote:
> >
> >
> > On 2024/2/8 14:50, Song Liu wrote:
> > > On Wed, Feb 7, 2024 at 1:32 AM <linan666@xxxxxxxxxxxxxxx> wrote:
> > >>
> > >> From: Li Nan <linan122@xxxxxxxxxx>
> > >>
> > >> 'open_mutex' of gendisk is used to protect open/close block devices. But
> > >> in bd_link_disk_holder(), it is used to protect the creation of the symlink
> > >> between the holding disk and the slave bdev, which introduces some issues.
> > >>
> > >> When bd_link_disk_holder() is called, the driver is usually in the process
> > >> of initialization/modification and may suspend submitting io. At this
> > >> time, any io holding 'open_mutex', such as scanning partitions, can cause
> > >> deadlocks. For example, in raid:
> > >>
> > >> T1                              T2
> > >> bdev_open_by_dev
> > >>  lock open_mutex [1]
> > >>  ...
> > >>  efi_partition
> > >>  ...
> > >>   md_submit_bio
> > >>                                  md_ioctl mddev_suspend
> > >>                                   -> suspend all io
> > >>                                  md_add_new_disk
> > >>                                   bind_rdev_to_array
> > >>                                    bd_link_disk_holder
> > >>                                     try lock open_mutex [2]
> > >>   md_handle_request
> > >>    -> wait mddev_resume
> > >>
> > >> T1 scans partitions, T2 adds a new device to the raid. T1 waits for T2 to
> > >> resume mddev, but T2 waits for open_mutex held by T1. Deadlock occurs.
> > >>
> > >> Fix it by introducing a local mutex 'holder_mutex' to replace 'open_mutex'.
> > >
> > > Is this to fix [1]? Do we need some Fixes and/or Closes tags?
> > >
> >
> > No. Just use another way to fix [2], and both [2] and this patch can fix
> > the issue. I am not sure about the root cause of [1] yet.
> >
> > [2] https://patchwork.kernel.org/project/linux-raid/list/?series=812045
> >
> > > Could you please add steps to reproduce this issue?
> >
> > We need to modify the kernel, add sleep in md_submit_bio() and md_ioctl()
> > as below, and then:
> > 1. mdadm -CR /dev/md0 -l1 -n2 /dev/sd[bc]              #create a raid
> > 2. echo 1 > /sys/module/md_mod/parameters/error_inject #enable sleep
> > 3. 'mdadm --add /dev/md0 /dev/sda'                     #add a disk to raid
> > 4. submit ioctl BLKRRPART to raid within 10s.
>
> The analysis makes sense. I also hit the issue a couple of times without
> adding extra delays. But I am not sure whether this is the best fix (I
> didn't find real issues with it either).

To be extra safe and future proof, we can do something like the following to
only suspend the array for ADD_NEW_DISK on running arrays. This appears to
solve the problem reported in

https://bugzilla.kernel.org/show_bug.cgi?id=218459

Thanks,
Song

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 9e41a9aaba8b..395911d5f4d6 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -7570,10 +7570,11 @@ static inline bool md_ioctl_valid(unsigned int cmd)
         }
 }
 
-static bool md_ioctl_need_suspend(unsigned int cmd)
+static bool md_ioctl_need_suspend(struct mddev *mddev, unsigned int cmd)
 {
         switch (cmd) {
         case ADD_NEW_DISK:
+                return mddev->pers != NULL;
         case HOT_ADD_DISK:
         case HOT_REMOVE_DISK:
         case SET_BITMAP_FILE:
@@ -7625,6 +7626,7 @@ static int md_ioctl(struct block_device *bdev, blk_mode_t mode,
         void __user *argp = (void __user *)arg;
         struct mddev *mddev = NULL;
         bool did_set_md_closing = false;
+        bool need_suspend;
 
         if (!md_ioctl_valid(cmd))
                 return -ENOTTY;
@@ -7716,8 +7718,11 @@ static int md_ioctl(struct block_device *bdev, blk_mode_t mode,
         if (!md_is_rdwr(mddev))
                 flush_work(&mddev->sync_work);
 
-        err = md_ioctl_need_suspend(cmd) ? mddev_suspend_and_lock(mddev) :
-                                           mddev_lock(mddev);
+        need_suspend = md_ioctl_need_suspend(mddev, cmd);
+        if (need_suspend)
+                err = mddev_suspend_and_lock(mddev);
+        else
+                err = mddev_lock(mddev);
         if (err) {
                 pr_debug("md: ioctl lock interrupted, reason %d, cmd %d\n",
                          err, cmd);
@@ -7846,8 +7851,10 @@ static int md_ioctl(struct block_device *bdev, blk_mode_t mode,
             err != -EINVAL)
                 mddev->hold_active = 0;
 
-        md_ioctl_need_suspend(cmd) ? mddev_unlock_and_resume(mddev) :
-                                     mddev_unlock(mddev);
+        if (need_suspend)
+                mddev_unlock_and_resume(mddev);
+        else
+                mddev_unlock(mddev);
 out:
         if(did_set_md_closing)
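
Aside, not part of the patch above: the following is a minimal, self-contained
userspace sketch (pthreads) of the T1/T2 ordering quoted from Li Nan's commit
message. The names open_mutex, scan_partitions, add_new_disk and the
suspended/resumed pair are hypothetical stand-ins for the gendisk lock, the
BLKRRPART path, the ADD_NEW_DISK ioctl and mddev_suspend()/mddev_resume();
it only illustrates the locking order, it is not md code.

/*
 * Userspace illustration only: T1 holds "open_mutex" and then waits for a
 * resume, while T2 "suspends" io and then needs "open_mutex" before it can
 * resume -- the same cycle as the diagram above. A timed lock is used so the
 * program reports the deadlock instead of hanging. Build with: cc -pthread
 */
#include <pthread.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

static pthread_mutex_t open_mutex = PTHREAD_MUTEX_INITIALIZER; /* gendisk lock stand-in */
static pthread_mutex_t state_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t resumed = PTHREAD_COND_INITIALIZER;
static int suspended;

static void *scan_partitions(void *arg)          /* T1: BLKRRPART stand-in */
{
        (void)arg;
        pthread_mutex_lock(&open_mutex);         /* bdev_open_by_dev(): lock open_mutex [1] */
        sleep(1);                                /* give T2 time to suspend */
        pthread_mutex_lock(&state_lock);
        while (suspended)                        /* md_handle_request(): wait mddev_resume */
                pthread_cond_wait(&resumed, &state_lock);
        pthread_mutex_unlock(&state_lock);
        pthread_mutex_unlock(&open_mutex);
        return NULL;
}

static void *add_new_disk(void *arg)             /* T2: ADD_NEW_DISK stand-in */
{
        struct timespec deadline;

        (void)arg;
        pthread_mutex_lock(&state_lock);         /* mddev_suspend(): suspend all io */
        suspended = 1;
        pthread_mutex_unlock(&state_lock);

        sleep(2);                                /* T1 already holds open_mutex by now */
        clock_gettime(CLOCK_REALTIME, &deadline);
        deadline.tv_sec += 3;
        /* bd_link_disk_holder(): try lock open_mutex [2] -- would block forever */
        if (pthread_mutex_timedlock(&open_mutex, &deadline)) {
                printf("deadlock: open_mutex is held by T1, which waits for resume\n");
                return NULL;
        }
        pthread_mutex_unlock(&open_mutex);

        pthread_mutex_lock(&state_lock);         /* mddev_resume(): not reached in the bad ordering */
        suspended = 0;
        pthread_cond_broadcast(&resumed);
        pthread_mutex_unlock(&state_lock);
        return NULL;
}

int main(void)
{
        pthread_t t1, t2;

        pthread_create(&t1, NULL, scan_partitions, NULL);
        pthread_create(&t2, NULL, add_new_disk, NULL);
        pthread_join(t2, NULL);
        pthread_mutex_lock(&state_lock);         /* resume by hand so T1 can exit after the report */
        suspended = 0;
        pthread_cond_broadcast(&resumed);
        pthread_mutex_unlock(&state_lock);
        pthread_join(t1, NULL);
        return 0;
}

Either proposed fix removes one edge of this cycle: Li Nan's patch stops
taking open_mutex around the symlink creation, and the diff above skips the
suspend for ADD_NEW_DISK on arrays that are not running yet.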