On Thu, Feb 8, 2024 at 4:49 PM Song Liu <song@xxxxxxxxxx> wrote:
>
> On Thu, Feb 8, 2024 at 12:44 AM Li Nan <linan666@xxxxxxxxxxxxxxx> wrote:
> >
> >
> > On 2024/2/8 14:50, Song Liu wrote:
> > > On Wed, Feb 7, 2024 at 1:32 AM <linan666@xxxxxxxxxxxxxxx> wrote:
> > >>
> > >> From: Li Nan <linan122@xxxxxxxxxx>
> > >>
> > >> 'open_mutex' of gendisk is used to protect open/close block devices. But
> > >> in bd_link_disk_holder(), it is used to protect the creation of the symlink
> > >> between the holding disk and the slave bdev, which introduces some issues.
> > >>
> > >> When bd_link_disk_holder() is called, the driver is usually in the process
> > >> of initialization/modification and may suspend submitting io. At this
> > >> time, any io holding 'open_mutex', such as scanning partitions, can cause
> > >> deadlocks. For example, in raid:
> > >>
> > >> T1                              T2
> > >> bdev_open_by_dev
> > >>  lock open_mutex [1]
> > >>  ...
> > >>  efi_partition
> > >>  ...
> > >>   md_submit_bio
> > >>                                  md_ioctl mddev_suspend
> > >>                                   -> suspend all io
> > >>                                  md_add_new_disk
> > >>                                   bind_rdev_to_array
> > >>                                    bd_link_disk_holder
> > >>                                     try lock open_mutex [2]
> > >>   md_handle_request
> > >>    -> wait mddev_resume
> > >>
> > >> T1 scans partitions, T2 adds a new device to the raid. T1 waits for T2 to
> > >> resume mddev, but T2 waits for open_mutex held by T1. Deadlock occurs.
> > >>
> > >> Fix it by introducing a local mutex 'holder_mutex' to replace 'open_mutex'.
> > >
> > > Is this to fix [1]? Do we need some Fixes and/or Closes tags?
> > >
> >
> > No. Just use another way to fix [2], and both [2] and this patch can fix
> > the issue. I am not sure about the root cause of [1] yet.
> >
> > [2] https://patchwork.kernel.org/project/linux-raid/list/?series=812045
> >
> > > Could you please add steps to reproduce this issue?
> >
> > We need to modify the kernel, add sleep in md_submit_bio() and md_ioctl()
> > as below, and then:
> > 1. mdadm -CR /dev/md0 -l1 -n2 /dev/sd[bc]              #create a raid
> > 2. echo 1 > /sys/module/md_mod/parameters/error_inject #enable sleep
> > 3. 'mdadm --add /dev/md0 /dev/sda'                     #add a disk to raid
> > 4. submit ioctl BLKRRPART to raid within 10s.
>
> The analysis makes sense. I also hit the issue a couple of times without
> adding extra delays. But I am not sure whether this is the best fix (I
> didn't find real issues with it either).

To be extra safe and future proof, we can do something like the following to
only suspend the array for ADD_NEW_DISK on running arrays. This appears to
solve the problem reported in

https://bugzilla.kernel.org/show_bug.cgi?id=218459

Thanks,
Song

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 9e41a9aaba8b..395911d5f4d6 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -7570,10 +7570,11 @@ static inline bool md_ioctl_valid(unsigned int cmd)
         }
 }
 
-static bool md_ioctl_need_suspend(unsigned int cmd)
+static bool md_ioctl_need_suspend(struct mddev *mddev, unsigned int cmd)
 {
         switch (cmd) {
         case ADD_NEW_DISK:
+                return mddev->pers != NULL;
         case HOT_ADD_DISK:
         case HOT_REMOVE_DISK:
         case SET_BITMAP_FILE:
@@ -7625,6 +7626,7 @@ static int md_ioctl(struct block_device *bdev, blk_mode_t mode,
         void __user *argp = (void __user *)arg;
         struct mddev *mddev = NULL;
         bool did_set_md_closing = false;
+        bool need_suspend;
 
         if (!md_ioctl_valid(cmd))
                 return -ENOTTY;
@@ -7716,8 +7718,11 @@ static int md_ioctl(struct block_device *bdev, blk_mode_t mode,
         if (!md_is_rdwr(mddev))
                 flush_work(&mddev->sync_work);
 
-        err = md_ioctl_need_suspend(cmd) ? mddev_suspend_and_lock(mddev) :
-                                           mddev_lock(mddev);
+        need_suspend = md_ioctl_need_suspend(mddev, cmd);
+        if (need_suspend)
+                err = mddev_suspend_and_lock(mddev);
+        else
+                err = mddev_lock(mddev);
         if (err) {
                 pr_debug("md: ioctl lock interrupted, reason %d, cmd %d\n",
                          err, cmd);
@@ -7846,8 +7851,10 @@ static int md_ioctl(struct block_device *bdev, blk_mode_t mode,
             err != -EINVAL)
                 mddev->hold_active = 0;
 
-        md_ioctl_need_suspend(cmd) ? mddev_unlock_and_resume(mddev) :
-                                     mddev_unlock(mddev);
+        if (need_suspend)
+                mddev_unlock_and_resume(mddev);
+        else
+                mddev_unlock(mddev);
 out:
         if(did_set_md_closing)
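
Aside, not part of the patch above: the following is a minimal, self-contained
userspace sketch (pthreads) of the T1/T2 ordering quoted from Li Nan's commit
message. The names open_mutex, scan_partitions, add_new_disk and the
suspended/resumed pair are hypothetical stand-ins for the gendisk lock, the
BLKRRPART path, the ADD_NEW_DISK ioctl and mddev_suspend()/mddev_resume();
it only illustrates the locking order, it is not md code.

/*
 * Userspace illustration only: T1 holds "open_mutex" and then waits for a
 * resume, while T2 "suspends" io and then needs "open_mutex" before it can
 * resume -- the same cycle as the diagram above. A timed lock is used so the
 * program reports the deadlock instead of hanging. Build with: cc -pthread
 */
#include <pthread.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

static pthread_mutex_t open_mutex = PTHREAD_MUTEX_INITIALIZER; /* gendisk lock stand-in */
static pthread_mutex_t state_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t resumed = PTHREAD_COND_INITIALIZER;
static int suspended;

static void *scan_partitions(void *arg)          /* T1: BLKRRPART stand-in */
{
        (void)arg;
        pthread_mutex_lock(&open_mutex);         /* bdev_open_by_dev(): lock open_mutex [1] */
        sleep(1);                                /* give T2 time to suspend */
        pthread_mutex_lock(&state_lock);
        while (suspended)                        /* md_handle_request(): wait mddev_resume */
                pthread_cond_wait(&resumed, &state_lock);
        pthread_mutex_unlock(&state_lock);
        pthread_mutex_unlock(&open_mutex);
        return NULL;
}

static void *add_new_disk(void *arg)             /* T2: ADD_NEW_DISK stand-in */
{
        struct timespec deadline;

        (void)arg;
        pthread_mutex_lock(&state_lock);         /* mddev_suspend(): suspend all io */
        suspended = 1;
        pthread_mutex_unlock(&state_lock);

        sleep(2);                                /* T1 already holds open_mutex by now */
        clock_gettime(CLOCK_REALTIME, &deadline);
        deadline.tv_sec += 3;
        /* bd_link_disk_holder(): try lock open_mutex [2] -- would block forever */
        if (pthread_mutex_timedlock(&open_mutex, &deadline)) {
                printf("deadlock: open_mutex is held by T1, which waits for resume\n");
                return NULL;
        }
        pthread_mutex_unlock(&open_mutex);

        pthread_mutex_lock(&state_lock);         /* mddev_resume(): not reached in the bad ordering */
        suspended = 0;
        pthread_cond_broadcast(&resumed);
        pthread_mutex_unlock(&state_lock);
        return NULL;
}

int main(void)
{
        pthread_t t1, t2;

        pthread_create(&t1, NULL, scan_partitions, NULL);
        pthread_create(&t2, NULL, add_new_disk, NULL);
        pthread_join(t2, NULL);
        pthread_mutex_lock(&state_lock);         /* resume by hand so T1 can exit after the report */
        suspended = 0;
        pthread_cond_broadcast(&resumed);
        pthread_mutex_unlock(&state_lock);
        pthread_join(t1, NULL);
        return 0;
}

Either proposed fix removes one edge of this cycle: Li Nan's patch stops
taking open_mutex around the symlink creation, and the diff above skips the
suspend for ADD_NEW_DISK on arrays that are not running yet.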