On Thu, Feb 8, 2024 at 12:44 AM Li Nan <linan666@xxxxxxxxxxxxxxx> wrote: > > > > 在 2024/2/8 14:50, Song Liu 写道: > > On Wed, Feb 7, 2024 at 1:32 AM <linan666@xxxxxxxxxxxxxxx> wrote: > >> > >> From: Li Nan <linan122@xxxxxxxxxx> > >> > >> 'open_mutex' of gendisk is used to protect open/close block devices. But > >> in bd_link_disk_holder(), it is used to protect the creation of symlink > >> between holding disk and slave bdev, which introduces some issues. > >> > >> When bd_link_disk_holder() is called, the driver is usually in the process > >> of initialization/modification and may suspend submitting io. At this > >> time, any io hold 'open_mutex', such as scanning partitions, can cause > >> deadlocks. For example, in raid: > >> > >> T1 T2 > >> bdev_open_by_dev > >> lock open_mutex [1] > >> ... > >> efi_partition > >> ... > >> md_submit_bio > >> md_ioctl mddev_syspend > >> -> suspend all io > >> md_add_new_disk > >> bind_rdev_to_array > >> bd_link_disk_holder > >> try lock open_mutex [2] > >> md_handle_request > >> -> wait mddev_resume > >> > >> T1 scan partition, T2 add a new device to raid. T1 waits for T2 to resume > >> mddev, but T2 waits for open_mutex held by T1. Deadlock occurs. > >> > >> Fix it by introducing a local mutex 'holder_mutex' to replace 'open_mutex'. > > > > Is this to fix [1]? Do we need some Fixes and/or Closes tags? > > > > No. Just use another way to fix [2], and both [2] and this patch can fix > the issue. I am not sure about the root cause of [1] yet. > > [2] https://patchwork.kernel.org/project/linux-raid/list/?series=812045 > > > Could you please add steps to reproduce this issue? > > We need to modify the kernel, add sleep in md_submit_bio() and md_ioctl() > as below, and then: > 1. mdadm -CR /dev/md0 -l1 -n2 /dev/sd[bc] #create a raid > 2. echo 1 > /sys/module/md_mod/parameters/error_inject #enable sleep > 3. 'mdadm --add /dev/md0 /dev/sda' #add a disk to raid > 4. submit ioctl BLKRRPART to raid within 10s. The analysis makes sense. I also hit the issue a couple times without adding extra delays. But I am not sure whether this is the best fix (I didn't find real issues with it either). Maybe we don't need to suspend the array for ADD_NEW_DISK? So that something like the following might just work? Thanks, Song @@ -7573,7 +7577,6 @@ static inline bool md_ioctl_valid(unsigned int cmd) static bool md_ioctl_need_suspend(unsigned int cmd) { switch (cmd) { - case ADD_NEW_DISK: case HOT_ADD_DISK: case HOT_REMOVE_DISK: case SET_BITMAP_FILE: