On Mon, May 22, 2023 at 6:30 PM Yu Kuai <yukuai1@xxxxxxxxxxxxxxx> wrote: > > From: Yu Kuai <yukuai3@xxxxxxxxxx> > > Commit 5792a2856a63 ("[PATCH] md: avoid a deadlock when removing a device > from an md array via sysfs") delays the deletion of rdev, however, this > introduces a window that rdev can be added again while the deletion is > not done yet, and sysfs will complain about duplicate filename. > > Follow up patches try to fix this problem by flushing workqueue, however, > flush_rdev_wq() is just dead code, the progress in > md_kick_rdev_from_array(): > > 1) list_del_rcu(&rdev->same_set); > 2) synchronize_rcu(); > 3) queue_work(md_rdev_misc_wq, &rdev->del_work); > > So in flush_rdev_wq(), if rdev is found in the list, work_pending() can > never pass, in the meantime, if work is queued, then rdev can never be > found in the list. > > flush_rdev_wq() can be replaced by flush_workqueue() directly, however, > this approach is not good: > - the workqueue is global, this synchronization for all raid disks is > not necessary. > - flush_workqueue can't be called under 'reconfig_mutex', there is still > a small window between flush_workqueue() and mddev_lock() that other > contexts can queue new work, hence the problem is not solved completely. > > sysfs already has apis to support delete itself through writer, and > these apis, specifically sysfs_break/unbreak_active_protection(), is used > to support deleting rdev synchronously. Therefore, the above commit can be > reverted, and sysfs duplicate filename can be avoided. > > A new mdadm regression test is proposed as well([1]). > > [1] https://lore.kernel.org/linux-raid/20230428062845.1975462-1-yukuai1@xxxxxxxxxxxxxxx/ > Fixes: 5792a2856a63 ("[PATCH] md: avoid a deadlock when removing a device from an md array via sysfs") > Signed-off-by: Yu Kuai <yukuai3@xxxxxxxxxx> Thanks for the fix! I made the following changes and applied it to md-next: 1. remove md_rdev->del_work, which is not used any more; 2. change list_empty_safe to list_empty protected by the mutex, as list_empty_safe doesn't seem safe here. Please let me know if either change doesn't make sense. Thanks, Song