On 9/1/21 7:38 PM, Christoph Hellwig wrote:
Commit b0140891a8cea3 ("md: Fix race when creating a new md device.") not only moved the assignment of mddev->gendisk before the call to add_disk, which fixes the races described in the commit log, but also added a mddev->open_mutex critical section around add_disk and the creation of the md kobj. Adding a kobject after add_disk is racy vs deleting the gendisk right after adding it, but md already protects against that by holding a mddev->active reference.
I assume you mean that md_open calls mddev_find -> mddev_get -> atomic_inc(&mddev->active). But that path already existed before b0140891a8c, and md_alloc also called mddev_find at that time, so I am not sure how it prevents the race, though I have probably missed something. Cc Neil.
On the other hand, taking this lock added a lock order reversal with what is now disk->open_mutex (it was bdev->bd_mutex when the commit was added) for partition devices, which need that lock for the internal open for the partition scan, and a recent commit also takes it for non-partitioned devices, leading to further lockdep splatter.

Fixes: b0140891a8ce ("md: Fix race when creating a new md device.")
Fixes: d62633873590 ("block: support delayed holder registration")
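For readers not familiar with this kind of report, here is a minimal userspace sketch of what a lock order reversal looks like to an order tracker. It is not kernel code and not how lockdep is implemented. The lock ids, track_lock(), creation_path() and open_path() are hypothetical names made up for illustration, and the two paths only reflect my reading of the description above: one path takes disk->open_mutex while holding mddev->open_mutex, the other takes them in the opposite order.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical ids standing in for the two mutexes discussed above. */
enum lock_id { MDDEV_OPEN_MUTEX, DISK_OPEN_MUTEX, NR_LOCKS };

/* seen[a][b] is true once lock b has been taken while lock a was held. */
static bool seen[NR_LOCKS][NR_LOCKS];
static bool held[NR_LOCKS];

static void tracker_reset(void)
{
	memset(seen, 0, sizeof(seen));
	memset(held, 0, sizeof(held));
}

/* Record an acquisition; return true if it reverses an order seen earlier. */
static bool track_lock(enum lock_id l)
{
	bool reversal = false;

	assert(l < NR_LOCKS);
	for (int h = 0; h < NR_LOCKS; h++) {
		if (!held[h] || h == (int)l)
			continue;
		if (seen[l][h])		/* earlier: l held, then h taken */
			reversal = true;
		seen[h][l] = true;	/* now: h held, then l taken */
	}
	held[l] = true;
	return reversal;
}

static void track_unlock(enum lock_id l)
{
	held[l] = false;
}

/* Assumed creation path: add_disk runs under mddev->open_mutex and the
 * internal open for the partition scan takes disk->open_mutex. */
static bool creation_path(void)
{
	bool r = track_lock(MDDEV_OPEN_MUTEX);

	r |= track_lock(DISK_OPEN_MUTEX);
	track_unlock(DISK_OPEN_MUTEX);
	track_unlock(MDDEV_OPEN_MUTEX);
	return r;
}

/* Assumed open path: disk->open_mutex is held when the driver's open
 * method takes mddev->open_mutex. */
static bool open_path(void)
{
	bool r = track_lock(DISK_OPEN_MUTEX);

	r |= track_lock(MDDEV_OPEN_MUTEX);
	track_unlock(MDDEV_OPEN_MUTEX);
	track_unlock(DISK_OPEN_MUTEX);
	return r;
}
```

After tracker_reset(), calling creation_path() and then open_path() makes the second call return true. No deadlock has to happen for the reversal to be noticed; it is enough that both orders have been observed, which is why such reports can show up even when the two paths never actually run concurrently.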
IIUC, the issue appeared only after d6263387359 (which addressed a dm issue), so perhaps the stable maintainers should not apply this to any stable kernel that includes only b0140891a8ce.

Thanks,
Guoqing