On 3/31/21 2:55 PM, Christoph Hellwig wrote:
-static struct mddev *mddev_find(dev_t unit)
+static struct mddev *mddev_find(dev_t unit, bool create)
This just makes the mess that is mddev_find even worse. Please take
a look at the patches at the beginning of the
"move bd_mutex to the gendisk"
series to try to clean this up properly.
Hello Christoph,
Because your patch is related with md issue, I use this mail thread to discuss.
If you and other people think the To & Cc need to extend, please do it.
If I understanding the series patches correctly, the purpose of [path 1/15]
is to remove "return -ERESTARTSYS" path.
currently md_open, all the racing handling code is below part:
```md_open
if (mddev->gendisk != bdev->bd_disk) {
/* we are racing with mddev_put which is discarding this
* bd_disk.
*/
mddev_put(mddev);
/* Wait until bdev->bd_disk is definitely gone */
if (work_pending(&mddev->del_work))
flush_workqueue(md_misc_wq);
/* Then retry the open from the top */
return -ERESTARTSYS;
}
```
mddev is removed from mddev internal list in mddev_put, this function is
the key to raise discarding mddev job.
let's only focus on "mddev->gendisk != bdev->bd_disk" case. there are 2 paths:
1> in creating path
this path is impossible to trigger, userspace md device (/dev/mdX) only valid
after md_alloc successfully completing. this time mddev->gendisk must equal with
bdev->bd_disk.
2> in freeing path. (this is the Neil's patch really cared)
2.1>
md_open is running before mddev is removed from md internal list.
Neil wanted to wait queue_work to finish clean job. then return -ERESTARTSYS.
And on next turn, md_open will find the mddev is null (but in real world, the
mddev_find will alloc a new one. this is a bug, it's not Neil real thoughts)
and return -ENODEV.
Your [path 01/15] breaking this rule. you will mistakenly call mddev_get to block clean job.
In my opinion, the solution may simply return -EBUSY (instead of -ENODEV) to
fail the open path. (I will show the code later)
2.2>
the Neil's patch has a bug (I had said in 2.1), it's related with below case:
md_open is called after mddev_put removing mddev but before finishing md_free().
this time mddev is not exist in md internal list, but bdev->bd_disk still grab
the mddev pointer. this scenatio can't return -ERESTARTSYS, it will make __blkdev_get
infinitely calling md_open and trigger a soft lockup.
this case can be fixed by calling mddev_find without creating mddev job. it responses
your new [patch 04/15], the do only search job's mddev_find.
At last, the code (based on your [PATCH 01/15]) may looks like:
```
static int md_open(struct block_device *bdev, fmode_t mode)
{
/* ... */
struct mddev *mddev = mddev_find(bdev->bd_dev); //hm: the new, only do searching job
int err;
if (!mddev) //hm: this will cover freeing path 2.2
return -ENODEV;
if (mddev->gendisk != bdev->bd_disk) { //hm: for freeing path 2.1
/* we are racing with mddev_put which is discarding this
* bd_disk.
*/
mddev_put(mddev);
/* Wait until bdev->bd_disk is definitely gone */
if (work_pending(&mddev->del_work))
flush_workqueue(md_misc_wq);
return -EBUSY; //hm: fail this path. userspace can try later and get -ENODEV.
}
/* hm: below same as [PATCH 01/15]*/
err = mutex_lock_interruptible(&mddev->open_mutex);
if (err)
return err;
if (test_bit(MD_CLOSING, &mddev->flags)) {
mutex_unlock(&mddev->open_mutex);
return -ENODEV;
}
mddev_get(mddev);
atomic_inc(&mddev->openers);
mutex_unlock(&mddev->open_mutex);
bdev_check_media_change(bdev);
return 0;
}
```
Thanks,
heming