Re: [PATCH] md: don't create mddev in md_open

On 3/31/21 2:55 PM, Christoph Hellwig wrote:
-static struct mddev *mddev_find(dev_t unit)
+static struct mddev *mddev_find(dev_t unit, bool create)

This just makes the mess that is mddev_find even worse.  Please take
a look at the patches at the beginning of the

   "move bd_mutex to the gendisk"

series to try to clean this up properly.


Hello Christoph,

Since your patch series touches this md issue, I am using this mail thread to discuss it.
If you or other people think the To & Cc lists need to be extended, please do so.

If I understand the patch series correctly, the purpose of [PATCH 01/15]
is to remove the "return -ERESTARTSYS" path.

Currently, all of the race handling code in md_open is the part below:

```md_open
    if (mddev->gendisk != bdev->bd_disk) {
        /* we are racing with mddev_put which is discarding this
         * bd_disk.
         */
        mddev_put(mddev);
        /* Wait until bdev->bd_disk is definitely gone */
        if (work_pending(&mddev->del_work))
            flush_workqueue(md_misc_wq);
        /* Then retry the open from the top */
        return -ERESTARTSYS;
    }
```
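
To make the "retry the open from the top" comment concrete, here is a small userspace model of a caller that re-invokes the open callback for as long as it returns -ERESTARTSYS. This is only a sketch: every model_* name is hypothetical, MODEL_ERESTARTSYS mirrors the kernel-internal value, and the max_calls bound exists only so that the model terminates. It is not the block layer code.

```c
#include <assert.h>
#include <stddef.h>

/* Kernel-internal errno value; it is not exported by userspace <errno.h>. */
#define MODEL_ERESTARTSYS 512

/* Hypothetical stand-in for the ->open callback of a block device. */
typedef int (*model_open_fn)(void *ctx);

/*
 * Model of a caller that restarts the open while the callback asks for it.
 * The number of callback invocations is stored in *calls. The max_calls
 * bound is an assumption of this model so that it always terminates.
 */
static int model_open_with_restart(model_open_fn open_fn, void *ctx,
                                   int max_calls, int *calls)
{
    int ret = -MODEL_ERESTARTSYS;

    assert(open_fn != NULL && calls != NULL && max_calls > 0);
    *calls = 0;
    while (*calls < max_calls) {
        ret = open_fn(ctx);
        (*calls)++;
        if (ret != -MODEL_ERESTARTSYS)
            break;  /* success or a real error: stop retrying */
    }
    return ret;
}

/* The race resolves after *ctx restarts, then the open succeeds. */
static int model_open_transient(void *ctx)
{
    int *remaining = ctx;

    if (*remaining > 0) {
        (*remaining)--;
        return -MODEL_ERESTARTSYS;
    }
    return 0;
}

/* The race never resolves: every call asks for another restart. */
static int model_open_stuck(void *ctx)
{
    (void)ctx;
    return -MODEL_ERESTARTSYS;
}
```

With model_open_transient the loop ends as soon as the race is over. With model_open_stuck only the artificial bound stops it, which is the shape of the lockup discussed for case 2.2.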

mddev_put removes the mddev from the md internal list; this function is
the key step that kicks off the job of discarding the mddev.

Let's focus only on the "mddev->gendisk != bdev->bd_disk" case. There are 2 paths:
1> the creating path
This path is impossible to trigger: the userspace md device (/dev/mdX) is only valid
after md_alloc has completed successfully, and at that point mddev->gendisk must equal
bdev->bd_disk.

2> the freeing path (this is what Neil's patch really cared about)
2.1>
md_open runs before the mddev is removed from the md internal list.
Neil wanted to wait for the queued work to finish the cleanup job and then return
-ERESTARTSYS. On the next round, md_open would find that the mddev is NULL and
return -ENODEV. (In the real world, mddev_find allocates a new one instead. That
is a bug; it is not what Neil intended.)
Your [PATCH 01/15] breaks this rule: it mistakenly calls mddev_get, which blocks the cleanup job.
In my opinion, the solution may be to simply return -EBUSY (instead of -ENODEV) to
fail the open path. (I will show the code later)

2.2>
Neil's patch has a bug (as I said in 2.1), related to the case below:
md_open is called after mddev_put has removed the mddev but before md_free() has finished.
At this time the mddev no longer exists in the md internal list, but bdev->bd_disk still
holds the mddev pointer. This scenario must not return -ERESTARTSYS: that would make
__blkdev_get call md_open infinitely and trigger a soft lockup.
This case can be fixed by calling an mddev_find that does not create an mddev. It corresponds
to your new [PATCH 04/15], the mddev_find that only does the search job.
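
The difference between a search-only lookup and a find-or-create lookup can be shown with a small userspace model. This is a sketch under stated assumptions: struct model_dev, struct model_registry and all model_* functions are hypothetical stand-ins, a fixed array replaces the md internal list, and there is no locking. It is not the md code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_DEVS 8

/* Hypothetical stand-in for struct mddev: only what the model needs. */
struct model_dev {
    unsigned int unit;  /* plays the role of dev_t unit */
    int active;         /* plays the role of the reference count */
    bool listed;        /* true while the entry is on the "internal list" */
};

struct model_registry {
    struct model_dev slots[MODEL_MAX_DEVS];
};

/* Search only: takes a reference on a hit, never allocates. */
static struct model_dev *model_find(struct model_registry *reg, unsigned int unit)
{
    assert(reg != NULL);
    for (size_t i = 0; i < MODEL_MAX_DEVS; i++) {
        struct model_dev *d = &reg->slots[i];

        if (d->listed && d->unit == unit) {
            d->active++;
            return d;
        }
    }
    return NULL;
}

/* Find-or-create: a miss silently allocates a fresh entry. */
static struct model_dev *model_find_or_create(struct model_registry *reg,
                                              unsigned int unit)
{
    struct model_dev *d = model_find(reg, unit);

    if (d)
        return d;
    for (size_t i = 0; i < MODEL_MAX_DEVS; i++) {
        d = &reg->slots[i];
        if (!d->listed) {
            d->unit = unit;
            d->active = 1;
            d->listed = true;
            return d;
        }
    }
    return NULL;  /* registry full */
}

/* Drop a reference; the last put takes the entry off the list. */
static void model_put(struct model_dev *d)
{
    assert(d != NULL && d->active > 0);
    if (--d->active == 0)
        d->listed = false;
}
```

After the last model_put, model_find returns NULL, so an open path built on it can fail with -ENODEV. model_find_or_create would instead hand back a brand-new entry for the same unit, which is the behaviour described as a bug in 2.1.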

Finally, the code (based on your [PATCH 01/15]) may look like:
```
static int md_open(struct block_device *bdev, fmode_t mode)
{
    /* ...  */
    struct mddev *mddev = mddev_find(bdev->bd_dev); //hm: the new, only do searching job
    int err;

    if (!mddev) //hm: this will cover freeing path 2.2
        return -ENODEV;

    if (mddev->gendisk != bdev->bd_disk) { //hm: for freeing path 2.1
        /* we are racing with mddev_put which is discarding this
         * bd_disk.
         */
        mddev_put(mddev);
        /* Wait until bdev->bd_disk is definitely gone */
        if (work_pending(&mddev->del_work))
            flush_workqueue(md_misc_wq);
        return -EBUSY; //hm: fail this path. userspace can try later and get -ENODEV.
    }

    /* hm: below same as [PATCH 01/15]*/
    err = mutex_lock_interruptible(&mddev->open_mutex);
    if (err)
        return err;

    if (test_bit(MD_CLOSING, &mddev->flags)) {
        mutex_unlock(&mddev->open_mutex);
        return -ENODEV;
    }

    mddev_get(mddev);
    atomic_inc(&mddev->openers);
    mutex_unlock(&mddev->open_mutex);

    bdev_check_media_change(bdev);
    return 0;
}
```
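
On the userspace side, "try later" after -EBUSY could look roughly like the helper below. This is a minimal sketch: open_md_retry_busy is a hypothetical name, and the 10 ms delay and the retry count are arbitrary assumptions. It uses only the POSIX open() and nanosleep() calls, and whether the later attempt sees ENODEV or some other error depends on the state of the device node at that time.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <time.h>

/*
 * Hypothetical helper: open path read-only, retrying only while the
 * kernel reports EBUSY. Returns a file descriptor on success, or a
 * negative errno value (for example -ENODEV once the device is gone).
 */
static int open_md_retry_busy(const char *path, int max_tries)
{
    /* 10 ms between attempts; an arbitrary choice for this sketch. */
    struct timespec delay = { .tv_sec = 0, .tv_nsec = 10 * 1000 * 1000 };

    assert(path != NULL && max_tries > 0);
    for (int attempt = 0; attempt < max_tries; attempt++) {
        int fd = open(path, O_RDONLY);

        if (fd >= 0)
            return fd;
        if (errno != EBUSY)
            return -errno;  /* a definitive error: do not retry */
        nanosleep(&delay, NULL);
    }
    return -EBUSY;  /* still busy after max_tries attempts */
}
```

A caller would pass the md node, for example open_md_retry_busy("/dev/md0", 5), and treat any negative return value as the final error.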

Thanks,
heming



