On Sat, Aug 13, 2016 at 8:23 AM, James Bottomley <James.Bottomley@xxxxxxxxxxxxxxxxxxxxx> wrote: > On Fri, 2016-08-12 at 21:57 -0700, Dan Williams wrote: >> On Fri, Aug 12, 2016 at 5:29 PM, Dan Williams < >> dan.j.williams@xxxxxxxxx> wrote: >> > On Fri, Aug 12, 2016 at 5:17 PM, James Bottomley >> > <James.Bottomley@xxxxxxxxxxxxxxxxxxxxx> wrote: >> > > On Fri, 2016-08-12 at 14:29 -0700, Dan Williams wrote: >> > > > Before spending effort trying to flush the destruction of old >> > > > bdi >> > > > instances before new ones are registered, is it rather time to >> > > > complete the conversion of sd to only use dynamically allocated >> > > > devt? >> > > >> > > Do we have to go that far? Surely your fix is extensible: the >> > > only >> > > reason it doesn't work for us is that the gendisk holds the >> > > parent >> > > without a reference, so we can free the SCSI device before its >> > > child >> > > gendisk (good job no-one actually uses gendisk->parent after >> > > we've >> > > released it ...). If we fix that it would mean SCSI can't >> > > release the >> > > sdev until after the queue is dead and the bdi namespace >> > > released, so >> > > isn't something like this the easy fix? >> > > >> > > James >> > > >> > > --- >> > > >> > > diff --git a/block/genhd.c b/block/genhd.c >> > > index fcd6d4f..54ae4ae 100644 >> > > --- a/block/genhd.c >> > > +++ b/block/genhd.c >> > > @@ -514,7 +514,7 @@ static void register_disk(struct device >> > > *parent, struct gendisk *disk) >> > > struct hd_struct *part; >> > > int err; >> > > >> > > - ddev->parent = parent; >> > > + ddev->parent = get_device(parent); >> > > >> > > dev_set_name(ddev, "%s", disk->disk_name); >> > > >> > > @@ -1144,6 +1144,7 @@ static void disk_release(struct device >> > > *dev) >> > > hd_free_part(&disk->part0); >> > > if (disk->queue) >> > > blk_put_queue(disk->queue); >> > > + put_device(dev->parent); >> > > kfree(disk); >> > > } >> > > struct class block_class = { >> > >> > Looks ok at first glance to me. >> > >> > We do hold a reference on the parent device, but it gets dropped at >> > device_unregister() time and this moves it out to the final put. > > We do? Where? Yes, register_disk() does "ddev->parent = parent" and then "device_add(ddev)". device_add() takes the parent reference. > >> > However, this does leave static devt block-device-drivers that >> > register a disk without a parent device susceptible to the race... >> > I think those exist given all the drivers still using add_disk() >> > after commit 52c44d93c26f "block: remove ->driverfs_dev". > > It does? The race is the fact that the parent can be removed before > the child meaning if the parent name is re-registered before the child > dies we get a duplicate name in bdi space. No, the race is that the *name* of the parent isn't released until the child is both unregistered and put. The device core is already ensuring that the parent is not released until all descendants have been removed. > >> So I tried the attached and it makes the libnvdimm unit tests start >> crashing. > > Well, the attached is clearly buggy, isn't it? You're trying to do a > get on the parent before the parent is actually set. Ah, yes, thank you. Fixed up v2 attached that passes my tests. > Why don't you > just try the incremental patch I sent instead of trying to rework it? I reworked it because it is the bdi that holds this extra dependency on the disk's parent, not the disk itself. > >> A couple crash logs attached. Not yet sure what assumption >> is getting violated, but how about that conversion of scsi to use >> dynamic devt? ;-) > > It's completely orthogonal. The problem is in hierarchy lifetimes: > switching from static to dynamic allocation won't change that at all. > You don't see this problem in nvme because the parent control device's > lifetime belongs to the controller not the disk. In SCSI the parent is > our representation of the SCSI device whose lifetime is governed at the > SCSI level and effectively represents the disk. > No, it's only the name. We could achieve the same by teaching the block core to manage the "sd_index_ida" instead of the sd driver itself, but the v2-patch attached works and does not introduce that layering violation.
Attachment:
patch-v2
Description: Binary data