Re: Time to make dynamically allocated devt the default for scsi disks?

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sun, Aug 14, 2016 at 10:20 AM, James Bottomley
<James.Bottomley@xxxxxxxxxxxxxxxxxxxxx> wrote:
> On Sat, 2016-08-13 at 11:27 -0700, Dan Williams wrote:
>> On Sat, Aug 13, 2016 at 10:43 AM, James Bottomley
>> <James.Bottomley@xxxxxxxxxxxxxxxxxxxxx> wrote:
>> > On Sat, 2016-08-13 at 09:29 -0700, Dan Williams wrote:
>> > > On Sat, Aug 13, 2016 at 8:23 AM, James Bottomley
>> > > <James.Bottomley@xxxxxxxxxxxxxxxxxxxxx> wrote:
>> > > > It does?  The race is the fact that the parent can be removed
>> > > > before the child meaning if the parent name is re-registered
>> > > > before the child dies we get a duplicate name in bdi space.
>> > >
>> > > No, the race is that the *name* of the parent isn't released
>> > > until the child is both unregistered and put.  The device core is
>> > > already ensuring that the parent is not released until all
>> > > descendants have been removed.
>> >
>> > We're both saying the same thing: the problem is that, with
>> > df08c32ce3be the bdi name lifetime is tied to the lifetime of the
>> > gendisk.  However, the parent of the gendisk currently is only tied
>> > to the visibility lifetime of the gendisk, not the final put
>> > lifetime, so it doesn't see this.
>> >
>> > > >
>> > > > > So I tried the attached and it makes the libnvdimm unit tests
>> > > > > start crashing.
>> > > >
>> > > > Well, the attached is clearly buggy, isn't it?  You're trying
>> > > > to do a get on the parent before the parent is actually set.
>> > >
>> > > Ah, yes, thank you.  Fixed up v2 attached that passes my tests.
>> > >
>> > > > Why don't you just try the incremental patch I sent instead of
>> > > > trying to rework it?
>> > >
>> > > I reworked it because it is the bdi that holds this extra
>> > > dependency on the disk's parent, not the disk itself.
>> >
>> > Philosophically I don't like this approach.  The dependency goes
>> >
>> > bdi->gendisk->parent
>>
>> I'm arguing that there is no bdi->gendisk dependency.
>
> You created one with your bdi->owner field.  Just because you didn't
> call it a parent doesn't mean it wasn't one.  Arguably the whole bdi
> thing is strangely done because gendisk treats it like a class and
> that's how it behaves, it just doesn't have a proper class structure
> (which is why gendisk creates the link that would be done by the class
> infrastructure)
>
>> The dependency is:
>>
>> bdi->devt
>
> devt isn't a device (in the struct device sense).  It exists as
> effectively an embedded component of the bdi.  As far as I can tell
> there's no reason for it to be separately allocated, it could be
> properly embedded as is the normal pattern.
>
>> It just so happens that block-dynamic devt is released in
>> disk_release().
>>
>> > Making the bdi manage the parent lifetime is an unusual pattern.
>> >  Making the parent stay around until the last reference to gendisk
>> > is put is the usual one.
>>
>> What's unusual is the bdi's dependency on the allocated name, not the
>> gendisk itself.
>
> A name is just a resource belonging to an object.  The object it
> belongs to is the bdi and the bdi is parented by the owner field (and a
> hokey link) to the gendisk.
>
>> > > > >   A couple crash logs attached.  Not yet sure what assumption
>> > > > > is getting violated, but how about that conversion of scsi to
>> > > > > use dynamic devt? ;-)
>> > > >
>> > > > It's completely orthogonal.  The problem is in hierarchy
>> > > > lifetimes: switching from static to dynamic allocation won't
>> > > > change that at all.  You don't see this problem in nvme because
>> > > > the parent control device's lifetime belongs to the controller
>> > > > not the disk.  In SCSI the parent is our representation of the
>> > > > SCSI device whose lifetime is governed at the SCSI level and
>> > > > effectively represents the disk.
>> > > >
>> > >
>> > > No, it's only the name.  We could achieve the same by teaching
>> > > the block core to manage the "sd_index_ida" instead of the sd
>> > > driver itself, but the v2-patch attached works and does not
>> > > introduce that layering violation.
>> >
>> > Um, so this patch doesn't fix the problem. It merely makes the
>> > lifetime rules correct so the problem can then be fixed at the scsi
>> > level.
>>
>> You're right that this patch does not fix the problem, I missed that
>> the scsi_disk is not the parent of the gendisk, so this patch does
>> nothing to delay scsi_disk_release.  What I think is the real fix is
>> to make the devt properly reference counted and prevent
>> ida_remove(&sd_index_ida, sdkp->index); from being called until all
>> objects derived from that allocation are done with it.
>
> OK, this is another philosophical difference, I suppose: since bdi is
> already so complex and non-standard, I really don't think adding more
> non standard stuff is a good idea.  The simplest way to fix it is
>
>    1. The two line patch I already sent to make the bdi hold the owner
>       ->parent until release
>    2. Parent the gendisk to scsi_disk->dev.  The name release is already
>       in the correct place, so this is a 3 line patch.
>
> These are established patterns, so they're both understandable to
> anyone who reads the code.  The answer to any other BDI lifetime
> problem is to free the name in the parent release.
>
> James
>
> ---
>
> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
> index d3e852a..222771d 100644
> --- a/drivers/scsi/sd.c
> +++ b/drivers/scsi/sd.c
> @@ -3000,7 +3000,13 @@ static void sd_probe_async(void *data, async_cookie_t cookie)
>         }
>
>         blk_pm_runtime_init(sdp->request_queue, dev);
> -       device_add_disk(dev, gd);
> +       /*
> +        * previously the parent of the gendisk was the scsi device.  It
> +        * was moved to fix lifetime rules, so now we install a symlink
> +        * to the new location of the block class directory
> +        */
> +       device_add_disk(&sdkp->dev, gd);
> +       WARN_ON(sysfs_add_link_to_group(&dev->kobj, "block", &sdkp->dev.kobj, "block"));
>         if (sdkp->capacity)
>                 sd_dif_config_host(sdkp);
>
> @@ -3142,6 +3148,7 @@ static int sd_remove(struct device *dev)
>
>         async_synchronize_full_domain(&scsi_sd_pm_domain);
>         async_synchronize_full_domain(&scsi_sd_probe_domain);
> +       sysfs_remove_link(&dev->kobj, "block");
>         device_del(&sdkp->dev);
>         del_gendisk(sdkp->disk);
>         sd_shutdown(dev);

I like it.  I still think the bdi registration code should be in
charge of taking the extra reference on the disk device's parent to
isolate / make clear why the extra reference is being acquired, but I
won't lose sleep if Jens takes your smaller change.

Thanks James!
--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Linux RAID]     [Linux SCSI]     [Linux ATA RAID]     [IDE]     [Linux Wireless]     [Linux Kernel]     [ATH6KL]     [Linux Bluetooth]     [Linux Netdev]     [Kernel Newbies]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Device Mapper]

  Powered by Linux