Hi Jason, On 05.02.21 18:25, Jason Gunthorpe wrote: > On Fri, Feb 05, 2021 at 08:48:11AM -0800, James Bottomley wrote: >>> Thanks for pointing this out. I'd strongly support Jason's proposal: >>> >>> https://lore.kernel.org/linux-integrity/20201215175624.GG5487@xxxxxxxx/ >>> >>> It's the best long-term way to fix this. >> >> Really, no it's not. It introduces extra mechanism we don't need. > >> To recap the issue: character devices already have an automatic >> mechanism which holds a reference to the struct device while the >> character node is open so the default is to release resources on final >> put of the struct device. > > The refcount on the struct device only keeps the memory alive, it > doesn't say anything about the ops. We still need to lock and check > the ops each and every time they are used. > > The fact cdev goes all the way till fput means we don't need the extra > get/put I suggested to Lino at all. > >> The practical consequence of this model is that if you allocate a chip >> structure with tpm_chip_alloc() you have to release it again by doing a >> put of *both* devices. > > The final put of the devs should be directly after the > cdev_device_del(), not in a devm. This became all confused because the > devs was created during alloc, not register. Having a device that is > initialized but will never be added is weird. > > See sketch below. > >> Stefan noticed the latter, so we got the bogus patch 8979b02aaf1d >> ("tpm: Fix reference count to main device") applied which simply breaks >> the master/slave model by not taking a reference on the master for the >> slave. I'm not sure why I didn't notice the problem with this fix at >> the time, but attention must have been elsewhere. > > Well, this is sort of OK because we never use the devs in TPM1, so we > end up freeing the chip with a positive refcount on the devs, which is > weird but not a functional bug. > > Jason > > diff --git a/drivers/char/tpm/tpm-chip.c b/drivers/char/tpm/tpm-chip.c > index ddaeceb7e10910..e07193a0dd4438 100644 > --- a/drivers/char/tpm/tpm-chip.c > +++ b/drivers/char/tpm/tpm-chip.c > @@ -344,7 +344,6 @@ struct tpm_chip *tpm_chip_alloc(struct device *pdev, > chip->dev_num = rc; > > device_initialize(&chip->dev); > - device_initialize(&chip->devs); > > chip->dev.class = tpm_class; > chip->dev.class->shutdown_pre = tpm_class_shutdown; > @@ -352,29 +351,12 @@ struct tpm_chip *tpm_chip_alloc(struct device *pdev, > chip->dev.parent = pdev; > chip->dev.groups = chip->groups; > > - chip->devs.parent = pdev; > - chip->devs.class = tpmrm_class; > - chip->devs.release = tpm_devs_release; > - /* get extra reference on main device to hold on > - * behalf of devs. This holds the chip structure > - * while cdevs is in use. The corresponding put > - * is in the tpm_devs_release (TPM2 only) > - */ > - if (chip->flags & TPM_CHIP_FLAG_TPM2) > - get_device(&chip->dev); > - > if (chip->dev_num == 0) > chip->dev.devt = MKDEV(MISC_MAJOR, TPM_MINOR); > else > chip->dev.devt = MKDEV(MAJOR(tpm_devt), chip->dev_num); > > - chip->devs.devt = > - MKDEV(MAJOR(tpm_devt), chip->dev_num + TPM_NUM_DEVICES); > - > rc = dev_set_name(&chip->dev, "tpm%d", chip->dev_num); > - if (rc) > - goto out; > - rc = dev_set_name(&chip->devs, "tpmrm%d", chip->dev_num); > if (rc) > goto out; > > @@ -382,9 +364,7 @@ struct tpm_chip *tpm_chip_alloc(struct device *pdev, > chip->flags |= TPM_CHIP_FLAG_VIRTUAL; > > cdev_init(&chip->cdev, &tpm_fops); > - cdev_init(&chip->cdevs, &tpmrm_fops); > chip->cdev.owner = THIS_MODULE; > - chip->cdevs.owner = THIS_MODULE; > > rc = tpm2_init_space(&chip->work_space, TPM2_SPACE_BUFFER_SIZE); > if (rc) { > @@ -396,7 +376,6 @@ struct tpm_chip *tpm_chip_alloc(struct device *pdev, > return chip; > > out: > - put_device(&chip->devs); > put_device(&chip->dev); > return ERR_PTR(rc); > } > @@ -445,13 +424,33 @@ static int tpm_add_char_device(struct tpm_chip *chip) > } > > if (chip->flags & TPM_CHIP_FLAG_TPM2) { > + device_initialize(&chip->devs); > + chip->devs.parent = pdev; > + chip->devs.class = tpmrm_class; > + rc = dev_set_name(&chip->devs, "tpmrm%d", chip->dev_num); > + if (rc) > + goto out_put_devs; > + > + /* > + * get extra reference on main device to hold on behalf of devs. > + * This holds the chip structure while cdevs is in use. The > + * corresponding put is in the tpm_devs_release. > + */ > + get_device(&chip->dev); > + chip->devs.release = tpm_devs_release; > + > + chip->devs.devt = > + MKDEV(MAJOR(tpm_devt), chip->dev_num + TPM_NUM_DEVICES); > + cdev_init(&chip->cdevs, &tpmrm_fops); > + chip->cdevs.owner = THIS_MODULE; > + > rc = cdev_device_add(&chip->cdevs, &chip->devs); > if (rc) { > dev_err(&chip->devs, > "unable to cdev_device_add() %s, major %d, minor %d, err=%d\n", > dev_name(&chip->devs), MAJOR(chip->devs.devt), > MINOR(chip->devs.devt), rc); > - return rc; > + goto out_put_devs; > } > } > > @@ -460,6 +459,10 @@ static int tpm_add_char_device(struct tpm_chip *chip) > idr_replace(&dev_nums_idr, chip, chip->dev_num); > mutex_unlock(&idr_lock); > > +out_put_devs: > + put_device(&chip->devs); > +out_del_dev: > + cdev_device_del(&chip->cdev); > return rc; > } > > @@ -640,8 +643,10 @@ void tpm_chip_unregister(struct tpm_chip *chip) > if (IS_ENABLED(CONFIG_HW_RANDOM_TPM)) > hwrng_unregister(&chip->hwrng); > tpm_bios_log_teardown(chip); > - if (chip->flags & TPM_CHIP_FLAG_TPM2) > + if (chip->flags & TPM_CHIP_FLAG_TPM2) { > cdev_device_del(&chip->cdevs, &chip->devs); > + put_device(&chip->devs); > + } > tpm_del_char_device(chip); > } > EXPORT_SYMBOL_GPL(tpm_chip_unregister); > I tested the solution you scetched and it fixes the issue for me. Will you send a (real) patch for this? Best regards, Lino