Hi, On 05.02.21 14:05, Jason Gunthorpe wrote: >> >> Commit fdc915f7f719 ("tpm: expose spaces via a device link /dev/tpmrm<n>") >> already introduced function tpm_devs_release() to release the extra >> reference but did not implement the required put on chip->devs that results >> in the call of this function. > > Seems wonky, the devs is just supposed to be a side thing, nothing > should be using it as a primary reference count for a tpm. > > The bug here is only that tpm_common_open() did not get a kref on the > chip before putting it in priv and linking it to the fd. See the > comment before tpm_try_get_ops() indicating the caller must already > have taken care to ensure the chip is valid. > > This should be all you need to fix the oops: > > diff --git a/drivers/char/tpm/tpm-dev-common.c b/drivers/char/tpm/tpm-dev-common.c > index 1784530b8387bb..1b738dca7fffb5 100644 > --- a/drivers/char/tpm/tpm-dev-common.c > +++ b/drivers/char/tpm/tpm-dev-common.c > @@ -105,6 +105,7 @@ static void tpm_timeout_work(struct work_struct *work) > void tpm_common_open(struct file *file, struct tpm_chip *chip, > struct file_priv *priv, struct tpm_space *space) > { > + get_device(&priv->chip.dev); > priv->chip = chip; > priv->space = space; > priv->response_read = true; This is racy, isnt it? The time between we open the file and we want to grab the reference in common_open() the chip can already be unregistered and freed. As a matter of fact this solution was the first thing that came into my mind, too, until I noticed the possible race condition. I can only guess that this was what James had in mind when he chose to take the extra reference to chip->dev in tpm_chip_alloc() instead of common_open(). >> diff --git a/drivers/char/tpm/tpm-chip.c b/drivers/char/tpm/tpm-chip.c >> index ddaeceb..3ace199 100644 >> +++ b/drivers/char/tpm/tpm-chip.c >> @@ -360,8 +360,7 @@ struct tpm_chip *tpm_chip_alloc(struct device *pdev, >> * while cdevs is in use. The corresponding put >> * is in the tpm_devs_release (TPM2 only) >> */ >> - if (chip->flags & TPM_CHIP_FLAG_TPM2) >> - get_device(&chip->dev); >> + get_device(&chip->dev); >> >> if (chip->dev_num == 0) >> chip->dev.devt = MKDEV(MISC_MAJOR, TPM_MINOR); >> @@ -422,8 +421,21 @@ struct tpm_chip *tpmm_chip_alloc(struct device *pdev, >> rc = devm_add_action_or_reset(pdev, >> (void (*)(void *)) put_device, >> &chip->dev); >> - if (rc) >> + if (rc) { >> + put_device(&chip->devs); >> return ERR_PTR(rc); > > This isn't right read what 'or_reset' does > In case of failure installing the action handler devm_add_action_or_reset() puts chip->dev for us. But we also have put chip->devs since we have retrieved a reference to both chip->dev and chip->devs. Or do I miss something here? > Jason > Regards, Lino