On 05.02.21 at 16:58, Jason Gunthorpe wrote: eference in the first place). > > No, they are all chained together because they are all in the same > struct: > > struct tpm_chip { > struct device dev; > struct device devs; > struct cdev cdev; > struct cdev cdevs; > > dev holds the refcount on memory, when it goes 0 the whole thing is > kfreed. > > The rule is dev's refcount can't go to zero while any other refcount > is != 0. > > For instance devs holds a get on dev that is put back only when devs > goes to 0: > > static void tpm_devs_release(struct device *dev) > { > struct tpm_chip *chip = container_of(dev, struct tpm_chip, devs); > > /* release the master device reference */ > put_device(&chip->dev); > } > > Both cdev elements do something similar inside the cdev layer. Well this chaining is exactly what does not work nowadays and what the patch is supposed to fix: currently we dont ever take the extra ref (not even in TPM 2 case, note that TPM_CHIP_FLAG_TMP2 is never set), so - if (chip->flags & TPM_CHIP_FLAG_TPM2) - get_device(&chip->dev); + get_device(&chip->dev); and tpm_devs_release() is never called, since there is nothing that ever puts devs, so + rc = devm_add_action_or_reset(pdev, + (void (*)(void *)) put_device, + &chip->devs); The race with only get_device()/putdevice() in tpm_common_open()/tpm_common_release() is: 1. tpm chip is allocated with dev refcount = 1, devs refcount = 1 2. /dev/tpmrm is opened but before we get the ref to dev in tpm_common() another thread rmmmods the chip driver: 3. the chip is unregistered, dev is put with refcount = 0 and the whole chip struct is freed 3. Now open() proceeds, tries to grab the extra ref chip->dev from a chip that has already been deallocated and the system crashes. As I already wrote, that approach was my first thought, too, but since the result crashed due to the race condition, I chose the approach in patch 1. Regards, Lino > The net result is during any open() the tpm_chip is guarenteed to have > a positive refcount. >