Re: [PATCH v3 2/2] tpm: in tpm2_del_space check if ops pointer is still valid

James Bottomley <James.Bottomley@xxxxxxxxxxxxxxxxxxxxx> · Fri, 05 Feb 2021 09:54:29 -0800

On Fri, 2021-02-05 at 13:25 -0400, Jason Gunthorpe wrote:
> On Fri, Feb 05, 2021 at 08:48:11AM -0800, James Bottomley wrote:
> > > Thanks for pointing this out. I'd strongly support Jason's
> > > proposal:
> > > 
> > > https://lore.kernel.org/linux-integrity/20201215175624.GG5487@xxxxxxxx/
> > > 
> > > It's the best long-term way to fix this.
> > 
> > Really, no it's not.  It introduces extra mechanism we don't need.
> > To recap the issue: character devices already have an automatic
> > mechanism which holds a reference to the struct device while the
> > character node is open so the default is to release resources on
> > final
> > put of the struct device.
> 
> The refcount on the struct device only keeps the memory alive, it
> doesn't say anything about the ops. We still need to lock and check
> the ops each and every time they are used.

I think this is the crux of our disagreement: I think the ops doesn't
matter because to call try_get_ops you have to have a chip structure
and the only way you get a chip structure is if you hold a device
containing it, in which case the device hold guarantees the chip can't
be freed.  Or if you pass in TPM_ANY_NUM to an operation which calls 
tpm_chip_find_get() which iterates the idr to find a chip under the idr
lock.  If you find a chip device at the idr, you're guaranteed it
exists, because elimination of it is the first thing the release does
and if you find a dying dev (i.e. the release routine blocks on the idr
mutex trying to kill the chip attachment), try_get_ops() fails because
the ops are already NULL.

In either case, I think you get returned a device to which you hold a
reference.  Is there any other case where you can get a chip without
also getting a device reference?

I'll answer the other point in a separate email, but I think the
principle sounds OK: we could do the final put right after we del the
char devices because that's called in the module release routine and
thus not have to rely on the devm actions which, as you say, are an
annoying complication.

James