I have previously written about an attempt to convert the IMA code to
look up a TPM chip and use this chip until system shutdown here:
https://sourceforge.net/p/tpmdd/mailman/message/36270737/
I have now revived this effort and am seeing another problem, related to
lock ordering, when IMA does an initial tpm_chip_find_get(NULL) and
holds on to this chip until shutdown. The order in which IMA then takes
the locks conflicts with the order used when the hwrng is using the TPM,
and probably with other subsystems as well (trusted keys). The issue is
as follows:
hwrng runs into tpm_chip_find_get(chip) and locks the idr and then gets
the read lock to the ops: lock(idr) -> rlock(ops)
IMA uses the initially looked up tpm_chip and with that still holds on
to the read lock on the ops. In tpm_pcr_extend it calls
tpm_chip_find_get(chip), which now acquires lock(idr) and we end up
with: rlock(ops) -> lock(idr)
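The two orderings above can be modelled in a few lines of userspace C.
This is only a sketch of the inversion, not kernel code: the lock class
names, record_order(), hwrng_path() and ima_path() are hypothetical
helpers made up for illustration, and the check is a much simplified
version of what lockdep does.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical lock classes standing in for the idr lock and the ops lock. */
enum lock_class { LOCK_IDR, LOCK_OPS, LOCK_MAX };

/* edge[a][b] is true once some path has acquired b while holding a. */
static bool edge[LOCK_MAX][LOCK_MAX];

/*
 * Record "acquired 'next' while holding 'held'".
 * Returns true if the opposite order was already recorded (inversion).
 */
static bool record_order(enum lock_class held, enum lock_class next)
{
	assert(held < LOCK_MAX && next < LOCK_MAX);
	edge[held][next] = true;
	return edge[next][held];
}

/* hwrng: tpm_chip_find_get() locks the idr, then read-locks the ops. */
static bool hwrng_path(void)
{
	return record_order(LOCK_IDR, LOCK_OPS);
}

/*
 * IMA with a long-held chip: the ops read lock is already held when
 * tpm_pcr_extend() calls tpm_chip_find_get(), which locks the idr.
 */
static bool ima_path(void)
{
	return record_order(LOCK_OPS, LOCK_IDR);
}
```

Calling hwrng_path() first reports no problem; a later call to
ima_path() returns true because the reverse order is already on record.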
It looks like no subsystem can hold onto a tpm_chip and its ops for more
than one command since it typically will run into
tpm_chip_find_get(chip) again. I am wondering how to solve this problem.
Maybe by not calling tpm_chip_find_get() for an existing chip at all, so
it would be converted to tpm_chip_find_get(void) to accommodate long-term
consumers of a TPM that are assumed to hold onto that read-lock? This
should work since all other subsystems, like trusted keys and IMA,
currently call the TPM functions with a NULL pointer for the tpm_chip.
Is there a better solution?
Another issue with holding on to the tpm_chip's ops read lock is that
any write lock on the ops will block until a subsystem (IMA) has
released the ops read-lock. A solution for this could be for a long-term
TPM consumer to register itself as a consumer of the TPM chip that gets
a notification when the chip is to be removed. A callback would release
the chip and the read lock on the ops when this happens, though this may
require locking in that subsystem as well. In practice this wouldn't get
called with the patches I am working on since IMA shuts down before the
TPM chip is unregistered (if that happens during system shutdown). The
Xen driver seems to be the only exception upon resume:
https://elixir.bootlin.com/linux/latest/source/drivers/char/tpm/xen-tpmfront.c#L388
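To make the consumer registration idea more concrete, here is a rough
userspace sketch. None of these names exist in the TPM subsystem:
struct tpm_consumer, struct fake_chip, chip_register_consumer() and
chip_unregister() are assumptions for illustration, and the ops read
lock is modelled as a plain reader count rather than a real rwsem.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_CONSUMERS 4

/* Hypothetical long-term consumer with a removal callback. */
struct tpm_consumer {
	void (*chip_removed)(void *data);
	void *data;
};

/* Stand-in for a tpm_chip; ops_readers models read-lock holders. */
struct fake_chip {
	int ops_readers;
	struct tpm_consumer *consumers[MAX_CONSUMERS];
	size_t nr_consumers;
};

/* Register a consumer; it takes the long-term read lock on the ops. */
static bool chip_register_consumer(struct fake_chip *chip,
				   struct tpm_consumer *c)
{
	assert(c->chip_removed != NULL);
	if (chip->nr_consumers == MAX_CONSUMERS)
		return false;
	chip->consumers[chip->nr_consumers++] = c;
	chip->ops_readers++;
	return true;
}

/*
 * Chip removal: notify every consumer so it can drop its read lock.
 * Returns true if the writer could proceed, i.e. no readers remain.
 */
static bool chip_unregister(struct fake_chip *chip)
{
	for (size_t i = 0; i < chip->nr_consumers; i++)
		chip->consumers[i]->chip_removed(chip->consumers[i]->data);
	chip->nr_consumers = 0;
	return chip->ops_readers == 0;
}

/* Hypothetical IMA-side state holding on to the chip. */
struct ima_state {
	struct fake_chip *chip;
};

/* Callback: release the chip and the read lock on the ops. */
static void ima_chip_removed(void *data)
{
	struct ima_state *ima = data;

	if (ima->chip) {
		ima->chip->ops_readers--;
		ima->chip = NULL;
	}
}
```

A real version would need locking around the consumer's chip pointer,
since the callback can race with the consumer sending a command.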
If you have thoughts about this, please let me know.
Stefan