On Mon, 2019-03-18 at 18:03 +0000, Doug Fraser wrote:
> So we have moved beyond the signaling issues on our TPM for now, but
> in ramping up performance saturation testing, I am pounding on the
> openssl engine with multiple threads of execution, and I am finding
> this fault.
>
> /var/log/messages:Mar 18 16:43:28 C05BCB00C0A000001153 kern.err
> kernel: [11840.869864] tpm tpm0: tpm_try_transmit: tpm_send: error -5
> /var/log/messages:Mar 18 16:43:28 C05BCB00C0A000001153 kern.err
> kernel: [11840.878969] tpm tpm0: A TPM error (357) occurred flushing
> context

This sounds a bit serious.  I've taken the liberty of cc'ing the
linux-integrity group, which is the mailing list where kernel-based TPM
issues get discussed.

Error -5 is EIO, which still points to a TPM communications problem.

> Within the kernel, reflect up through the applications as:
>
> TPM2_StartAuthSession failed with 2309
> TPM_RC_SESSION_HANDLES - out of session handles - a session must be
> flushed before a new session may be created
> Failed to get Key Handle in TPM EC key routines
>
> The underlying tss code is build with:
>
> CCFLAGS += -DTPM_POSIX \
>         -DTPM_INTERFACE_TYPE_DEFAULT="\"dev\"" \
>         -DTPM_DEVICE_DEFAULT="\"/dev/tpmrm0\"" \
>         $(BLD_SYSROOT)
>
> So we should be using the tpmrm resource manager within the kernel.

The answer should be yes, because without it you'll exhaust the TPM
resources in a multi-threaded environment.  The TPM has severe limits
(like 3) on the number of keys which can be active at any given time.

What is happening in the tpmrm situation is that you get one resource
manager instance for every separate open of /dev/tpmrm0, but also every
TPM operation you try results in a resource manager context save and
load for every volatile key handle and session ... essentially it will
be more than tripling the TPM transaction load, since the way the
openssl engine works, it usually needs the parent key, a session and
the actual key you're loading every time you do something.
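
To put a rough number on that multiplier, here is some illustrative
arithmetic.  It assumes the resource manager issues one context load
before, and one context save after, each command for every volatile
handle or session the client holds.  The function name and the model
itself are assumptions made for this example, not measurements or
kernel code.

```python
def tpm_commands_per_operation(live_handles: int) -> int:
    """TPM commands seen for one client command, under the assumed model."""
    context_loads = live_handles  # assumed: one load per saved handle/session
    context_saves = live_handles  # assumed: one save per handle/session afterwards
    return context_loads + 1 + context_saves  # the 1 is the client's own command


if __name__ == "__main__":
    # 3 = parent key + session + the key being used, as described above
    for handles in (0, 1, 3):
        print(handles, tpm_commands_per_operation(handles))
```

With the three items named above (parent key, session, loaded key) this
model gives seven TPM transactions per engine command, which is at
least consistent with "more than tripling".
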
Once a resource manager context flush fails we actually get left with
whatever handle it was trying to flush stuck in the TPM, which will lead
to resource exhaustion.

> If I run the test code as a single instance, this never occurs
> (within the bounds of 64 hours of constant running)
>
> Is there a practical limit to the openssl engine, underlying tpmrm,
> or even the underlying physical block that I am ignoring here?
> My view was that as long as you pass through the tpmrm, you might
> stall, but the resources would be managed.

Right, the resource manager is supposed to make the TPM scalable. 
There is a hard limit (64 usually) on the number of active sessions you
can have even with a resource manager, but I don't think you're hitting
that.

James

> I am going back to dig through tpm-tis, in particular, tpm2-cmd.c and
> tpm-interface.c.
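
For reference, two of the numbers in the logs above can be decoded by
hand.  This is a minimal sketch, assuming RC_WARN = 0x900 and
TPM_RC_SESSION_HANDLES = RC_WARN + 0x005 as in the TPM 2.0 response
code table.  The helper names are made up for this example.

```python
import errno

RC_FMT1 = 0x080  # bit 7: set for format-one response codes
RC_WARN = 0x900  # assumed base value for TPM 2.0 warning codes
TPM_RC_SESSION_HANDLES = RC_WARN + 0x005


def is_tpm2_warning(rc: int) -> bool:
    """True for a format-zero response code with the warning bits set."""
    return (rc & RC_FMT1) == 0 and (rc & RC_WARN) == RC_WARN


def errno_name(kernel_err: int) -> str:
    """Map a negative kernel return value such as -5 to its errno name."""
    return errno.errorcode.get(-kernel_err, "unknown")


if __name__ == "__main__":
    rc = 2309  # from "TPM2_StartAuthSession failed with 2309"
    print(hex(rc), rc == TPM_RC_SESSION_HANDLES, is_tpm2_warning(rc))
    print(errno_name(-5))  # from "tpm_send: error -5"
```

So 2309 is 0x905, a warning rather than a hard error, and it matches
the TPM_RC_SESSION_HANDLES text the TSS printed.
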