On 23.02.2024 02:55, Daniel P. Smith wrote:
On 2/20/24 13:42, Alexander Steffen wrote:
On 02.02.2024 04:08, Lino Sanfilippo wrote:
On 01.02.24 23:21, Jarkko Sakkinen wrote:
On Wed Jan 31, 2024 at 7:08 PM EET, Daniel P. Smith wrote:
Commit 933bfc5ad213 introduced the use of a locality counter to
control when a locality request is allowed to be sent to the TPM. In
the commit, the counter is indiscriminately decremented, creating a
situation for an integer underflow of the counter.
What is the sequence of events that leads to the underflow being
triggered? This information should be represented in the commit message.
AFAIU this is:
1. We start with a locality_count of 0 and then we call
tpm_tis_request_locality() for the first time, but since a locality is
(unexpectedly) already active, check_locality() and consequently
__tpm_tis_request_locality() return "true".
check_locality() returns true, but __tpm_tis_request_locality() returns
the requested locality. Currently, this is always 0, so the check for
!ret will always correctly indicate success and increment the
locality_count.
But since theoretically a locality != 0 could be requested, the correct
fix would be to check for something like ret >= 0 or ret == l instead of
!ret. Then the counter will also be incremented correctly for localities
!= 0, and no underflow will happen later on. Therefore, explicitly
checking for an underflow is unnecessary and hides the real problem.
My apologies, but I will have to humbly disagree on a fundamental
level here. If a state variable has bounds, then those bounds should be
enforced when the variable is being manipulated.
That's fine, but that is not what your proposed fix does.
tpm_tis_request_locality and tpm_tis_relinquish_locality are meant to be
called in pairs: for every (successful) call to tpm_tis_request_locality
there *must* be a corresponding call to tpm_tis_relinquish_locality
afterwards. Unfortunately, in C there is no language construct to
enforce that (nothing like a Python context manager). Instead,
locality_count is used to count the number of successful calls to
tpm_tis_request_locality, so that tpm_tis_relinquish_locality can defer
actually relinquishing the locality until the last expected call has
happened (you can think of that as a Python RLock, to stay with the
Python analogies).
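
In code, that pairing looks roughly like this (abbreviated from my
reading of the current tpm_tis_core.c; field names such as
locality_count_mutex are taken from there, and error handling is
trimmed):

static int tpm_tis_request_locality(struct tpm_chip *chip, int l)
{
        struct tpm_tis_data *priv = dev_get_drvdata(&chip->dev);
        int ret = 0;

        mutex_lock(&priv->locality_count_mutex);
        /* Only the first request actually talks to the TPM. */
        if (priv->locality_count == 0)
                ret = __tpm_tis_request_locality(chip, l);
        if (!ret)                       /* <-- the broken success check */
                priv->locality_count++;
        mutex_unlock(&priv->locality_count_mutex);
        return ret;
}

static int tpm_tis_relinquish_locality(struct tpm_chip *chip, int l)
{
        struct tpm_tis_data *priv = dev_get_drvdata(&chip->dev);

        mutex_lock(&priv->locality_count_mutex);
        priv->locality_count--;
        /* Only the last relinquish actually gives up the locality. */
        if (priv->locality_count == 0)
                __tpm_tis_relinquish_locality(priv, l);
        mutex_unlock(&priv->locality_count_mutex);
        return 0;
}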
So if locality_count ever gets negative, that is certainly a bug
somewhere. But your proposed fix hides this bug by allowing
tpm_tis_relinquish_locality to be called more often than
tpm_tis_request_locality. You could have added something like
BUG_ON(priv->locality_count == 0) before decrementing the counter. That
would really enforce the bounds, without hiding the bug, and I would be
fine with that.
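
That is, roughly (a sketch, not a tested patch):

 	mutex_lock(&priv->locality_count_mutex);
+	/* An unmatched relinquish is a bug; fail loudly instead of masking it. */
+	BUG_ON(priv->locality_count == 0);
 	priv->locality_count--;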
Of course, that still leaves the actual bug to be fixed. In this case,
there is no mismatch between the calls to tpm_tis_request_locality and
tpm_tis_relinquish_locality. It is just (as I said before) that the
counting of successful calls in tpm_tis_request_locality is broken for
localities != 0, so that is what you need to fix.
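
Concretely, the fix would then be just this (untested, and assuming
__tpm_tis_request_locality() keeps returning the granted locality on
success and a negative value on failure, as it does today):

-	if (!ret)
+	if (ret >= 0)
 		priv->locality_count++;

With that, the counter is incremented for every successful request
regardless of which locality was asked for, so as long as the
request/relinquish calls stay paired, the decrement can never underflow.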
Assuming that every path leading to the variable manipulation code has
ensured proper manipulation is just that: an assumption. When
assumptions fail, bugs and vulnerabilities occur.
To your point, does this fully address the situation experienced? I
would say it does not. IMHO, the situation is really a combination of
both patch 1 and patch 2, but the request was to split the changes for
individual discussion. We selected this one as the fix for two reasons.
First, it blocks the underflow, such that when the Secure Launch series
opens Locality 2, the counter will get incremented at that time and the
internal locality tracking state variables will end up with the correct
values, thus allowing the relinquish to succeed at kernel shutdown.
Second, it provides a stronger defensive coding practice.
Another reason that this works as a fix is that the TPM specification
requires the registers to be mirrored across all localities, regardless
of the active locality. While the request/relinquishes for Locality 0
sent by the early code do not succeed, the values obtained via the
Locality 0 registers are still guaranteed to be correct.
v/r,
dps