Re: [RFC] Persist ima logs to disk

Raphael Gianotti <raphgi@xxxxxxxxxxxxxxxxxxx> · Thu, 7 Jan 2021 14:57:48 -0800

On 1/7/2021 1:48 PM, James Bottomley wrote:
On Thu, 2021-01-07 at 15:51 -0500, Mimi Zohar wrote:
On Thu, 2021-01-07 at 12:37 -0800, James Bottomley wrote:
On Thu, 2021-01-07 at 15:02 -0500, Mimi Zohar wrote:
On Thu, 2021-01-07 at 08:42 -0800, James Bottomley wrote:
[...]
What about having a log entry that's the current PCR
value?  Then stretches of the log starting with these entries
would be independently verifiable provided you had a way of
trusting the PCR value.  It might be possible to get the TPM to
add a signed quote as an optional part of the log entry (of
course this brings other problems like which key do you use for
the signing and how does it get verified) which would provide
the trust and would definitively allow you to archive log
segments and still make the rest of the log useful.
The current PCR values are aggregated and stored in the
boot_aggregate record.  As part of the new boot_aggregate record
format, the individual PCR values could be included.
I don't think we care about the boot aggregate ... it's just the
initial log entry that ties the boot state to the initial runtime
state.  All we need for the proposed entry is the current value of
the IMA PCR so provided you trust that value it becomes a base on
which the following measurements can build and be trusted.
The IMA measurement list may contain multiple PCRs, not just the
default IMA PCR.  Each kexec results in an additional boot_aggregate
record, but an equivalent record for after truncating the measurement
list might help.
Right, this would specifically be only of the IMA PCR so you can use it
as a base to begin the hash of the following log segment.  The log can
still contain other boot aggregate entries, but the assumption is that
boot aggregate entries in the prior log have already been evaluated.
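
A minimal sketch of the segment idea described above, not the kernel's
implementation. It assumes a SHA-256 PCR bank, that each log entry contributes
one template digest, and that the base PCR entry itself is not extended into
the PCR. The names extend and replay_segment are hypothetical.

```python
import hashlib
from typing import Iterable

def extend(pcr: bytes, digest: bytes) -> bytes:
    """TPM-style extend: new PCR = H(old PCR || digest)."""
    return hashlib.sha256(pcr + digest).digest()

def replay_segment(base_pcr: bytes, digests: Iterable[bytes]) -> bytes:
    """Replay a log segment on top of a trusted base PCR value."""
    pcr = base_pcr
    for d in digests:
        pcr = extend(pcr, d)
    return pcr

# Illustrative data only: four made-up template digests.
entries = [hashlib.sha256(name).digest() for name in (b"a", b"b", b"c", b"d")]
zero = bytes(32)  # assumed initial PCR value

# Replaying the whole log from the initial value ...
full = replay_segment(zero, entries)

# ... gives the same result as replaying only the second half of the log
# on top of a base entry recording the PCR value after the first half.
base = replay_segment(zero, entries[:2])
tail = replay_segment(base, entries[2:])
```

If the base value can be trusted, the first half of the log is no longer
needed to verify the second half against the current PCR.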

But this doesn't address where the offloaded measurement list
will be stored, how long the list will be retained, nor who
guarantees the integrity of the offloaded list.  In addition,
different form factors will have different requirements.

As for how long the list, or in the case of log segments each segment, would
be retained, it might make sense to leave that as an admin decision, something
that can be configured to satisfy the needs of a specific system, as James
mentions below. Does that seem correct?

Given the possibility of keeping the logs around for an indefinite amount of
time, would expanding the method presented in this RFC be more appropriate
than going down the vfs_tmpfile route? Forgive my lack of expertise in mm, but
would the vfs_tmpfile approach work for keeping several log segments across
multiple kexecs?

As for guaranteeing the integrity of the offloaded logs, James's suggestion
of using the TPM to add a signature to the log entries raises the question
of which key would be used and how it would be verified. I am trying to give
this some thought.

I'm not sure you need any store at all.  The basic idea is that the
log is divided into individually verifiable segments.  For auditing
purposes you could keep all segments, so you have the entire log,
but if you've acted on the prior log entries and you don't have an
audit reason to keep them, you could erase that segment of the log
because you've placed all your trust in the prior log segments into
the PCR entry that forms the base of your current segment.

Essentially the question devolves to what mechanisms can give you
this trust in the base PCR log entry.

Not retaining the entire measurement list would limit its verification
to a single server/system.

Well, it would limit its verification to just that log segment, yes.

I'm thinking in the cloud there are a couple of potential consumers:

    1. The cloud monitor, which acts on the verified log, such as killing a
       node for trying to execute an unverified binary or emailing the
       guest owner.  This type of consumer doesn't need the historical log,
       they just need to verify the entries they haven't already seen and
       act on them according to whatever policy they're given.
    2. The second type of cloud consumer is the audit case where the
       aggregate hash is used to assure some auditor, some time after the
       actual events, that the entire runtime of the VM was properly
       monitored and the auditor wants to see the log or a segment of it
       to prove the hash.

Case 1 doesn't need historical storage, case 2 definitely does.  I
think we should support both use cases particularly in the long running
scenario where we need to recover memory.  Having verifiable log
segments seems to satisfy both cases, but what you do with the segments
would vary.
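
The two consumers could be sketched as follows. This is an illustration under
stated assumptions, not a proposed format: a SHA-256 bank, a hypothetical
Segment record holding the base PCR value and the template digests that follow
it, and a quoted PCR value that the verifier already trusts.

```python
import hashlib
from dataclasses import dataclass
from typing import List

@dataclass
class Segment:
    base_pcr: bytes       # PCR value recorded at the start of the segment
    digests: List[bytes]  # template digests logged after that point

def end_pcr(seg: Segment) -> bytes:
    """Replay the segment's digests on top of its base PCR value."""
    pcr = seg.base_pcr
    for d in seg.digests:
        pcr = hashlib.sha256(pcr + d).digest()
    return pcr

def monitor_check(latest: Segment, quoted_pcr: bytes) -> bool:
    """Case 1: only the newest segment is needed; its base is trusted
    because the earlier entries were already verified and acted on."""
    return end_pcr(latest) == quoted_pcr

def audit_check(segments: List[Segment], quoted_pcr: bytes,
                initial: bytes = bytes(32)) -> bool:
    """Case 2: every retained segment must chain from the initial value."""
    expected = initial
    for seg in segments:
        if seg.base_pcr != expected:
            return False
        expected = end_pcr(seg)
    return expected == quoted_pcr

# Illustrative data only: two segments built from made-up digests.
d = [hashlib.sha256(bytes([i])).digest() for i in range(4)]
seg1 = Segment(bytes(32), d[:2])
seg2 = Segment(end_pcr(seg1), d[2:])
quoted = end_pcr(seg2)
```

Here monitor_check(seg2, quoted) succeeds even after seg1 has been erased,
while audit_check needs both segments to be retained.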

James