RE: Linux guest kernel threat model for Confidential Computing

"Reshetova, Elena" <elena.reshetova@xxxxxxxxx> · Thu, 26 Jan 2023 08:08:27 +0000

 On Wed, Jan 25, 2023 at 03:29:07PM +0000, Reshetova, Elena wrote:
> > > Again, as our documentation states, when you submit patches based on
> > > these tools, you HAVE TO document that.  Otherwise we think you all are
> > > crazy and will get your patches rejected.  You all know this, why ignore
> > > it?
> >
> > Sorry, I didn’t know that for every bug that is found in linux kernel when
> > we are submitting a fix that we have to list the way how it has been found.
> > We will fix this in the future submissions, but some bugs we have are found by
> > plain code audit, so 'human' is the tool.
> 
> So the concern is that *you* may think it is a bug, but other people
> may not agree.  Perhaps what is needed is a full description of the
> goals of Confidential Computing, and what is in scope, and what is
> deliberately *not* in scope.  I predict that when you do this, that
> people will come out of the wood work and say, no wait, "CoCo ala
> S/390 means FOO", and "CoCo ala AMD means BAR", and "CoCo ala RISC V
> means QUUX".

Agreed, and this is the reason for starting this thread: to make sure people
agree on the threat model.  The only reason we submitted some trivial bug
fixes separately is that they can *also* be considered bugs under the existing
threat model, if one thinks that the kernel should be as robust as possible
against potentially erroneous devices.

As described right at the beginning of the doc I shared [1] (adjusted now to remove
'TDX' and use the generic 'CC guest kernel'), we want to make sure that an untrusted
host (and hypervisor) is not able to

1. achieve privilege escalation into a CC guest kernel
2. compromise the confidentiality or integrity of CC guest private memory

The above security objectives give us two primary assets we want to protect:
CC guest execution context and CC guest private memory confidentiality and
integrity. 

DoS from the host towards the CC guest is explicitly out of scope and is a
non-security objective.

The attack surface in question is any interface exposed from a CC guest kernel
towards the untrusted host that is not covered by the CC HW protections. Here the
exact list can differ somewhat depending on what technology is being used, but as
David already pointed out before: both CC guest memory and register state are
protected from host attacks, so we are focusing on other communication channels
and on generic interfaces used by Linux today.

Examples of such interfaces for TDX (and I think SEV shares most of them, but please
correct me if I am wrong here) are access to some MSRs and CPUIDs, port IO, MMIO
and DMA, access to PCI config space, KVM hypercalls (if the hypervisor is KVM),
TDX-specific hypercalls (this is technology specific), data consumed from the
untrusted host during CC guest initialization (including the kernel itself, the
kernel command line, provided ACPI tables, etc.) and others described in [1].
An important note here is that these interfaces are not limited just to device drivers
(albeit device drivers are the biggest users of some of them); they are present
throughout the whole kernel in different subsystems and need careful examination
and development of mitigations.
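
To make the kind of runtime interface concern above a bit more concrete, here is
a minimal user-space sketch in C of consuming a message from memory that the host
can modify. Everything in it is hypothetical and for illustration only: the
struct shared_msg layout, RING_DATA_MAX and copy_from_host() are not taken from
[1] or from any existing driver. The only point is that a host-controlled field
is read once, validated, and only then used.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RING_DATA_MAX 64u

/* Hypothetical layout of a buffer shared with the untrusted host. */
struct shared_msg {
	volatile uint32_t len;			/* host may rewrite this at any time */
	volatile uint8_t  data[RING_DATA_MAX];	/* host-controlled payload */
};

/*
 * Copy a host-provided message into guest-private memory.
 * Returns the number of bytes copied, or -1 if the host-supplied
 * length does not fit the shared buffer or the destination.
 */
static int copy_from_host(const struct shared_msg *msg,
			  uint8_t *dst, size_t dst_size)
{
	uint32_t len;
	uint32_t i;

	assert(msg != NULL && dst != NULL);

	/* Read the length exactly once, so the check and the use agree. */
	len = msg->len;

	if (len > RING_DATA_MAX || len > dst_size)
		return -1;

	/* Use only the validated private copy of the length from here on. */
	for (i = 0; i < len; i++)
		dst[i] = msg->data[i];

	return (int)len;
}
```

Re-reading msg->len after the check would reopen the window for the host to
change it, which is why the sketch keeps a private copy.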

The possible range of mitigations that we can apply is also wide, but you can roughly split it into
two groups: 

1. mitigations that use various attestation mechanisms (we can attest the kernel code,
cmdline, ACPI tables being provided and other potential configurations, and one day we
will hopefully also be able to attest devices we connect to the CC guest and their
configuration)

2. other mitigations for threats that attestation cannot cover, i.e. mainly runtime 
interactions with the host. 
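
As a rough illustration of the first group, measurement-based attestation is
commonly built on an "extend" operation: a register is updated as
H(old register || H(new data)), so the final value depends on every item
measured and on the order they were measured in. The C sketch below shows only
that general idea. It is an assumption-laden toy: it uses the non-cryptographic
FNV-1a hash as a stand-in for a real digest, and fnv1a() and
measurement_extend() are hypothetical names, not any TDX or SEV interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* FNV-1a 64-bit parameters; a toy stand-in for a cryptographic hash. */
#define FNV_OFFSET 0xcbf29ce484222325ULL
#define FNV_PRIME  0x100000001b3ULL

/* Fold len bytes from buf into the running FNV-1a state h. */
static uint64_t fnv1a(uint64_t h, const void *buf, size_t len)
{
	const uint8_t *p = buf;
	size_t i;

	for (i = 0; i < len; i++) {
		h ^= p[i];
		h *= FNV_PRIME;
	}
	return h;
}

/*
 * Hypothetical extend operation: new_reg = H(old_reg || H(data)).
 * A verifier that knows the expected inputs can recompute the same value.
 */
static uint64_t measurement_extend(uint64_t reg, const void *data, size_t len)
{
	uint64_t digest;
	uint64_t h;

	assert(data != NULL || len == 0);

	digest = fnv1a(FNV_OFFSET, data, len);
	h = fnv1a(FNV_OFFSET, &reg, sizeof(reg));
	return fnv1a(h, &digest, sizeof(digest));
}
```

A guest would extend the register with the kernel image, the cmdline and the
ACPI tables in a fixed order, and a remote verifier would compare the reported
value against the one it computes from the expected inputs.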

The above sounds conceptually simple, and the devil is as usual in the details, but it
doesn’t look impossible or like something that would need ***insane*** changes to the
entire kernel.

> 
> Others may end up objecting, "no wait, doing this is going to mean
> ***insane*** changes to the entire kernel, and this will be a
> performance / maintenance nightmare and unless you fix your hardware
> in future chips, we will consider this a hardware bug and reject all
> of your patches".
> 
> But it's better to figure this out now, than after you get hundreds of
> patches into the upstream kernel, we discover that this is only 5% of
> the necessary changes, and then the rest of your patches are rejected,
> and you have to end up fixing the hardware anyway, with the patches
> upstreamed so far being wasted effort.  :-)
> 
> If we get consensus on that document, then that can get checked into
> Documentation, and that can represent general consensus on the problem
> early on.

Sure, I am willing to work on this since we have already spent quite a lot of effort
looking into this problem. My only question is how to organize a review of such
a document in a sane and productive way and to make sure all relevant people
are included in the discussion. As I said, this spans many areas in the kernel,
and ideally you would want different people to review their area in detail.
For example, one of the many aspects we need to worry about is the security of the
CC guest LRNG (especially in cases when we don’t have a trusted HW source of
entropy) [2], and here feedback from LRNG experts would be important.

I guess the first clear step I can take is to rewrite the relevant part of [1] in
CC-technology-neutral language, and then I would need feedback and input from the
AMD guys to make sure it correctly reflects their case as well. We can probably do
this preparation work on the linux-coco mailing list and then post for a wider review?

Best Regards,
Elena.

[1] https://intel.github.io/ccc-linux-guest-hardening-docs/security-spec.html#threat-model
[2] https://intel.github.io/ccc-linux-guest-hardening-docs/security-spec.html#randomness-inside-tdx-guest

> 
> 						- Ted