Re: [ANNOUNCEMENT] COCONUT Secure VM Service Module for SEV-SNP

Alexander Graf <graf@xxxxxxxxxx> · Mon, 8 May 2023 07:16:53 +0200

On 06.05.23 14:48, James Bottomley wrote:

On Fri, 2023-05-05 at 14:35 +0200, Christophe de Dinechin wrote:
On 2023-05-04 at 13:04 -04, James Bottomley <jejb@xxxxxxxxxxxxx>
wrote...
On Wed, 2023-05-03 at 14:26 +0200, Jörg Rödel wrote:
[...]
And here come the 'BUT': Since the goal of having one project is
to bundle community efforts, I think that the joint efforts are
better targeted at getting CPL-3 support to a point where it can
run modules. On that side some input and help is needed,
especially to define the syscall interface so that it suits the
needs of a TPM implementation.
Crypto support in ring-0 is unavoidable if we want to retain
control of the VMPCK0 key in ring-0.  I can't see us giving it to
ring-3 because that would give up control of the SVSM identity and
basically make the ring-0 separation useless because you can
compromise ring-3 and get the key and then communicate with the PSP
as the SVSM.
I'm a bit confused regarding the roles that VMPL vs rings play in the
security model here.
Heh, I think that goes for everyone, which is why I'm fishing for
information about the security model.  I don't think it's enough to
blindly claim running at cpl3 gives more security, you have to have a
threat model that demonstrates it.

In particular, I assume that any attack on ring3 would
still have to cross either the VMPL boundary (if coming from the
guest) or the TEE boundary (if coming from the host).
I think the attack theory is more like a privilege escalation: you
induce the SVSM to take a fault through its normal API mechanism by
crafting bad data (this means the runtime attack can only come from the
guest since the host doesn't get access to the SVSM at runtime,
although it could craft bad configuration data for boot time).

Assuming a successful exploit, the attacker now has the ability to run
code in the compromised module.  For a unitary SVSM, that would give
control of the entire SVSM.  For ring 0/3 separation, it should only
give control of the compromised module, which we're assuming is ring 3
code.  However, you're right, in that the attacker now has the ability
to execute VMPL0 code, except that some privileged instructions (like
PVALIDATE) can only execute at ring 0, so the attack ability is
slightly more limited.
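A minimal sketch of the ring 0/3 point above, assuming a toy model rather than real SVSM code: `Cpl`, `Fault` and the `pvalidate` wrapper are hypothetical names. The wrapper only returns a simulated #GP when the caller is at CPL 3. A real SVSM would execute the PVALIDATE instruction itself, which the CPU permits only at CPL 0.

```rust
// Hypothetical model: PVALIDATE is only permitted at CPL 0, so code running
// at CPL 3 (still inside VMPL0) cannot issue it directly.

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Cpl {
    Ring0,
    Ring3,
}

#[derive(Debug, PartialEq, Eq)]
enum Fault {
    /// Stand-in for the #GP a CPL != 0 caller would take.
    GeneralProtection,
}

/// Hypothetical wrapper; a real SVSM would execute PVALIDATE here.
fn pvalidate(cpl: Cpl, _gpa: u64, _validate: bool) -> Result<(), Fault> {
    match cpl {
        Cpl::Ring0 => Ok(()),
        Cpl::Ring3 => Err(Fault::GeneralProtection),
    }
}

fn main() {
    // Unitary SVSM: everything is ring 0, so an exploit inherits PVALIDATE.
    assert_eq!(pvalidate(Cpl::Ring0, 0x1000, true), Ok(()));
    // Ring 0/3 split: a compromised ring-3 module still faults.
    assert_eq!(
        pvalidate(Cpl::Ring3, 0x1000, true),
        Err(Fault::GeneralProtection)
    );
    println!("ok");
}
```

This models only the instruction gate. It says nothing about what memory the compromised ring-3 code can still reach at VMPL0.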

I think we're munging 2 things into the same conversation:

* Running code at a lower privilege level so that code execution attacks
cannot elevate beyond that code's area of responsibility
* Running code in its own address space so that accidental data leaks
don't leak additional sensitive information

I personally care more about the latter than the former, mostly also 
because of speculation side channel attacks. I really don't want to be 
in a world where you can train the branch predictor of the vTPM to 
access kernel memory. Code execution gadgets are less prevalent,
especially if you use Rust as the programming language.

I've always considered the gold standard exploit of the SVSM to be one
that allows you to fake attestation reports, since then it's game over
as far as remote verification goes, which is why I want the VMPCK0 key
(the communication key VMPL0 uses to get VMPL0-specific attestation
reports from the PSP) to be closely guarded at ring 0, making it harder
to compromise remote attestation via exploits.
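One way to picture "closely guarded at ring 0" is a key custodian that never hands the key out. Ring-3 code can only ask it for a report. The sketch below is hypothetical throughout: `ring0::AttestationService` and `request_report` are made-up names, and a Rust module boundary stands in for the real ring 0/3 syscall boundary. `DefaultHasher` is used purely to show where the key is consumed. It is NOT a cryptographic MAC and is not how the PSP guest-message protocol works.

```rust
/// Hypothetical ring-0 custodian of the VMPCK0 key.
mod ring0 {
    use std::collections::hash_map::DefaultHasher;
    use std::hash::{Hash, Hasher};

    pub struct AttestationService {
        // Private field: code outside `ring0` cannot read the key.
        vmpck0: [u8; 32],
    }

    #[derive(Debug, Clone, PartialEq, Eq)]
    pub struct Report {
        pub user_data: Vec<u8>,
        pub tag: u64,
    }

    impl AttestationService {
        pub fn new(key: [u8; 32]) -> Self {
            Self { vmpck0: key }
        }

        /// Stand-in for "send a guest request to the PSP protected with VMPCK0".
        /// DefaultHasher is NOT cryptographic; it only marks where the key is used.
        pub fn request_report(&self, user_data: &[u8]) -> Report {
            let mut h = DefaultHasher::new();
            self.vmpck0.hash(&mut h);
            user_data.hash(&mut h);
            Report {
                user_data: user_data.to_vec(),
                tag: h.finish(),
            }
        }
    }
}

fn main() {
    let svc = ring0::AttestationService::new([0x42; 32]);
    // A ring-3 module (e.g. a vTPM) gets a report, never the key itself.
    let report = svc.request_report(b"vtpm-ek-hash");
    // `svc.vmpck0` would not compile here: the field is private to `ring0`.
    println!("report tag: {:#x}", report.tag);
}
```

The design choice being illustrated is the narrow interface. A compromised caller can still request reports over arbitrary data, which is exactly the residual risk raised further down the thread.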

IMHO that's slightly too focused on the attestation part of the story. 
Imagine an environment where someone managed to gain RCE in a random 
user space application of your VM that has no access to actual secrets / 
PEI / etc. Like a monitoring service running log4j :). That service 
however may have access to /dev/tpm0 (via whatever means) and thus is 
able to craft requests to the vTPM.

In this situation, you really want to ensure that the vTPM has 
absolutely 0 access to any memory it doesn't have to. Even better if you 
can ensure it stays that way (read: no way to write CR3).

The flip side is that, assuming the vTPM is the compromised service,
you've already got the ability to fake TPM based runtime attestation,
so it's still game over from the remote attestation point of view.  This
leads me to conclude that it really doesn't matter where security
critical protocols run, and the only real advantage of the ring 0/3
separation would come into play if the SVSM had some non-security
critical protocols and it's not clear if it ever will.

I think going forward we will have to move the UEFI Variable Runtime 
Services into SVSM as well. There is no virtual SMM mode in SEV-SNP that 
I'm aware of and trusting the host environment for UEFI Secure Boot may 
or may not be desirable.

I think the above problem also indicates no-one really has a fully
thought out security model that shows practically how ring-3
improves the security posture.  So I really think starting in ring-
0 and then moving pieces to ring-3 and discussing whether this
materially improves the security posture based on the code and how
it operates gets us around the lack of understanding of the
security model because we proceed by evolution.
And there is definitely a lot of complexity added by supporting
ring3. You are essentially getting the complexity of a "real"
operating system.  By contrast, TDX is providing the same kind of
isolation with secure enclaves, but at least the base OS kernel is
shared.

The expected benefit is to be able to run more complex code from
ring3 with a better way to handle malfunctions, faults, whatever. At
least that's the way I read it. So it's a way to write software in a
more modular way.
Yes, but is that a benefit?  If one of the protocol modules faults, I
think you'd rather have a hard failure of the whole confidential VM
than a restart that gives an attacker more leeway to craft a
compromise.

It's also about the ability to detect failure. The more guard rails you 
put in, the more likely you can identify when something goes off. 
Whether we then attempt to continue execution or consider it a fatal 
event is up to us.

The worst case would be a fully populated linear address space where any 
pointer derailing is completely undetectable.
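To make the guard-rail argument concrete, here is a toy page-granular model. It assumes 4 KiB pages, and `AddressSpace` and `Access` are hypothetical names. A module that maps only its own pages, with an unmapped guard page after its buffer, turns a stray pointer into a detectable fault. In a fully populated linear address space the same access silently succeeds.

```rust
use std::collections::BTreeSet;

const PAGE_SIZE: u64 = 4096;

#[derive(Debug, PartialEq, Eq)]
enum Access {
    Ok,
    PageFault(u64),
}

/// Hypothetical address space tracked at page granularity.
struct AddressSpace {
    mapped: BTreeSet<u64>, // mapped page numbers
}

impl AddressSpace {
    /// Map only the pages a module needs; everything else stays unmapped.
    fn sparse(pages: &[u64]) -> Self {
        Self { mapped: pages.iter().copied().collect() }
    }

    /// "Fully populated": every page in [0, n_pages) is mapped.
    fn fully_populated(n_pages: u64) -> Self {
        Self { mapped: (0..n_pages).collect() }
    }

    fn access(&self, addr: u64) -> Access {
        let page = addr / PAGE_SIZE;
        if self.mapped.contains(&page) {
            Access::Ok
        } else {
            Access::PageFault(addr)
        }
    }
}

fn main() {
    // The module owns pages 2 and 3; page 4 stays unmapped as a guard page.
    let sparse = AddressSpace::sparse(&[2, 3]);
    let flat = AddressSpace::fully_populated(16);
    // A derailed pointer running 8 bytes into the page after the buffer.
    let stray = 4 * PAGE_SIZE + 8;
    println!("{:?}", sparse.access(stray)); // PageFault(16392)
    println!("{:?}", flat.access(stray)); // Ok
}
```

Whether a `PageFault` is then treated as fatal or recoverable is the separate policy decision mentioned above.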

IIUC, the ring-3 modules of the SVSM would still be at VMPL0, so
presumably not accessible from host or guest. If we consider this
property as strong, then do we really care about entrusting ring3 with
sensitive data?
Well, as I said above, I think for security critical modules, it
doesn't matter where they run, so perhaps we don't care, but equally
there's not much security benefit to ring 0/3 separation either.

The next question that's going to arise is *where* the crypto
libraries should reside.  Given they're somewhat large, duplicating
them for every cpl-3 application plus cpl-0 seems wasteful, so some
type of vdso model sounds better (and might work instead of a
syscall interface for cpl-0 services that are pure code).
I don't understand what you call "pure code". I presume you mean
"code that does not need to access ring0 data"?
I meant a VDSO-like model, where the openssl crypto code could be
exported from ring-0 as an executable library, but the data would live
with the corresponding consumer, so it could be used by the SVSM body
at ring-0 with any crypto data being held at ring-0 and inaccessible to
ring-3 consumers of the crypto code.
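A rough sketch of that VDSO-like split, under the assumption that "pure code" means stateless functions. `CryptoVdso`, `DigestCtx` and the `fnv_*` functions are hypothetical, and FNV-1a is a placeholder for real crypto code, NOT a secure hash. The exported table holds only function pointers. Each consumer keeps its own context in its own memory, so the ring-0 body's state never sits in ring-3 memory. The code pages themselves are still shared, and that sharing is what the objection about AES tables below is aimed at.

```rust
/// Per-consumer state: lives in the consumer's own memory (ring 0 or ring 3).
struct DigestCtx {
    state: u64,
}

/// Hypothetical "vDSO" exported by ring 0: only function pointers, no data.
struct CryptoVdso {
    init: fn() -> DigestCtx,
    update: fn(&mut DigestCtx, &[u8]),
    finish: fn(&DigestCtx) -> u64,
}

// FNV-1a 64-bit: placeholder for real crypto code, NOT a secure hash.
const FNV_OFFSET: u64 = 0xcbf2_9ce4_8422_2325;
const FNV_PRIME: u64 = 0x0000_0100_0000_01b3;

fn fnv_init() -> DigestCtx {
    DigestCtx { state: FNV_OFFSET }
}

fn fnv_update(ctx: &mut DigestCtx, data: &[u8]) {
    for &b in data {
        ctx.state ^= b as u64;
        ctx.state = ctx.state.wrapping_mul(FNV_PRIME);
    }
}

fn fnv_finish(ctx: &DigestCtx) -> u64 {
    ctx.state
}

/// The shared, read-only code table.
static VDSO: CryptoVdso = CryptoVdso {
    init: fnv_init,
    update: fnv_update,
    finish: fnv_finish,
};

fn main() {
    // The ring-0 body and a ring-3 module each keep their own context;
    // they share only the code behind VDSO.
    let mut kernel_ctx = (VDSO.init)();
    let mut module_ctx = (VDSO.init)();
    (VDSO.update)(&mut kernel_ctx, b"secret held at ring 0");
    (VDSO.update)(&mut module_ctx, b"ring-3 data");
    println!(
        "{:x} {:x}",
        (VDSO.finish)(&kernel_ctx),
        (VDSO.finish)(&module_ctx)
    );
}
```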

Have a look at this presentation, get all the way to the on-screen 
keyboard sniffer and then faithfully tell me again that you believe 
sharing code (and AES tables) is a good idea :)

https://media.ccc.de/v/33c3-8044-what_could_possibly_go_wrong_with_insert_x86_instruction_here

I also feel like I'm missing something obvious in the conversation. 
Let's imagine the worst case for size I can think of today. In that 
case, the main SVSM code, vTPM code as well as UEFI variable store would 
duplicate (parts of) crypto libraries. Let's again imagine the worst:
no proper LTO, so we need to link all of openssl into both components,
plus the Rust based crypto, which we again assume to be as large.

In this worst-case scenario, not sharing the code wastes less than 1MiB
altogether. That doesn't sound like something to optimize for at all at
this stage?

Alex

Amazon Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B
Sitz: Berlin
Ust-ID: DE 289 237 879