Re: [intel-sgx-kernel-dev] [PATCH 08/10] kvm: vmx: add guest's IA32_SGXLEPUBKEYHASHn runtime switch support

"Huang, Kai" <kai.huang@xxxxxxxxxxxxxxx> · Tue, 16 May 2017 12:48:35 +1200

On 5/12/2017 6:11 PM, Andy Lutomirski wrote:
On Thu, May 11, 2017 at 9:56 PM, Huang, Kai <kai.huang@xxxxxxxxxxxxxxx> wrote:
I am not sure whether writing to 4 MSRs would be *extremely* slow, since
when a vcpu is scheduled in, KVM is already doing vmcs_load, writing to
several MSRs, etc.

I'm speculating that these MSRs may be rather unoptimized and hence
unusually slow.

Have a percpu variable that stores the current SGXLEPUBKEYHASH along
with whatever lock is needed (probably just a mutex).  Users of EINIT
will take the mutex, compare the percpu variable to the desired value,
and, if it's different, do WRMSR and update the percpu variable.
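
As a rough illustration of the caching scheme described above, here is a
userspace model in C. It is only a sketch under stated assumptions: the
names (le_hash_sync, model_wrmsr, MODEL_NR_CPUS) are hypothetical, the MSRs
are simulated with an array, and the lock is left out because the model is
single-threaded. Real kernel code would use a percpu variable and hold the
lock around the compare and the WRMSRs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MODEL_NR_CPUS 4   /* hypothetical CPU count, for the model only */
#define LE_HASH_WORDS 4   /* IA32_SGXLEPUBKEYHASH0..3 */

/* Simulated MSR state and a counter of (expensive) MSR writes. */
static uint64_t model_msr[MODEL_NR_CPUS][LE_HASH_WORDS];
static unsigned int model_wrmsr_count;

static void model_wrmsr(unsigned int cpu, unsigned int idx, uint64_t val)
{
	model_msr[cpu][idx] = val;
	model_wrmsr_count++;
}

/* Stand-in for the percpu variable: the hash last written on each CPU. */
struct le_hash_cache {
	bool valid;
	uint64_t hash[LE_HASH_WORDS];
};

static struct le_hash_cache le_cache[MODEL_NR_CPUS];

/*
 * Make the MSRs on @cpu hold @want before EINIT runs.
 * Returns true if the MSRs had to be written, false on a cache hit.
 * The caller is assumed to hold whatever lock protects the cache.
 */
static bool le_hash_sync(unsigned int cpu, const uint64_t want[LE_HASH_WORDS])
{
	struct le_hash_cache *c;
	unsigned int i;

	assert(cpu < MODEL_NR_CPUS);
	c = &le_cache[cpu];

	if (c->valid && memcmp(c->hash, want, sizeof(c->hash)) == 0)
		return false;

	for (i = 0; i < LE_HASH_WORDS; i++)
		model_wrmsr(cpu, i, want[i]);

	memcpy(c->hash, want, sizeof(c->hash));
	c->valid = true;
	return true;
}
```

In this model, both a host EINIT and an emulated guest EINIT would call
le_hash_sync() with the hash they need, so back-to-back launches with the
same hash cost no MSR writes.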

KVM will implement writes to SGXLEPUBKEYHASH by updating its in-memory
state but *not* changing the MSRs.  KVM will trap and emulate EINIT to
support the same handling as the host.  There is no action required at
all on KVM guest entry and exit.

This is doable, but the SGX driver needs to do those things and expose
interfaces for KVM to use. The percpu data is nice to have, but I am not
sure it is mandatory, as IMO EINIT is not even in a performance-critical
path. We can simply read the old value out of the MSRs and compare it with
the new one.

I think the SGX driver should probably live in arch/x86, and the
interface could be a simple percpu variable that is exported (from the
main kernel image, not from a module).

FWIW, I think that KVM will, in the long run, want to trap EINIT for
other reasons: someone is going to want to implement policy for what
enclaves are allowed that applies to guests as well as the host.

I am not convinced that "what enclaves are allowed" in the host should
apply to the guest. Can you elaborate? I mean, virtualization in general
just focuses on emulating hardware behavior. If a native machine is able to
run any LE, the virtual machine should be able to as well (of course, with
the guest's IA32_FEATURE_CONTROL[bit 17] set).

I strongly disagree.  I can imagine two classes of sensible policies
for launch control:

1. Allow everything.  This seems quite sensible to me.

2. Allow some things, and make sure that VMs have at least as
restrictive a policy as host root has.  After all, what's the point of
restricting enclaves in the host if host code can simply spawn a
little VM to run otherwise-disallowed enclaves?

What's the current SGX driver launch control policy? Yes, "allow
everything" works for KVM, so let's skip this case. Are we going to support
allowing several LEs, or just one single LE? I know Jarkko is doing the
in-kernel LE stuff but I don't know the details.

I am trying to find a way that both does not break the host launch control
policy and is consistent with HW behavior (from the guest's view). Currently
we can create a KVM guest with runtime changes to IA32_SGXLEPUBKEYHASHn
either enabled or disabled. I introduced a Qemu parameter 'lewr' for
this purpose. Actually, I introduced the following Qemu SGX parameters for
creating a guest:

	-sgx epc=<size>,lehash='SHA-256 hash',lewr

where 'epc' specifies the guest's EPC size, 'lehash' specifies the (initial)
value of the guest's IA32_SGXLEPUBKEYHASHn, and 'lewr' specifies whether the
guest is allowed to change its IA32_SGXLEPUBKEYHASHn at runtime.

If the host only allows one single LE to run, KVM can add a restriction that
only allows creating a KVM guest with runtime changes to
IA32_SGXLEPUBKEYHASHn disabled, so that only the (single) host-allowed hash
can be used by the guest. From the guest's view, it simply has
IA32_FEATURE_CONTROL[bit 17] cleared and has IA32_SGXLEPUBKEYHASHn with its
default value set to the (single) host-allowed hash.

If the host allows several LEs (but not everything), and we create a guest
with 'lewr', then the behavior is not consistent with HW behavior: from the
point of view of the guest's hardware, it can actually run any LE, but we
have to tell the guest that it is only allowed to change
IA32_SGXLEPUBKEYHASHn to some specific values. One compromise is that we
don't allow creating a guest with 'lewr' specified and, at the same time,
only allow creating a guest with a host-approved hash specified in
'lehash'. This makes the guest's behavior consistent with HW behavior,
but only allows the guest to run one LE (the one specified by 'lehash' when
the guest is created).
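
To make the proposed restriction concrete, here is a small C sketch of the
check KVM (or Qemu) could apply at guest creation time. It only models the
rule described above. The types and names (host_le_config,
guest_sgx_params_ok) are hypothetical and do not correspond to any existing
interface, and the single-LE case is treated as an allow list with one
entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LE_HASH_WORDS 4   /* IA32_SGXLEPUBKEYHASH0..3 */

enum host_le_policy {
	HOST_LE_ALLOW_ALL,    /* host runs any LE */
	HOST_LE_ALLOW_LIST,   /* host runs only LEs on an approved list */
};

/* Hypothetical description of the host's launch control policy. */
struct host_le_config {
	enum host_le_policy policy;
	const uint64_t (*allowed)[LE_HASH_WORDS];
	size_t nr_allowed;
};

/*
 * Decide whether a guest may be created with the given 'lehash' and 'lewr'.
 * With an allow list, the guest must not be able to rewrite the hash MSRs
 * at runtime, and its fixed hash must be one the host approves.
 */
static bool guest_sgx_params_ok(const struct host_le_config *host,
				const uint64_t lehash[LE_HASH_WORDS],
				bool lewr)
{
	size_t i;

	assert(host != NULL && lehash != NULL);

	if (host->policy == HOST_LE_ALLOW_ALL)
		return true;

	if (lewr)
		return false;

	for (i = 0; i < host->nr_allowed; i++) {
		if (memcmp(host->allowed[i], lehash,
			   sizeof(host->allowed[i])) == 0)
			return true;
	}
	return false;
}
```

Under this rule a guest on a restricted host always sees
IA32_FEATURE_CONTROL[bit 17] cleared, which is a configuration real hardware
can also present.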

I'd like to hear comments from you guys.

Paolo, do you also have comments here from KVM's side?

Thanks,
-Kai

Also, some day Intel may fix its architectural design flaw [1] by
allowing EINIT to personalize the enclave's keying, and, if it's done
by a new argument to EINIT instead of an MSR, KVM will have to trap
EINIT to handle it.

It looks like this flaw is not the same issue as the one above (host enclave
policy applying to the guest)?

It's related.  Without this flaw, it might make sense to apply looser
policy in the guest than in the host.  With this flaw, I think your
policy fails to have any real effect if you don't enforce it on
guests.

One argument against this approach is that a KVM guest should never have an
impact on the host side, meaning the host should not be aware of such an MSR
change.

As a somewhat generic comment, I don't like this approach to KVM
development.  KVM mucks with lots of important architectural control
registers, and, in all too many cases, it tries to do so independently
of the other arch/x86 code.  This ends up causing all kinds of grief.

Can't KVM and the real x86 arch code cooperate for real?  The host and
the KVM code are in arch/x86 in the same source tree.

Currently the host SGX driver, which is pretty much self-contained,
implements all SGX-related stuff.

I will probably NAK this if it comes my way for inclusion upstream.
Just because it can be self-contained doesn't mean it should be
self-contained.

I would advocate for the former approach.  (But you can't remap the
parameters due to TOCTOU issues, locking, etc.  Just copy them.  I
don't see why this is any more complicated than emulating any other
instruction that accesses memory.)

No, you cannot just copy. Because all addresses in the guest's ENCLS
parameters are guest virtual addresses, we cannot use them to execute ENCLS
in KVM. If any guest virtual address is used in the ENCLS parameters, for
example PAGEINFO.SECS, PAGEINFO.SECINFO/PCMD, etc., you have to remap it to
a KVM virtual address.

Btw, what is the TOCTOU issue? Would you also elaborate on the locking issue?

I was partially mis-remembering how this worked.  It looks like
SIGSTRUCT and EINITTOKEN could be copied but SECS would have to be
mapped.  If KVM applied some policy to the launchable enclaves, it
would want to make sure that it only looks at fields that are copied
to make sure that the enclave that gets launched is the one it
verified.  The locking issue I'm imagining is that the SECS (or
whatever else might be mapped) doesn't disappear and get reused for
something else while it's mapped in the host.  Presumably KVM has an
existing mechanism for this, but maybe SECS is special because it's
not quite normal memory IIRC.
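
The "only look at fields that are copied" point can be sketched as follows.
This is a userspace C model, not KVM code: model_sigstruct is a hypothetical
stand-in reduced to a single field, and model_policy_allow_signer_one is an
example policy invented for the illustration. The idea is to snapshot the
guest's structure once, apply policy to the snapshot, and launch from the
same snapshot, so a later guest write cannot change what was verified.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for SIGSTRUCT, reduced to one identifying field. */
struct model_sigstruct {
	uint8_t signer_id[32];
};

typedef bool (*le_policy_fn)(const struct model_sigstruct *sig);

/* Example policy for the model: accept only signer IDs starting with 1. */
static bool model_policy_allow_signer_one(const struct model_sigstruct *sig)
{
	return sig->signer_id[0] == 1;
}

/*
 * Copy the structure out of guest memory exactly once, check the copy,
 * and hand the same copy to the launch step via @launched.
 * Returns false (and leaves @launched untouched) if policy rejects it.
 */
static bool launch_from_snapshot(const struct model_sigstruct *guest_mem,
				 le_policy_fn allowed,
				 struct model_sigstruct *launched)
{
	struct model_sigstruct snap;

	assert(guest_mem != NULL && allowed != NULL && launched != NULL);

	memcpy(&snap, guest_mem, sizeof(snap));   /* single fetch */
	if (!allowed(&snap))
		return false;

	*launched = snap;   /* use the verified copy, never guest_mem again */
	return true;
}
```

Anything that has to stay mapped rather than copied, such as SECS, falls
outside this pattern and would need the lifetime guarantees discussed above.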

If necessary for some reason, trap EINIT when the SGXLEPUBKEYHASH is
wrong and then clear the exit flag once the MSRs are in sync.  You'll
need to be careful to avoid races in which the host's value leaks into
the guest.  I think you'll find that this is more complicated, less
flexible, and less performant than just handling ENCLS[EINIT] directly
in the host.

Sorry, I don't quite follow this part. Why would the host's value leak into
the guest? I suppose the *value* means the host's IA32_SGXLEPUBKEYHASHn? The
guest's MSR reads/writes are always trapped and emulated by KVM.

You'd need to make sure that this sequence of events doesn't happen:

 - Guest does EINIT and it exits.
 - Host updates the MSRs and the ENCLS-exiting bitmap.
 - Guest is preempted before it retries EINIT.
 - A different host thread launches an enclave, thus changing the MSRs.
 - Guest resumes and runs EINIT with the wrong MSR values, without exiting.
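
The sequence above can be replayed in a small deterministic C model. This is
only a sketch: the hash is collapsed to one 64-bit value, the structures and
function names are hypothetical, and the "rearm" option (re-enabling EINIT
exiting for the guest whenever something else rewrites the MSRs) is one
possible mitigation assumed for illustration, not something proposed in this
thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified model: one physical CPU, one guest, a 64-bit "hash". */
struct model_cpu {
	uint64_t msr_hash;        /* current IA32_SGXLEPUBKEYHASHn contents */
};

struct model_guest {
	uint64_t want_hash;       /* guest's virtual MSR value */
	bool einit_exiting;       /* does the guest's EINIT trap to the host? */
};

enum einit_result {
	EINIT_TRAPPED,
	EINIT_RAN_RIGHT_HASH,
	EINIT_RAN_WRONG_HASH,
};

/* Guest executes EINIT: it traps, or runs against the real MSRs. */
static enum einit_result guest_einit(const struct model_cpu *cpu,
				     const struct model_guest *g)
{
	if (g->einit_exiting)
		return EINIT_TRAPPED;
	return cpu->msr_hash == g->want_hash ? EINIT_RAN_RIGHT_HASH
					     : EINIT_RAN_WRONG_HASH;
}

/* Host handles the exit: sync the MSRs, then stop trapping EINIT. */
static void host_handle_einit_exit(struct model_cpu *cpu,
				   struct model_guest *g)
{
	cpu->msr_hash = g->want_hash;
	g->einit_exiting = false;
}

/* A different host thread launches its own enclave on the same CPU. */
static void host_thread_einit(struct model_cpu *cpu, struct model_guest *g,
			      uint64_t host_hash, bool rearm)
{
	cpu->msr_hash = host_hash;
	if (rearm)
		g->einit_exiting = true;   /* assumed mitigation */
}

/* Replay the five steps and report what the guest's retried EINIT sees. */
static enum einit_result replay_race(bool rearm)
{
	struct model_cpu cpu = { .msr_hash = 0xAAAA };
	struct model_guest g = { .want_hash = 0xBBBB, .einit_exiting = true };

	if (guest_einit(&cpu, &g) == EINIT_TRAPPED)
		host_handle_einit_exit(&cpu, &g);
	/* guest is preempted here, before it retries EINIT */
	host_thread_einit(&cpu, &g, 0xAAAA, rearm);
	return guest_einit(&cpu, &g);
}
```

In the model, replay_race(false) ends with the guest running EINIT against
the host's hash, while replay_race(true) ends with the guest trapping again.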

[1] Guests that steal sealed data from each other or from the host can
manipulate that data without compromising the hypervisor by simply
loading the same enclave that its rightful owner would use.  If you're
trying to use SGX to protect your crypto credentials so that, if
stolen, they can't be used outside the guest, I would consider this to
be a major flaw.  It breaks the security model in a multi-tenant cloud
situation.  I've complained about it before.

It looks like potentially only the guest's IA32_SGXLEPUBKEYHASHn may be
leaked? In that case, even if it is leaked, it looks like we cannot dig
anything out of it beyond the hash value?

Not sure what you mean.  Are you asking about the lack of guest personalization?

Concretely, imagine I write an enclave that seals my TLS client
certificate's private key and offers an API to sign TLS certificate
requests with it.  This way, if my system is compromised, an attacker
can use the certificate only so long as they have access to my
machine.  If I kick them out or if they merely get the ability to read
the sealed data but not to execute code, the private key should still
be safe.  But, if this system is a VM guest, the attacker could run
the exact same enclave on another guest on the same physical CPU and
sign using my key.  Whoops!