Re: [intel-sgx-kernel-dev] [PATCH 08/10] kvm: vmx: add guest's IA32_SGXLEPUBKEYHASHn runtime switch support

On 6/7/2017 9:22 AM, Andy Lutomirski wrote:
On Tue, Jun 6, 2017 at 1:52 PM, Huang, Kai <kai.huang@xxxxxxxxxxxxxxx> wrote:


On 5/18/2017 7:45 PM, Huang, Kai wrote:



On 5/17/2017 12:09 PM, Andy Lutomirski wrote:

On Mon, May 15, 2017 at 5:48 PM, Huang, Kai <kai.huang@xxxxxxxxxxxxxxx> wrote:



On 5/12/2017 6:11 PM, Andy Lutomirski wrote:


On Thu, May 11, 2017 at 9:56 PM, Huang, Kai <kai.huang@xxxxxxxxxxxxxxx> wrote:


I am not sure whether the cost of writing to 4 MSRs would be *extremely* slow, as when a vcpu is scheduled in, KVM is already doing vmcs_load, writing to several MSRs, etc.



I'm speculating that these MSRs may be rather unoptimized and hence unusually slow.



Have a percpu variable that stores the current SGXLEPUBKEYHASH along with whatever lock is needed (probably just a mutex).  Users of EINIT will take the mutex, compare the percpu variable to the desired value, and, if it's different, do WRMSR and update the percpu variable.

KVM will implement writes to SGXLEPUBKEYHASH by updating its in-memory state but *not* changing the MSRs.  KVM will trap and emulate EINIT to support the same handling as the host.  There is no action required at all on KVM guest entry and exit.
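[The cached-hash scheme above could be sketched roughly as follows. This is a userspace simulation, not real kernel code: wrmsr() is mocked, the "percpu" variable is a plain static under a pthread mutex, and all names are illustrative. The MSR indices 0x8C-0x8F are the architectural IA32_SGXLEPUBKEYHASH0..3 addresses.]

```c
/*
 * Sketch of the proposed scheme: cache the last SGXLEPUBKEYHASH
 * written on this CPU and only issue the four WRMSRs when the
 * desired hash differs from the cached one.
 */
#include <pthread.h>
#include <stdint.h>
#include <string.h>

static uint64_t cached_hash[4];          /* percpu in a real kernel */
static pthread_mutex_t hash_lock = PTHREAD_MUTEX_INITIALIZER;
static int wrmsr_count;                  /* counts mocked MSR writes */

/* Mocked WRMSR so the example runs outside the kernel. */
static void wrmsr(uint32_t msr, uint64_t val)
{
    (void)msr; (void)val;
    wrmsr_count++;
}

/* Called before executing EINIT with the desired public key hash. */
static void sgx_update_lepubkeyhash(const uint64_t desired[4])
{
    pthread_mutex_lock(&hash_lock);
    if (memcmp(cached_hash, desired, sizeof(cached_hash)) != 0) {
        for (int i = 0; i < 4; i++)
            wrmsr(0x8C + i, desired[i]); /* IA32_SGXLEPUBKEYHASH0..3 */
        memcpy(cached_hash, desired, sizeof(cached_hash));
    }
    pthread_mutex_unlock(&hash_lock);
}
```

[Repeated EINITs with the same signer then cost only a memcmp under the mutex, and the WRMSRs happen only on an actual hash change, which is the point of the proposal.]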




This is doable, but the SGX driver needs to do those things and expose interfaces for KVM to use. In terms of the percpu data, it is nice to have, but I am not sure whether it is mandatory, as IMO EINIT is not even in a performance-critical path. We can simply read the old value out of the MSRs and compare whether it equals the new one.



I think the SGX driver should probably live in arch/x86, and the
interface could be a simple percpu variable that is exported (from the
main kernel image, not from a module).



FWIW, I think that KVM will, in the long run, want to trap EINIT for
other reasons: someone is going to want to implement policy for what
enclaves are allowed that applies to guests as well as the host.




I am not very convinced why "what enclaves are allowed" in the host would apply to the guest. Can you elaborate? I mean, in general virtualization just focuses on emulating hardware behavior. If a native machine is able to run any LE, the virtual machine should be able to as well (of course, with the guest's IA32_FEATURE_CONTROL[bit 17] set).



I strongly disagree.  I can imagine two classes of sensible policies
for launch control:

1. Allow everything.  This seems quite sensible to me.

2. Allow some things, and make sure that VMs have at least as
restrictive a policy as host root has.  After all, what's the point of
restricting enclaves in the host if host code can simply spawn a
little VM to run otherwise-disallowed enclaves?



What's the current SGX driver launch control policy? "Allow everything" works for KVM, so let's skip this. Are we going to support allowing several LEs, or just a single LE? I know Jarkko is doing the in-kernel LE stuff but I don't know the details.

I am trying to find a way that both doesn't break the host launch control policy and is consistent with HW behavior (from the guest's view). Currently we can create a KVM guest with runtime change to IA32_SGXLEPUBKEYHASHn either enabled or disabled. I introduced a Qemu parameter 'lewr' for this purpose. Actually I introduced the following Qemu SGX parameters for creating a guest:

         -sgx epc=<size>,lehash='SHA-256 hash',lewr

where 'epc' specifies the guest's EPC size, 'lehash' specifies the (initial) value of the guest's IA32_SGXLEPUBKEYHASHn, and 'lewr' specifies whether the guest is allowed to change its IA32_SGXLEPUBKEYHASHn at runtime.

If the host only allows one single LE to run, KVM can add a restriction that only allows creating a KVM guest with runtime change to IA32_SGXLEPUBKEYHASHn disabled, so that only the host-allowed (single) hash can be used by the guest. From the guest's view, it simply has IA32_FEATURE_CONTROL[bit 17] cleared and has IA32_SGXLEPUBKEYHASHn set by default to the host-allowed (single) hash.

If the host allows several LEs (but not everything), and we create a guest with 'lewr', then the behavior is not consistent with HW behavior, as from the hardware's point of view the guest could actually run any LE, but we have to tell the guest that it is only allowed to change IA32_SGXLEPUBKEYHASHn to some specific values. One compromise solution is to disallow creating a guest with 'lewr' specified, and at the same time only allow creating a guest with host-approved hashes specified in 'lehash'. This makes the guest's behavior consistent with HW behavior but only allows the guest to run one LE (the one specified by 'lehash' when the guest is created).


I'm not sure I entirely agree for a couple reasons.

1. I wouldn't be surprised if the kernel ends up implementing a policy
in which it checks all enclaves (not just LEs) for acceptability.  In
fact, if the kernel sticks with the "no LE at all or just
kernel-internal LE", then checking enclaves directly against some
admin- or distro-provided signer list seems reasonable.  This type of
policy can't be forwarded to a guest by restricting allowed LE
signers.  But this is mostly speculation since AFAIK no one has
seriously proposed any particular policy support and the plan was to
not have this for the initial implementation.

2. While matching hardware behavior is nice in principle, there
doesn't seem to be useful hardware behavior to match here.  If the
host had a list of five allowed LE signers, how exactly would it
restrict the MSRs?  They're not written atomically, so you can't
directly tell what's being written.


In this case I actually plan to just allow creating a guest with runtime change to the guest's IA32_SGXLEPUBKEYHASHn disabled (without 'lewr' specified). If 'lewr' is specified, creating the guest will fail. And we only allow creating a guest with host-allowed hash values (via 'lehash=hash-value'); if the 'hash-value' specified by 'lehash' is not allowed by the host, we also fail to create the guest.

We can only allow creating a guest with 'lewr' specified when the host allows anything.

But in this way, we are restricting the guest OS's ability to run LEs, as only one LE, the one specified by the 'lehash' parameter, can be run. But I think this won't hurt much, as multiple guests are still able to run different LEs?

Also, the only way to fail an MSR write is to send #GP, and Windows (and even Linux) may not expect that.  Linux doesn't panic due to #GP on MSR writes these days, but you still get a big fat warning.  I wouldn't be at all surprised if Windows BSODs.


We cannot allow writing some particular values to the MSRs successfully while injecting #GP when other values are written to the same MSRs. So #GP is not an option.

ENCLS[EINIT], on the other hand, returns an actual error code.  I'm not sure that a sensible error code exists ("SGX_HYPERVISOR_SAID_NO?", perhaps),


It looks like no such error code exists. And we cannot return such an error code to the guest, as it would only be meaningful when ENCLS is run in the hypervisor.

but SGX_INVALID_EINITTOKEN seems to mean, more or less, "the CPU thinks you're not authorized to do this", so forcing that error code could be entirely reasonable.

If the host policy is to allow a list of LE signers, you could return
SGX_INVALID_EINITTOKEN if the guest tries to EINIT an LE that isn't in
the list.
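[The whitelist idea above could look something like the following in a hypervisor's EINIT intercept. This is a speculative userspace sketch, not real KVM code: the allowed-signer list, the struct, and the function names are all illustrative. SGX_INVALID_EINITTOKEN is the architectural EINIT error code 16 from the SDM.]

```c
/*
 * Sketch: a hypervisor EINIT intercept that enforces a host policy
 * of allowed LE signers.  If the guest EINITs a launch enclave whose
 * signer hash is not on the host's list, return SGX_INVALID_EINITTOKEN
 * instead of executing the real ENCLS[EINIT].
 */
#include <stdint.h>
#include <string.h>
#include <stddef.h>

#define SGX_INVALID_EINITTOKEN 16   /* architectural EINIT error code */

struct hash256 { uint64_t q[4]; };  /* SHA-256 signer hash, 4x64 bits */

/* Host policy: allowed LE signer hashes (values are placeholders). */
static const struct hash256 allowed_signers[] = {
    { { 0x1111, 0x2222, 0x3333, 0x4444 } },
};

static int signer_allowed(const struct hash256 *h)
{
    size_t n = sizeof(allowed_signers) / sizeof(allowed_signers[0]);
    for (size_t i = 0; i < n; i++)
        if (memcmp(h, &allowed_signers[i], sizeof(*h)) == 0)
            return 1;
    return 0;
}

/* Emulate a guest EINIT of a launch enclave; 0 means success. */
static int emulate_einit_le(const struct hash256 *signer)
{
    if (!signer_allowed(signer))
        return SGX_INVALID_EINITTOKEN; /* "you're not authorized" */
    /* ...otherwise run the real ENCLS[EINIT] on the guest's behalf. */
    return 0;
}
```

[From inside the guest this is indistinguishable from the CPU itself rejecting the token, which is what makes this error code more palatable than a #GP on the MSR write.]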


But this would be inconsistent with HW behavior. If the hash value in the guest's IA32_SGXLEPUBKEYHASHn matches the signer of the enclave passed to EINIT, EINIT is not supposed to return SGX_INVALID_EINITTOKEN.

I think from VMM's perspective, emulating HW behavior to be consistent
with real HW behavior is very important.

Paolo, would you provide your comments?


Hi all,

This has been quiet for a while and I'd like to start the discussion again. Jarkko told me that he currently only supports one LE in the SGX driver, but I am not sure whether he is going to extend it in the future or not. I think this might also depend on requirements from customers.

Andy,

If we only support one LE in the driver, then we can only support the same LE for all KVM guests, according to your comment that the host kernel launch control policy should also apply to KVM guests? Would you comment more?

I'm not at all convinced that, going forward, Linux's host-side launch
control policy will be entirely contained in the LE.  I'm also not
convinced that non-Linux guests will function at all under this type
of policy -- what if FreeBSD's LE is different for whatever reason?

I am not convinced either. I think we need Jarkko to elaborate on how the host-side launch control policy is implemented, or whether there is any policy at all. I also tried to read the SGX driver code, but it looks like I couldn't find any implementation regarding this.

Hi Jarkko,

Can you elaborate on this?

Thanks,
-Kai


Jarkko,

Could you help to clarify the whole launch control policy in host side so
that we can have a better understanding together?

Thanks,
-Kai



--Andy





