Re: [RFC PATCH 00/23] KVM SGX virtualization support

Dave Hansen <dave.hansen@xxxxxxxxx> · Wed, 6 Jan 2021 09:07:13 -0800

On 1/5/21 5:55 PM, Kai Huang wrote:
> - Virtual EPC
> 
> "Virtual EPC" is the EPC section exposed by KVM to guest so SGX software in
> guest can discover it and use it to create SGX enclaves. KVM exposes SGX to 
> guest via CPUID, and exposes one or more "virtual EPC" sections for guest.
> The size of "virtual EPC" is passed as Qemu parameter when creating the
> guest, and the base address is calcualted internally according to guest's

				^ calculated

> configuration.

This is not a great first paragraph to introduce me to this feature.

Please remind us what EPC *is*, then you can go and talk about why we
have to virtualize it, and how "virtual EPC" is different from normal
EPC.  For instance:

SGX enclave memory is special and is reserved specifically for enclave
use.  In bare-metal SGX enclaves, the kernel allocates enclave pages,
copies data into the pages with privileged instructions, then allows the
enclave to start.  In this scenario, only initialized pages already
assigned to an enclave are mapped to userspace.

In virtualized environments, the hypervisor still needs to do the
physical enclave page allocation.  The guest kernel is responsible for
the data copying (among other things).  This means that the job of
starting an enclave is now split between hypervisor and guest.

This series introduces a new misc device: /dev/sgx_virt_epc.  This
device allows the host to map *uninitialized* enclave memory into
userspace, which can then be passed into a guest.

While it might be *possible* to start a host-side enclave with
/dev/sgx_enclave and pass its memory into a guest, it would be wasteful
and convoluted.
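
To make the userspace side concrete, here is a rough sketch of what a
VMM might do with the new device.  This is an illustration only, not
code from the series: the helper name map_virt_epc() is made up, and I
am assuming the device is opened read-write and mapped MAP_SHARED like
other fd-based guest memory backends.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: map 'size' bytes of raw, uninitialized EPC from
 * the device node at 'path' (e.g. "/dev/sgx_virt_epc").
 * Returns NULL if the device cannot be opened or mapped.
 */
void *map_virt_epc(const char *path, size_t size)
{
	void *epc;
	int fd;

	assert(path != NULL);

	fd = open(path, O_RDWR);
	if (fd < 0)
		return NULL;

	/* Assumption: a shared mapping, so the guest sees the same pages. */
	epc = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

	/* A successful mapping holds its own reference to the file. */
	close(fd);

	return epc == MAP_FAILED ? NULL : epc;
}
```

The returned range has no enclave associated with it.  The VMM would
then hand it to the guest the same way it hands over any other memory
backend.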

> core/driver to allow userspace (Qemu) to allocate "raw" EPC, and use it as
> "virtual EPC" for guest. Obviously, unlike EPC allocated for host SGX driver,
> virtual EPC allocated via /dev/sgx_virt_epc doesn't have enclave associated,
> and how virtual EPC is used by guest is compeletely controlled by guest's SGX

					   ^ completely

Please run a spell checker on this thing.

> software.
> 
> Implement the "raw" EPC allocation in the x86 core-SGX subsystem via
> /dev/sgx_virt_epc rather than in KVM. Doing so has two major advantages:
> 
>   - Does not require changes to KVM's uAPI, e.g. EPC gets handled as
>     just another memory backend for guests.
> 
>   - EPC management is wholly contained in the SGX subsystem, e.g. SGX
>     does not have to export any symbols, changes to reclaim flows don't
>     need to be routed through KVM, SGX's dirty laundry doesn't have to
>     get aired out for the world to see, and so on and so forth.
> 
> The virtual EPC allocated to guests is currently not reclaimable, due to
> reclaiming EPC from KVM guests is not currently supported. Due to the
> complications of handling reclaim conflicts between guest and host, KVM
> EPC oversubscription, which allows total virtual EPC size greater than
> physical EPC by being able to reclaiming guests' EPC, is significantly more
> complex than basic support for SGX virtualization.

It would also help here to remind the reader that enclave pages have a
special reclaim mechanism separate from normal page reclaim, and that
mechanism is disabled for these pages.

Does the *ABI* here preclude doing oversubscription in the future?

> - Support SGX virtualization without SGX Launch Control unlocked mode
> 
> Although SGX driver requires SGX Launch Control unlocked mode to work, SGX

Although the bare-metal SGX driver requires...

Also, didn't we call this "Flexible Launch Control"?

> virtualization doesn't, since how enclave is created is completely controlled
> by guest SGX software, which is not necessarily linux. Therefore, this series
> allows KVM to expose SGX to guest even SGX Launch Control is in locked mode,

... "expose SGX to guests even if" ...

> or is not present at all. The reason is the goal of SGX virtualization, or
> virtualization in general, is to expose hardware feature to guest, but not to
> make assumption how guest will use it. Therefore, KVM should support SGX guest
> as long as hardware is able to, to have chance to support more potential use
> cases in cloud environment.

This is kinda long-winded and misses a lot of important context.  How about:

SGX hardware supports two "launch control" modes to limit which enclaves
can run.  In the "locked" mode, the hardware prevents enclaves from
running unless they are blessed by a third party.  In the unlocked mode,
the kernel is in full control of which enclaves can run.  The bare-metal
SGX code refuses to launch enclaves unless it is in the unlocked mode.

This sgx_virt_epc driver does not have such a restriction.  This allows
guests which are OK with the locked mode to use SGX, even if the host
kernel refuses to.

> - Support exposing SGX2
> 
> Due to the same reason above, SGX2 feature detection is added to core SGX code
> to allow KVM to expose SGX2 to guest, even currently SGX driver doesn't support
> SGX2, because SGX2 can work just fine in guest w/o any interaction to host SGX
> driver.
> 
> - Restricit SGX guest access to provisioning key
> 
> To grant guest being able to fully use SGX, guest needs to be able to create
> provisioning enclave.

"enclave" or "enclaves"?

> However provisioning key is sensitive and is restricted by

	^ the

> /dev/sgx_provision in host SGX driver, therefore KVM SGX virtualization follows
> the same role: a new KVM_CAP_SGX_ATTRIBUTE is added to KVM uAPI, and only file
> descriptor of /dev/sgx_provision is passed to that CAP by usersppace hypervisor
> (Qemu) when creating the guest, it can access provisioning bit. This is done by
> making KVM trape ECREATE instruction from guest, and check the provisioning bit

		^ trap

> in ECREATE's attribute.

The grammar in that paragraph is really off to me.  Can you give it
another go?