On 9/17/20 10:40 PM, Sean Christopherson wrote:
On Thu, Sep 17, 2020 at 01:56:21PM -0500, Tom Lendacky wrote:
On 9/17/20 12:28 PM, Dr. David Alan Gilbert wrote:
* Tom Lendacky (thomas.lendacky@xxxxxxx) wrote:
From: Tom Lendacky <thomas.lendacky@xxxxxxx>
This patch series provides support for launching an SEV-ES guest.
Secure Encrypted Virtualization - Encrypted State (SEV-ES) expands on the
SEV support to protect the guest register state from the hypervisor. See
"AMD64 Architecture Programmer's Manual Volume 2: System Programming",
section "15.35 Encrypted State (SEV-ES)" [1].
In order to allow a hypervisor to perform functions on behalf of a guest,
there is architectural support for notifying a guest's operating system
when certain types of VMEXITs are about to occur. This allows the guest to
selectively share information with the hypervisor to satisfy the requested
function. The notification is performed using a new exception, the VMM
Communication exception (#VC). The information is shared through the
Guest-Hypervisor Communication Block (GHCB) using the VMGEXIT instruction.
The GHCB format and the protocol for using it are documented in "SEV-ES
Guest-Hypervisor Communication Block Standardization" [2].
The main areas of the Qemu code that are updated to support SEV-ES are
around the SEV guest launch process and AP booting in order to support
booting multiple vCPUs.
There are no new command line switches required. Instead, the desire for
SEV-ES is presented using the SEV policy object. Bit 2 of the SEV policy
object indicates that SEV-ES is required.
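As a minimal sketch of the policy check described above (the macro and function names are illustrative assumptions, not identifiers taken from this series), testing bit 2 of the policy value could look like this:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Bit 2 of the SEV guest policy requests SEV-ES, as stated in the text above.
 * The names below are hypothetical and chosen only for this illustration.
 */
#define SEV_POLICY_ES_BIT 2
#define SEV_POLICY_ES     (UINT32_C(1) << SEV_POLICY_ES_BIT)

/* Return true when the guest policy asks for an SEV-ES guest. */
static bool sev_policy_requires_es(uint32_t policy)
{
    return (policy & SEV_POLICY_ES) != 0;
}
```

With this helper, a policy of 0x4 or 0x5 selects SEV-ES, while 0x1 or 0x3 selects plain SEV.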
The SEV launch process is updated in two ways. The first is that the
KVM_SEV_ES_INIT ioctl is used to initialize the guest instead of the
standard KVM_SEV_INIT ioctl. The second is that before the SEV launch
measurement is calculated, the LAUNCH_UPDATE_VMSA SEV API is invoked for
each vCPU that Qemu has created. Once the LAUNCH_UPDATE_VMSA API has been
invoked, no direct changes to the guest register state can be made.
AP booting poses some interesting challenges. The INIT-SIPI-SIPI sequence
is typically used to boot the APs. However, the hypervisor is not allowed
to update the guest registers. For the APs, the reset vector must be known
in advance. An OVMF method to provide a known reset vector address exists
by providing an SEV information block, identified by UUID, near the end of
the firmware [3]. OVMF will program the jump to the actual reset vector in
this area of memory. Since the memory location is known in advance, an AP
can be created with the known reset vector address as its starting CS:IP.
The GHCB document [2] talks about how SMP booting under SEV-ES is
performed. SEV-ES also requires the use of the in-kernel irqchip support
in order to minimize the changes required to Qemu to support AP booting.
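To illustrate the "known reset vector address as its starting CS:IP" idea, here is a small sketch of one plausible way to split a 32-bit reset address into a CS base and an IP. The struct, the function name, and the choice of a 64 KiB-aligned base are assumptions made for this example, not necessarily what the series does:

```c
#include <stdint.h>

/* Hypothetical container for an AP's initial code-segment state. */
struct ap_reset_state {
    uint32_t cs_base; /* base address of the code segment */
    uint16_t ip;      /* offset of the first instruction within the segment */
};

/*
 * Split a 32-bit reset vector address into a 64 KiB-aligned CS base and a
 * 16-bit IP, so that cs_base + ip == addr.
 */
static struct ap_reset_state ap_reset_state_from_addr(uint32_t addr)
{
    struct ap_reset_state s;

    s.cs_base = addr & 0xffff0000u;
    s.ip = (uint16_t)(addr & 0x0000ffffu);
    return s;
}
```

For example, an address of 0xfffffff0 yields a CS base of 0xffff0000 and an IP of 0xfff0, which matches the layout of the conventional x86 reset vector.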
Some random thoughts:
a) Is there something that explicitly disallows SMM?
There isn't currently. Is there a way to know early on that SMM is enabled?
Could I just call x86_machine_is_smm_enabled() to check that?
KVM_CAP_X86_SMM is currently checked as a KVM-wide capability. One option
is to change that to use a per-VM ioctl() and then have KVM return 0 for
SEV-ES VMs.
diff --git a/target/i386/kvm.c b/target/i386/kvm.c
index 416c82048a..4d7f84ed1b 100644
--- a/target/i386/kvm.c
+++ b/target/i386/kvm.c
@@ -145,7 +145,7 @@ int kvm_has_pit_state2(void)
 bool kvm_has_smm(void)
 {
-    return kvm_check_extension(kvm_state, KVM_CAP_X86_SMM);
+    return kvm_vm_check_extension(kvm_state, KVM_CAP_X86_SMM);
 }
This will work. I'll have to modify the has_emulated_msr() op in the
kernel as part of the SEV-ES support to take a struct kvm argument.
I'll be sure to include a comment that the struct kvm argument can be
NULL, since that op is also called during KVM module initialization,
before any VM (and therefore any struct kvm instance) exists.
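A rough, self-contained sketch of the NULL-tolerant op described above might look like the following. The struct kvm here is a hypothetical stand-in with a single made-up field, and the function body is an assumption about how such a check could be written, not the kernel's actual code:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MSR_IA32_SMBASE 0x0000009e /* SMM base address MSR */

/* Hypothetical stand-in for the kernel's struct kvm; only one field is modeled. */
struct kvm {
    bool sev_es_guest;
};

/*
 * kvm may be NULL: the op is also called during module initialization,
 * before any VM exists. In that case no per-VM restriction can apply.
 */
static bool has_emulated_msr(const struct kvm *kvm, uint32_t index)
{
    switch (index) {
    case MSR_IA32_SMBASE:
        /* Report SMM as unsupported only for a known SEV-ES VM. */
        if (kvm && kvm->sev_es_guest)
            return false;
        return true;
    default:
        return true;
    }
}
```

The key point is that the NULL case falls through to the module-wide answer, so callers that run before VM creation keep their current behavior.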
Thanks,
Tom
bool kvm_has_adjust_clock_stable(void)
b) I think all the interfaces you're using are already defined in
Linux header files - even if the code to implement them isn't actually
upstream in the kernel yet (the launch_update in particular) - we
normally wait for the kernel interface to be accepted before taking the
QEMU patches, but if the constants are in the headers already I'm not
sure what the rule is.
Correct, everything was already present from a Linux header perspective.
c) What happens if QEMU reads the register values from the state if
the guest is paused - does it just see junk? I'm just wondering if you
need to add checks in places it might try to.
I thought about what to do about calls to read the registers once the guest
state has become encrypted. I think it would take a lot of changes to make
Qemu "protected state aware" for what I see as little gain. Qemu is likely
to see a lot of zeroes or actual register values from the GHCB protocol for
previous VMGEXITs that took place.
Yeah, we more or less came to the same conclusion for TDX. It's easy enough
to throw an error if QEMU attempts to read protected state, but without
other invasive changes, it's too easy to unintentionally kill the VM. MSRs
are a bit of a special case, but for REGS, SREGS, and whatever other state
is read out, simply letting KVM return zeros/garbage seems like the lesser
of all evils.