On 5/10/21 4:02 PM, Sean Christopherson wrote:
> On Mon, May 10, 2021, Tom Lendacky wrote:
>> On 5/10/21 11:10 AM, Sean Christopherson wrote:
>>> On Fri, May 07, 2021, Tom Lendacky wrote:
>>>> On 5/7/21 11:59 AM, Sean Christopherson wrote:
>>>>> Allow userspace to set CR0, CR4, CR8, and EFER via KVM_SET_SREGS for
>>>>> protected guests, e.g. for SEV-ES guests with an encrypted VMSA. KVM
>>>>> tracks the aforementioned registers by trapping guest writes, and also
>>>>> exposes the values to userspace via KVM_GET_SREGS. Skipping the regs
>>>>> in KVM_SET_SREGS prevents userspace from updating KVM's CPU model to
>>>>> match the known hardware state.
>>>>
>>>> This is very similar to the original patch I had proposed that you were
>>>> against :)
>>>
>>> I hope/think my position was that it should be unnecessary for KVM to need to
>>> know the guest's CR0/4/8 and EFER values, i.e. even the trapping is unnecessary.
>>> I was going to say I had a change of heart, as EFER.LMA in particular could
>>> still be required to identify 64-bit mode, but that's wrong; EFER.LMA only gets
>>> us long mode, the full is_64_bit_mode() needs access to cs.L, which AFAICT isn't
>>> provided by #VMGEXIT or trapping.
>>
>> Right, that one is missing. If you take a VMGEXIT that uses the GHCB, then
>> I think you can assume we're in 64-bit mode.
>
> But that's not technically guaranteed. The GHCB even seems to imply that there
> are scenarios where it's legal/expected to do VMGEXIT with a valid GHCB outside
> of 64-bit mode:
>
>   However, instead of issuing a HLT instruction, the AP will issue a VMGEXIT
>   with SW_EXITCODE of 0x8000_0004 (this implies that the GHCB was updated prior
>   to leaving 64-bit long mode).

Right, but in order to fill in the GHCB so that the hypervisor can read
it, the guest had to have been in 64-bit mode. Otherwise, whatever the
guest wrote will be seen as encrypted data and make no sense to the
hypervisor anyway.
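To make the long mode vs. 64-bit mode distinction above concrete, here is a minimal standalone sketch. It is not KVM's code: the struct and the helper names (vcpu_mode_state, in_long_mode, in_64_bit_mode, hypercall_arg) are hypothetical, and the 32-bit truncation of hypercall arguments outside 64-bit mode is modeled in simplified form as an assumption about how a hypervisor would treat GPRs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define EFER_LMA (1ULL << 10)	/* EFER.LMA: Long Mode Active */

static_assert(EFER_LMA == 0x400, "EFER.LMA is bit 10");

/* Hypothetical, simplified view of the state needed for the mode check. */
struct vcpu_mode_state {
	uint64_t efer;	/* visible to the hypervisor if EFER writes are trapped */
	bool cs_l;	/* CS.L; for SEV-ES it lives in the encrypted VMSA */
};

/* EFER.LMA alone only says the CPU is in long mode (64-bit or compat). */
static inline bool in_long_mode(const struct vcpu_mode_state *s)
{
	return (s->efer & EFER_LMA) != 0;
}

/* 64-bit mode additionally requires CS.L = 1. */
static inline bool in_64_bit_mode(const struct vcpu_mode_state *s)
{
	return in_long_mode(s) && s->cs_l;
}

/*
 * Assumed hypercall argument handling: outside 64-bit mode only bits 31:0
 * of a GPR are meaningful, so the upper half is dropped.
 */
static inline uint64_t hypercall_arg(const struct vcpu_mode_state *s,
				     uint64_t gpr)
{
	return in_64_bit_mode(s) ? gpr : (uint32_t)gpr;
}
```

If cs_l cannot be observed, a caller of hypercall_arg() has to guess, which is the "crush (or preserve) bits 63:32" concern raised for VMGEXIT(VMMCALL).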
>
> In practice, assuming the guest is in 64-bit mode will likely work, especially
> since the MSR-based protocol is extremely limited, but ideally there should be
> stronger language in the GHCB to define the exact VMM assumptions/behaviors.
>
> On the flip side, that assumption and the limited exposure through the MSR
> protocol mean trapping CR0, CR4, and EFER is pointless. I don't see how KVM
> can do anything useful with that information outside of VMGEXITs. Page tables
> are encrypted and GPRs are stale; what else could KVM possibly do with
> identifying protected mode, paging, and/or 64-bit?
>
>>> Unless I'm missing something, that means that VMGEXIT(VMMCALL) is broken since
>>> KVM will incorrectly crush (or preserve) bits 63:32 of GPRs. I'm guessing no
>>> one has reported a bug because either (a) no one has tested a hypercall that
>>> requires bits 63:32 in a GPR or (b) the guest just happens to be in 64-bit mode
>>> when KVM_SEV_LAUNCH_UPDATE_VMSA is invoked and so the segment registers are
>>> frozen to make it appear as if the guest is perpetually in 64-bit mode.
>>
>> I don't think it's (b) since the LAUNCH_UPDATE_VMSA is done against
>> reset-state vCPUs.
>>
>>>
>>> I see that sev_es_validate_vmgexit() checks ghcb_cpl_is_valid(), but isn't that
>>> either pointless or indicative of a much, much bigger problem? If VMGEXIT is
>>
>> It is needed for the VMMCALL exit.
>>
>>> restricted to CPL0, then the check is pointless. If VMGEXIT isn't restricted to
>>> CPL0, then KVM has a big gaping hole that allows a malicious/broken guest
>>> userspace to crash the VM simply by executing VMGEXIT. Since valid_bitmap is
>>> cleared during VMGEXIT handling, I don't think guest userspace can attack/corrupt
>>> the guest kernel by doing a replay attack, but it does all but guarantee a
>>> VMGEXIT at CPL>0 will be fatal since the required valid bits won't be set.
>>
>> Right, so I think some cleanup is needed there, both for the guest and the
>> hypervisor:
>>
>> - For the guest, we could just clear the valid bitmask before leaving the
>>   #VC handler/releasing the GHCB. Userspace can't update the GHCB, so any
>>   VMGEXIT from userspace would just look like a no-op with the below
>>   change to KVM.
>
> Ah, right, the exit_code and exit infos need to be valid.
>
>> - For KVM, instead of returning -EINVAL from sev_es_validate_vmgexit(), we
>>   return the #GP action through the GHCB and continue running the guest.
>
> Agreed, KVM should never kill the guest in response to a bad VMGEXIT. That
> should always be a guest decision.
>
>>> Sadly, the APM doesn't describe the VMGEXIT behavior, nor does any of the SEV-ES
>>> documentation I have. I assume VMGEXIT is recognized at CPL>0 since it morphs
>>> to VMMCALL when SEV-ES isn't active.
>>
>> Correct.
>>
>>>
>>> I.e. either the ghcb_cpl_is_valid() check should be nuked, or more likely KVM
>>
>> The ghcb_cpl_is_valid() is still needed to see whether the VMMCALL was
>> from userspace or not (a VMMCALL will generate a #VC).
>
> Blech. I get that the GHCB spec says CPL must be provided/checked for VMMCALL,
> but IMO that makes no sense whatsoever.
>
> If the guest restricts the GHCB to CPL0, then the CPL field is pointless because
> the VMGEXIT will only ever come from CPL0. Yes, technically the guest kernel
> can proxy a VMMCALL from userspace to the host, but the guest kernel _must_ be
> the one to enforce any desired CPL checks because the VMM is untrusted, at least
> once you get to SNP.
>
> If the guest exposes the GHCB to any CPL, then the CPL check is worthless because

The GHCB itself is not exposed to any CPL. A VMMCALL will generate a #VC.
The guest #VC handler will extract the CPL from the context that
generated the #VC (see vc_handle_vmmcall() in arch/x86/kernel/sev-es.c),
so that a VMMCALL from userspace will have the proper CPL value in the
GHCB when the #VC handler issues the VMGEXIT instruction.

Thanks,
Tom

> guest userspace can simply lie about the CPL. And exposing the GHCB to userspace
> completely undermines guest privilege separation since hardware doesn't provide
> the real CPL, i.e. the VMM, even if it were trusted, can't determine the origin of
> the VMGEXIT.
>
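The CPL handling described in the reply above (the guest's #VC handler records the privilege level of the context that executed VMMCALL) can be sketched roughly as follows. This is a simplified standalone illustration, not the code in arch/x86/kernel/sev-es.c: the types saved_regs and ghcb_sketch and both helpers are hypothetical, and the sketch assumes the interrupted context's CPL can be read from the low two bits of the saved CS selector.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the register frame saved on #VC entry. */
struct saved_regs {
	uint64_t cs;	/* CS selector of the interrupted context */
};

/* Hypothetical stand-in for the GHCB fields relevant to VMMCALL. */
struct ghcb_sketch {
	uint8_t cpl;
	uint8_t cpl_valid;	/* models the CPL bit in the valid bitmap */
};

/* Assumption: the low two bits of the saved CS selector give the CPL. */
static inline uint8_t cpl_from_saved_cs(const struct saved_regs *regs)
{
	return (uint8_t)(regs->cs & 0x3);
}

/*
 * Runs in the guest kernel's #VC handler before it issues VMGEXIT. The
 * value comes from the saved frame, so the context that executed VMMCALL
 * never writes the GHCB itself.
 */
static inline void vmmcall_fill_cpl(struct ghcb_sketch *ghcb,
				    const struct saved_regs *regs)
{
	uint8_t cpl = cpl_from_saved_cs(regs);

	assert(cpl <= 3);
	ghcb->cpl = cpl;
	ghcb->cpl_valid = 1;
}
```

With a user-mode selector such as 0x33 the recorded CPL is 3, and with a kernel selector such as 0x10 it is 0. In both cases the value is supplied by the guest kernel rather than by hardware.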