On Fri, Feb 03, 2023, Alexey Kardashevskiy wrote: > While at this, move set_/clr_dr_intercepts to .c and move #DB intercept > next to DR7 intercept. Please do non-trivial code movement in separate patches unless the functional change is trivial. Moving and changing at the same time makes the patch difficult to review. > @@ -52,9 +53,14 @@ module_param_named(sev, sev_enabled, bool, 0444); > /* enable/disable SEV-ES support */ > static bool sev_es_enabled = true; > module_param_named(sev_es, sev_es_enabled, bool, 0444); > + > +/* enable/disable SEV-ES DebugSwap support */ > +static bool sev_es_debug_swap_enabled = true; > +module_param_named(debug_swap, sev_es_debug_swap_enabled, bool, 0644); Needs to be 0444, otherwise userspace can turn on the knob after KVM is loaded, which would allow enabling the feature on unsupported platforms, amongst many other problems. > void sev_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector) > diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c > index 60c7c880266b..f8e222bee22a 100644 > --- a/arch/x86/kvm/svm/svm.c > +++ b/arch/x86/kvm/svm/svm.c > @@ -671,6 +671,65 @@ static int svm_cpu_init(int cpu) > > } > > +static void set_dr_intercepts(struct vcpu_svm *svm) > +{ > + struct vmcb *vmcb = svm->vmcb01.ptr; > + bool intercept; > + > + if (!sev_es_guest(svm->vcpu.kvm)) { > + vmcb_set_intercept(&vmcb->control, INTERCEPT_DR0_READ); > + vmcb_set_intercept(&vmcb->control, INTERCEPT_DR1_READ); > + vmcb_set_intercept(&vmcb->control, INTERCEPT_DR2_READ); > + vmcb_set_intercept(&vmcb->control, INTERCEPT_DR3_READ); > + vmcb_set_intercept(&vmcb->control, INTERCEPT_DR4_READ); > + vmcb_set_intercept(&vmcb->control, INTERCEPT_DR5_READ); > + vmcb_set_intercept(&vmcb->control, INTERCEPT_DR6_READ); > + vmcb_set_intercept(&vmcb->control, INTERCEPT_DR0_WRITE); > + vmcb_set_intercept(&vmcb->control, INTERCEPT_DR1_WRITE); > + vmcb_set_intercept(&vmcb->control, INTERCEPT_DR2_WRITE); > + vmcb_set_intercept(&vmcb->control, INTERCEPT_DR3_WRITE); > + vmcb_set_intercept(&vmcb->control, INTERCEPT_DR4_WRITE); > + vmcb_set_intercept(&vmcb->control, INTERCEPT_DR5_WRITE); > + vmcb_set_intercept(&vmcb->control, INTERCEPT_DR6_WRITE); > + } > + > + if (sev_es_guest(svm->vcpu.kvm)) { > + struct sev_es_save_area *save = svm->sev_es.vmsa; > + > + intercept = !(save->sev_features & SVM_SEV_FEAT_DEBUG_SWAP); Blech, the VMCB vs. SEV and SEV-ES code is a mess. E.g. init_vmcb() does /* * Guest access to VMware backdoor ports could legitimately * trigger #GP because of TSS I/O permission bitmap. * We intercept those #GP and allow access to them anyway * as VMware does. Don't intercept #GP for SEV guests as KVM can't * decrypt guest memory to decode the faulting instruction. */ if (enable_vmware_backdoor && !sev_guest(vcpu->kvm)) set_exception_intercept(svm, GP_VECTOR); but then sev_es_init_vmcb() also does: /* No support for enable_vmware_backdoor */ clr_exception_intercept(svm, GP_VECTOR); DR interception is a similar trainwreck. svm_sync_dirty_debug_regs() bails if guest_state_protected is true, i.e. is a nop for SEV-ES guests, but only after the vCPU has done LAUNCH_UPDATE_VMSA. IIUC, that's nonsensical because even before guest state is encrypted, #DB will be reflected as #VC into the guest. And, again IIUC, except for DR7, DRs are never intercepted for SEV-ES guests and so trying to debug from the host is futile as the guest can clobber DRs at any time. Similarly, flowing into dr_interception() on an SEV-ES VMGEXITis just dumb. KVM _knows_ it can't give the guest control of DR7, but it mucks with the intercepts anyways. That the GHCB spec even allows SVM_EXIT_{READ,WRITE}_DR7 is just asinine, but that's a moot point. Anyways, the GHCB spec's "suggestion" effectively says KVM's responsibility is purely to make a read of DR7 return the last written value. And of course KVM's disaster of a flow doesn't even do that unless the host is debugging the guest. Currently, hardware debug traps aren’t supported for an SEV-ES guest. The hypervisor must set the intercept for both read and write of the debug control register (DR7). With the intercepts in place, the #VC handler will be invoked when the guest accesses DR7. For a write to DR7, the #VC handler should perform Standard VMGExit processing. The #VC handler must not update the actual DR7 register, but rather it should cache the DR7 value being written. I bring this up because of the subtle dependency that checking SVM_SEV_FEAT_DEBUG_SWAP creates: set_dr_intercepts() needs to be called after sev_init_vmcb(). I believe this approach also fails to handle intrahost migration; at the very least, what exactly will happen when sev_migrate_from() invokes sev_init_vmcb() is unclear. And I really don't want to pile even more gunk on top of the existing mess. So, can you (and by "you" I really mean "the folks at AMD working on SEV stuff") start with the below diff (not intended to be a single patch), disallow kvm_arch_vcpu_ioctl_set_guest_debug() entirely for SEV-ES guests (will likely take some back and forth to figure out how we want to do this), and then fill in the blanks? I.e. get KVM to a state where all the intercept shenanigans for SEV and SEV-ES are reasonably contained in sev.c, and then enable the debug_swap stuff on top? diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c index c25aeb550cd9..ff7a4d68731c 100644 --- a/arch/x86/kvm/svm/sev.c +++ b/arch/x86/kvm/svm/sev.c @@ -2968,8 +2968,7 @@ static void sev_es_init_vmcb(struct vcpu_svm *svm) svm_set_intercept(svm, TRAP_CR4_WRITE); svm_set_intercept(svm, TRAP_CR8_WRITE); - /* No support for enable_vmware_backdoor */ - clr_exception_intercept(svm, GP_VECTOR); + <debug register stuff goes here> /* Can't intercept XSETBV, HV can't modify XCR0 directly */ svm_clr_intercept(svm, INTERCEPT_XSETBV); @@ -2996,6 +2995,12 @@ void sev_init_vmcb(struct vcpu_svm *svm) svm->vmcb->control.nested_ctl |= SVM_NESTED_CTL_SEV_ENABLE; clr_exception_intercept(svm, UD_VECTOR); + /* + * Don't intercept #GP for SEV guests, e.g. for the VMware backdoor, as + * KVM can't decrypt guest memory to decode the faulting instruction. + */ + clr_exception_intercept(svm, GP_VECTOR); + if (sev_es_guest(svm->vcpu.kvm)) sev_es_init_vmcb(svm); } diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c index e0ec95f1f068..89753d7fd821 100644 --- a/arch/x86/kvm/svm/svm.c +++ b/arch/x86/kvm/svm/svm.c @@ -1209,10 +1209,9 @@ static void init_vmcb(struct kvm_vcpu *vcpu) * Guest access to VMware backdoor ports could legitimately * trigger #GP because of TSS I/O permission bitmap. * We intercept those #GP and allow access to them anyway - * as VMware does. Don't intercept #GP for SEV guests as KVM can't - * decrypt guest memory to decode the faulting instruction. + * as VMware does. */ - if (enable_vmware_backdoor && !sev_guest(vcpu->kvm)) + if (enable_vmware_backdoor) set_exception_intercept(svm, GP_VECTOR); svm_set_intercept(svm, INTERCEPT_INTR); @@ -1950,7 +1949,7 @@ static void svm_sync_dirty_debug_regs(struct kvm_vcpu *vcpu) { struct vcpu_svm *svm = to_svm(vcpu); - if (vcpu->arch.guest_state_protected) + if (WARN_ON_ONCE(sev_es_guest(vcpu->kvm))) return; get_debugreg(vcpu->arch.db[0], 0); @@ -2681,7 +2680,7 @@ static int dr_interception(struct kvm_vcpu *vcpu) unsigned long val; int err = 0; - if (vcpu->guest_debug == 0) { + if (vcpu->guest_debug == 0 && !sev_es_guest(vcpu->kvm)) { /* * No more DR vmexits; force a reload of the debug registers * and reenter on this instruction. The next vmexit will diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h index f44751dd8d5d..7c99a7d55476 100644 --- a/arch/x86/kvm/svm/svm.h +++ b/arch/x86/kvm/svm/svm.h @@ -409,23 +409,25 @@ static inline void set_dr_intercepts(struct vcpu_svm *svm) { struct vmcb *vmcb = svm->vmcb01.ptr; - if (!sev_es_guest(svm->vcpu.kvm)) { - vmcb_set_intercept(&vmcb->control, INTERCEPT_DR0_READ); - vmcb_set_intercept(&vmcb->control, INTERCEPT_DR1_READ); - vmcb_set_intercept(&vmcb->control, INTERCEPT_DR2_READ); - vmcb_set_intercept(&vmcb->control, INTERCEPT_DR3_READ); - vmcb_set_intercept(&vmcb->control, INTERCEPT_DR4_READ); - vmcb_set_intercept(&vmcb->control, INTERCEPT_DR5_READ); - vmcb_set_intercept(&vmcb->control, INTERCEPT_DR6_READ); - vmcb_set_intercept(&vmcb->control, INTERCEPT_DR0_WRITE); - vmcb_set_intercept(&vmcb->control, INTERCEPT_DR1_WRITE); - vmcb_set_intercept(&vmcb->control, INTERCEPT_DR2_WRITE); - vmcb_set_intercept(&vmcb->control, INTERCEPT_DR3_WRITE); - vmcb_set_intercept(&vmcb->control, INTERCEPT_DR4_WRITE); - vmcb_set_intercept(&vmcb->control, INTERCEPT_DR5_WRITE); - vmcb_set_intercept(&vmcb->control, INTERCEPT_DR6_WRITE); + if (sev_es_guest(svm->vcpu.kvm)) { + WARN_ON_ONCE(svm->vcpu.arch.last_vmentry_cpu != -1); + return; } + vmcb_set_intercept(&vmcb->control, INTERCEPT_DR0_READ); + vmcb_set_intercept(&vmcb->control, INTERCEPT_DR1_READ); + vmcb_set_intercept(&vmcb->control, INTERCEPT_DR2_READ); + vmcb_set_intercept(&vmcb->control, INTERCEPT_DR3_READ); + vmcb_set_intercept(&vmcb->control, INTERCEPT_DR4_READ); + vmcb_set_intercept(&vmcb->control, INTERCEPT_DR5_READ); + vmcb_set_intercept(&vmcb->control, INTERCEPT_DR6_READ); + vmcb_set_intercept(&vmcb->control, INTERCEPT_DR0_WRITE); + vmcb_set_intercept(&vmcb->control, INTERCEPT_DR1_WRITE); + vmcb_set_intercept(&vmcb->control, INTERCEPT_DR2_WRITE); + vmcb_set_intercept(&vmcb->control, INTERCEPT_DR3_WRITE); + vmcb_set_intercept(&vmcb->control, INTERCEPT_DR4_WRITE); + vmcb_set_intercept(&vmcb->control, INTERCEPT_DR5_WRITE); + vmcb_set_intercept(&vmcb->control, INTERCEPT_DR6_WRITE); vmcb_set_intercept(&vmcb->control, INTERCEPT_DR7_READ); vmcb_set_intercept(&vmcb->control, INTERCEPT_DR7_WRITE); @@ -436,13 +438,13 @@ static inline void clr_dr_intercepts(struct vcpu_svm *svm) { struct vmcb *vmcb = svm->vmcb01.ptr; + if (WARN_ON_ONCE(sev_es_guest(svm->vcpu.kvm))) + return; + vmcb->control.intercepts[INTERCEPT_DR] = 0; - /* DR7 access must remain intercepted for an SEV-ES guest */ - if (sev_es_guest(svm->vcpu.kvm)) { - vmcb_set_intercept(&vmcb->control, INTERCEPT_DR7_READ); - vmcb_set_intercept(&vmcb->control, INTERCEPT_DR7_WRITE); - } + vmcb_set_intercept(&vmcb->control, INTERCEPT_DR7_READ); + vmcb_set_intercept(&vmcb->control, INTERCEPT_DR7_WRITE); recalc_intercepts(svm); }