> On May 14, 2021, at 1:11 AM, Andy Lutomirski <luto@xxxxxxxxxx> wrote: > > On Fri, May 7, 2021 at 9:45 AM Jon Kohler <jon@xxxxxxxxxxx> wrote: >> >> kvm_load_host_xsave_state handles xsave on vm exit, part of which is >> managing memory protection key state. The latest arch.pkru is updated >> with a rdpkru, and if that doesn't match the base host_pkru (which >> about 70% of the time), we issue a __write_pkru. > > This thread caused me to read the code, and I don't get how it's even > remotely correct. > > First, kvm_load_guest_fpu() has this delight: > > /* > * Guests with protected state can't have it set by the hypervisor, > * so skip trying to set it. > */ > if (vcpu->arch.guest_fpu) > /* PKRU is separately restored in kvm_x86_ops.run. */ > __copy_kernel_to_fpregs(&vcpu->arch.guest_fpu->state, > ~XFEATURE_MASK_PKRU); > > That's nice, but it fails to restore XINUSE[PKRU]. As far as I know, > that bit is live, and the only way to restore it to 0 is with > XRSTOR(S). Thanks, Andy - Adding Tom Lendacky to this thread as that Particular snippet was last touched in ~5.11 in ed02b213098a. > >> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c >> index cebdaa1e3cf5..cd95adbd140c 100644 >> --- a/arch/x86/kvm/x86.c >> +++ b/arch/x86/kvm/x86.c >> @@ -912,10 +912,10 @@ void kvm_load_guest_xsave_state(struct kvm_vcpu *vcpu) >> } >> >> if (static_cpu_has(X86_FEATURE_PKU) && >> - (kvm_read_cr4_bits(vcpu, X86_CR4_PKE) || >> - (vcpu->arch.xcr0 & XFEATURE_MASK_PKRU)) && >> - vcpu->arch.pkru != vcpu->arch.host_pkru) >> - __write_pkru(vcpu->arch.pkru); >> + vcpu->arch.pkru != vcpu->arch.host_pkru && >> + ((vcpu->arch.xcr0 & XFEATURE_MASK_PKRU) || >> + kvm_read_cr4_bits(vcpu, X86_CR4_PKE))) >> + __write_pkru(vcpu->arch.pkru, false); > > Please tell me I'm missing something (e.g. KVM very cleverly managing > the PKRU register using intercepts) that makes this reliably load the > guest value. An innocent or malicious guest could easily make that > condition evaluate to false, thus allowing the host PKRU value to be > live in guest mode. (Or is something fancy going on here?) > > I don't even want to think about what happens if a perf NMI hits and > accesses host user memory while the guest PKRU is live (on VMX -- I > think this can't happen on SVM). Perhaps Babu / Dave can comment on the exiting logic here? Babu did some additional work to fix save/restore on 37486135d3a. >From this changes perspective, I’m just trying to get to the resultant evaluation a bit quicker, which this change (and the v3) seems to do ok with; however, if I’ve ran my ship into a larger ice berg, happy to collaborate to make it better overall. > >> } >> EXPORT_SYMBOL_GPL(kvm_load_guest_xsave_state); >> >> @@ -925,11 +925,11 @@ void kvm_load_host_xsave_state(struct kvm_vcpu *vcpu) >> return; >> >> if (static_cpu_has(X86_FEATURE_PKU) && >> - (kvm_read_cr4_bits(vcpu, X86_CR4_PKE) || >> - (vcpu->arch.xcr0 & XFEATURE_MASK_PKRU))) { >> + ((vcpu->arch.xcr0 & XFEATURE_MASK_PKRU) || >> + kvm_read_cr4_bits(vcpu, X86_CR4_PKE))) { >> vcpu->arch.pkru = rdpkru(); >> if (vcpu->arch.pkru != vcpu->arch.host_pkru) >> - __write_pkru(vcpu->arch.host_pkru); >> + __write_pkru(vcpu->arch.host_pkru, true); >> } > > Suppose the guest writes to PKRU and then, without exiting, sets PKE = > 0 and XCR0[PKRU] = 0. (Or are the intercepts such that this can't > happen except on SEV where maybe SEV magic makes the problem go away?) > > I admit I'm fairly mystified as to why KVM doesn't handle PKRU like > the rest of guest XSTATE. I’ll defer to Dave/Babu here. This code has been this way for a bit now, I’m not sure at first glance if that situation could occur intentionally or accidentally, but that would evaluate the same both before and after this change, no?