On 5/16/21 9:50 PM, Jon Kohler wrote: > > >> On May 14, 2021, at 1:11 AM, Andy Lutomirski <luto@xxxxxxxxxx> wrote: >> >> On Fri, May 7, 2021 at 9:45 AM Jon Kohler <jon@xxxxxxxxxxx> wrote: >>> >>> kvm_load_host_xsave_state handles xsave on vm exit, part of which is >>> managing memory protection key state. The latest arch.pkru is updated >>> with a rdpkru, and if that doesn't match the base host_pkru (which >>> about 70% of the time), we issue a __write_pkru. >> >> This thread caused me to read the code, and I don't get how it's even >> remotely correct. >> >> First, kvm_load_guest_fpu() has this delight: >> >> /* >> * Guests with protected state can't have it set by the hypervisor, >> * so skip trying to set it. >> */ >> if (vcpu->arch.guest_fpu) >> /* PKRU is separately restored in kvm_x86_ops.run. */ >> __copy_kernel_to_fpregs(&vcpu->arch.guest_fpu->state, >> ~XFEATURE_MASK_PKRU); >> >> That's nice, but it fails to restore XINUSE[PKRU]. As far as I know, >> that bit is live, and the only way to restore it to 0 is with >> XRSTOR(S). > > Thanks, Andy - Adding Tom Lendacky to this thread as that > Particular snippet was last touched in ~5.11 in ed02b213098a. It sounds like Andy's comment is separate from the changes associated with commit ed02b213098a, right? Commit ed02b213098a just added the check for vcpu->arch.guest_fpu to support SEV-ES guests. Since the hypervisor can't save/restore those registers directly when running under SEV-ES, we skip this. The state will be restored when VMRUN is performed. Thanks, Tom > >> >>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c >>> index cebdaa1e3cf5..cd95adbd140c 100644 >>> --- a/arch/x86/kvm/x86.c >>> +++ b/arch/x86/kvm/x86.c >>> @@ -912,10 +912,10 @@ void kvm_load_guest_xsave_state(struct kvm_vcpu *vcpu) >>> } >>> >>> if (static_cpu_has(X86_FEATURE_PKU) && >>> - (kvm_read_cr4_bits(vcpu, X86_CR4_PKE) || >>> - (vcpu->arch.xcr0 & XFEATURE_MASK_PKRU)) && >>> - vcpu->arch.pkru != vcpu->arch.host_pkru) >>> - __write_pkru(vcpu->arch.pkru); >>> + vcpu->arch.pkru != vcpu->arch.host_pkru && >>> + ((vcpu->arch.xcr0 & XFEATURE_MASK_PKRU) || >>> + kvm_read_cr4_bits(vcpu, X86_CR4_PKE))) >>> + __write_pkru(vcpu->arch.pkru, false); >> >> Please tell me I'm missing something (e.g. KVM very cleverly managing >> the PKRU register using intercepts) that makes this reliably load the >> guest value. An innocent or malicious guest could easily make that >> condition evaluate to false, thus allowing the host PKRU value to be >> live in guest mode. (Or is something fancy going on here?) >> >> I don't even want to think about what happens if a perf NMI hits and >> accesses host user memory while the guest PKRU is live (on VMX -- I >> think this can't happen on SVM). > > Perhaps Babu / Dave can comment on the exiting logic here? Babu did > some additional work to fix save/restore on 37486135d3a. > > From this changes perspective, I’m just trying to get to the resultant > evaluation a bit quicker, which this change (and the v3) seems to do > ok with; however, if I’ve ran my ship into a larger ice berg, happy to > collaborate to make it better overall. > >> >>> } >>> EXPORT_SYMBOL_GPL(kvm_load_guest_xsave_state); >>> >>> @@ -925,11 +925,11 @@ void kvm_load_host_xsave_state(struct kvm_vcpu *vcpu) >>> return; >>> >>> if (static_cpu_has(X86_FEATURE_PKU) && >>> - (kvm_read_cr4_bits(vcpu, X86_CR4_PKE) || >>> - (vcpu->arch.xcr0 & XFEATURE_MASK_PKRU))) { >>> + ((vcpu->arch.xcr0 & XFEATURE_MASK_PKRU) || >>> + kvm_read_cr4_bits(vcpu, X86_CR4_PKE))) { >>> vcpu->arch.pkru = rdpkru(); >>> if (vcpu->arch.pkru != vcpu->arch.host_pkru) >>> - __write_pkru(vcpu->arch.host_pkru); >>> + __write_pkru(vcpu->arch.host_pkru, true); >>> } >> >> Suppose the guest writes to PKRU and then, without exiting, sets PKE = >> 0 and XCR0[PKRU] = 0. (Or are the intercepts such that this can't >> happen except on SEV where maybe SEV magic makes the problem go away?) >> >> I admit I'm fairly mystified as to why KVM doesn't handle PKRU like >> the rest of guest XSTATE. > > I’ll defer to Dave/Babu here. This code has been this way for a bit now, > I’m not sure at first glance if that situation could occur intentionally > or accidentally, but that would evaluate the same both before and > after this change, no? >