Re: [PATCH] KVM: x86: add hint to skip hidden rdpkru under kvm_load_host_xsave_state

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




> On May 14, 2021, at 1:11 AM, Andy Lutomirski <luto@xxxxxxxxxx> wrote:
> 
> On Fri, May 7, 2021 at 9:45 AM Jon Kohler <jon@xxxxxxxxxxx> wrote:
>> 
>> kvm_load_host_xsave_state handles xsave on vm exit, part of which is
>> managing memory protection key state. The latest arch.pkru is updated
>> with a rdpkru, and if that doesn't match the base host_pkru (which
>> about 70% of the time), we issue a __write_pkru.
> 
> This thread caused me to read the code, and I don't get how it's even
> remotely correct.
> 
> First, kvm_load_guest_fpu() has this delight:
> 
>    /*
>     * Guests with protected state can't have it set by the hypervisor,
>     * so skip trying to set it.
>     */
>    if (vcpu->arch.guest_fpu)
>        /* PKRU is separately restored in kvm_x86_ops.run. */
>        __copy_kernel_to_fpregs(&vcpu->arch.guest_fpu->state,
>                    ~XFEATURE_MASK_PKRU);
> 
> That's nice, but it fails to restore XINUSE[PKRU].  As far as I know,
> that bit is live, and the only way to restore it to 0 is with
> XRSTOR(S).

Thanks, Andy - Adding Tom Lendacky to this thread as that
Particular snippet was last touched in ~5.11 in ed02b213098a.

> 
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index cebdaa1e3cf5..cd95adbd140c 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -912,10 +912,10 @@ void kvm_load_guest_xsave_state(struct kvm_vcpu *vcpu)
>>        }
>> 
>>        if (static_cpu_has(X86_FEATURE_PKU) &&
>> -           (kvm_read_cr4_bits(vcpu, X86_CR4_PKE) ||
>> -            (vcpu->arch.xcr0 & XFEATURE_MASK_PKRU)) &&
>> -           vcpu->arch.pkru != vcpu->arch.host_pkru)
>> -               __write_pkru(vcpu->arch.pkru);
>> +           vcpu->arch.pkru != vcpu->arch.host_pkru &&
>> +           ((vcpu->arch.xcr0 & XFEATURE_MASK_PKRU) ||
>> +            kvm_read_cr4_bits(vcpu, X86_CR4_PKE)))
>> +               __write_pkru(vcpu->arch.pkru, false);
> 
> Please tell me I'm missing something (e.g. KVM very cleverly managing
> the PKRU register using intercepts) that makes this reliably load the
> guest value.  An innocent or malicious guest could easily make that
> condition evaluate to false, thus allowing the host PKRU value to be
> live in guest mode.  (Or is something fancy going on here?)
> 
> I don't even want to think about what happens if a perf NMI hits and
> accesses host user memory while the guest PKRU is live (on VMX -- I
> think this can't happen on SVM).

Perhaps Babu / Dave can comment on the exiting logic here? Babu did
some additional work to fix save/restore on 37486135d3a.

>From this changes perspective, I’m just trying to get to the resultant
evaluation a bit quicker, which this change (and the v3) seems to do
ok with; however, if I’ve ran my ship into a larger ice berg, happy to
collaborate to make it better overall.

> 
>> }
>> EXPORT_SYMBOL_GPL(kvm_load_guest_xsave_state);
>> 
>> @@ -925,11 +925,11 @@ void kvm_load_host_xsave_state(struct kvm_vcpu *vcpu)
>>                return;
>> 
>>        if (static_cpu_has(X86_FEATURE_PKU) &&
>> -           (kvm_read_cr4_bits(vcpu, X86_CR4_PKE) ||
>> -            (vcpu->arch.xcr0 & XFEATURE_MASK_PKRU))) {
>> +           ((vcpu->arch.xcr0 & XFEATURE_MASK_PKRU) ||
>> +            kvm_read_cr4_bits(vcpu, X86_CR4_PKE))) {
>>                vcpu->arch.pkru = rdpkru();
>>                if (vcpu->arch.pkru != vcpu->arch.host_pkru)
>> -                       __write_pkru(vcpu->arch.host_pkru);
>> +                       __write_pkru(vcpu->arch.host_pkru, true);
>>        }
> 
> Suppose the guest writes to PKRU and then, without exiting, sets PKE =
> 0 and XCR0[PKRU] = 0.  (Or are the intercepts such that this can't
> happen except on SEV where maybe SEV magic makes the problem go away?)
> 
> I admit I'm fairly mystified as to why KVM doesn't handle PKRU like
> the rest of guest XSTATE.

I’ll defer to Dave/Babu here. This code has been this way for a bit now,
I’m not sure at first glance if that situation could occur intentionally
or accidentally, but that would evaluate the same both before and
after this change, no?





[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux