Re: [PATCH v1 1/2] KVM: nVMX: get rid of nested_get_page()

Jim Mattson <jmattson@xxxxxxxxxx> · Thu, 3 Aug 2017 09:05:50 -0700

On Thu, Aug 3, 2017 at 8:36 AM, Radim Krčmář <rkrcmar@xxxxxxxxxx> wrote:
> 2017-08-03 16:09+0200, David Hildenbrand:
>> nested_get_page() just sounds confusing. All we want is a page from G1.
>> This is even unrelated to nested.
>>
>> Let's introduce kvm_vcpu_gpa_to_page() so we don't get too lengthy
>> lines.
>>
>> Signed-off-by: David Hildenbrand <david@xxxxxxxxxx>
>> ---
>
> I like the cleanup, but a subtle change in behavior that makes me wary:
>
>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>> @@ -9535,15 +9528,15 @@ static void nested_get_vmcs12_pages(struct kvm_vcpu *vcpu,
>>                */
>>               if (vmx->nested.apic_access_page) /* shouldn't happen */
>>                       nested_release_page(vmx->nested.apic_access_page);
>> -             vmx->nested.apic_access_page =
>> -                     nested_get_page(vcpu, vmcs12->apic_access_addr);
>> +             page = kvm_vcpu_gpa_to_page(vcpu, vmcs12->apic_access_addr);
>
> If what shouldn't happen happened and then kvm_vcpu_gpa_to_page()
> failed, we'd be calling put_page() twice on the same page.  I think the
> situation currently can happen if VM entry fails after this point.
>
> Assigning 'vmx->nested.apic_access_page = NULL' when releasing the page
> sounds safer.
>
> Unless I'm reading something wrong, the "shouldn't happen" really
> shouldn't happen if we did something like this:
>
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index e34373838b31..d26e6693f748 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -10462,8 +10462,6 @@ static int enter_vmx_non_root_mode(struct kvm_vcpu *vcpu, bool from_vmentry)
>                 return 1;
>         }
>
> -       nested_get_vmcs12_pages(vcpu, vmcs12);
> -
>         msr_entry_idx = nested_vmx_load_msr(vcpu,
>                                             vmcs12->vm_entry_msr_load_addr,
>                                             vmcs12->vm_entry_msr_load_count);
> @@ -10475,6 +10473,8 @@ static int enter_vmx_non_root_mode(struct kvm_vcpu *vcpu, bool from_vmentry)
>                 return 1;
>         }
>
> +       nested_get_vmcs12_pages(vcpu, vmcs12);
> +
>         /*
>          * Note no nested_vmx_succeed or nested_vmx_fail here. At this point
>          * we are no longer running L1, and VMLAUNCH/VMRESUME has not yet
>
>>               /*
>>                * If translation failed, no matter: This feature asks
>>                * to exit when accessing the given address, and if it
>>                * can never be accessed, this feature won't do
>>                * anything anyway.
>>                */

This comment is incorrect. On real hardware, the APIC access page
doesn't have to exist (i.e. be backed by actual memory), because the
APIC access page is never accessed. Think of the APIC access page as a
sentinel value that the hypervisor can put in the page tables (EPT
page tables if they are in use, x86 page tables otherwise) to trigger
APIC virtualization. If there is an access, it is to the page at the
virtual APIC address, not the APIC access page.

Similarly, in a VM, there need not be a mapping for the APIC access
page for the feature to work as architected. (Or, at least, that's the
way it should work. :-)

>> -             if (vmx->nested.apic_access_page) {
>> +             if (!is_error_page(page)) {
>> +                     vmx->nested.apic_access_page = page;
>>                       hpa = page_to_phys(vmx->nested.apic_access_page);
>>                       vmcs_write64(APIC_ACCESS_ADDR, hpa);
>>               } else {
>> @@ -9560,8 +9553,7 @@ static void nested_get_vmcs12_pages(struct kvm_vcpu *vcpu,
>>       if (nested_cpu_has(vmcs12, CPU_BASED_TPR_SHADOW)) {
>>               if (vmx->nested.virtual_apic_page) /* shouldn't happen */
>>                       nested_release_page(vmx->nested.virtual_apic_page);
>
> Ditto,
>
> thanks.
>
>> -             vmx->nested.virtual_apic_page =
>> -                     nested_get_page(vcpu, vmcs12->virtual_apic_page_addr);
>> +             page = kvm_vcpu_gpa_to_page(vcpu, vmcs12->apic_access_addr);
>>
>>               /*
>>                * If translation failed, VM entry will fail because
>