On Tue, Sep 27, 2016 at 2:29 AM, Paolo Bonzini <pbonzini@xxxxxxxxxx> wrote:
>
> On 26/09/2016 19:45, Peter Feiner wrote:
>> The MMU notifier sequence number keeps GPA->HPA mappings in sync when
>> GPA->HPA lookups are done outside of the MMU lock (e.g., in
>> tdp_page_fault). Since kvm_age_hva doesn't change GPA->HPA, it's
>> unnecessary to increment the sequence number.
>>
>> Signed-off-by: Peter Feiner <pfeiner@xxxxxxxxxx>
>
> Reviewed-by: Paolo Bonzini <pbonzini@xxxxxxxxxx>
>
> This needed some pondering, but it made sense in the end. :) Do you
> actually see an effect or is it just a cleanup?

Thanks, Paolo. This is just a cleanup. I wanted to run it by you to make
sure *I* understood the role of the notifier sequence number.

I'm reworking this function in a fix for a nested virtualization bug
exposed by PML. When PML is on, EPT A/D is enabled in the vmcs02 EPTP
regardless of the vmcs12's EPTP value. The problem is that enabling A/D
changes the behavior of L2's x86 page table walks as seen by L1. With
A/D enabled, x86 page table walks are always treated as EPT writes. The
EPT test in kvm-unit-tests' x86/vmx.c shows this failure.

> Related to this, what do you think of this patch?
>
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index 3d4cc8cc56a3..23683d5f6640 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -1617,16 +1617,8 @@ static int kvm_test_age_rmapp(struct kvm *kvm, struct kvm_rmap_head *rmap_head,
>  	struct rmap_iterator iter;
>  	int young = 0;
>
> -	/*
> -	 * If there's no access bit in the secondary pte set by the
> -	 * hardware it's up to gup-fast/gup to set the access bit in
> -	 * the primary pte or in the page structure.
> -	 */
> -	if (!shadow_accessed_mask)
> -		goto out;
> -
>  	for_each_rmap_spte(rmap_head, &iter, sptep) {
> -		if (*sptep & shadow_accessed_mask) {
> +		if (!shadow_accessed_mask || (*sptep & shadow_accessed_mask)) {
>  			young = 1;
>  			break;
>  		}
>
> Since kvm_age_hva blows up "old" sPTEs, all PTEs that are there must be
> young...
>
> Paolo

The patch looks good. You can add:

Reviewed-by: Peter Feiner <pfeiner@xxxxxxxxxx>

>> ---
>>  arch/x86/kvm/mmu.c | 10 +---------
>>  1 file changed, 1 insertion(+), 9 deletions(-)
>>
>> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
>> index 3d4cc8cc..dc6d1e8 100644
>> --- a/arch/x86/kvm/mmu.c
>> +++ b/arch/x86/kvm/mmu.c
>> @@ -1660,17 +1660,9 @@ int kvm_age_hva(struct kvm *kvm, unsigned long start, unsigned long end)
>>  	 * This has some overhead, but not as much as the cost of swapping
>>  	 * out actively used pages or breaking up actively used hugepages.
>>  	 */
>> -	if (!shadow_accessed_mask) {
>> -		/*
>> -		 * We are holding the kvm->mmu_lock, and we are blowing up
>> -		 * shadow PTEs. MMU notifier consumers need to be kept at bay.
>> -		 * This is correct as long as we don't decouple the mmu_lock
>> -		 * protected regions (like invalidate_range_start|end does).
>> -		 */
>> -		kvm->mmu_notifier_seq++;
>> +	if (!shadow_accessed_mask)
>>  		return kvm_handle_hva_range(kvm, start, end, 0,
>>  					    kvm_unmap_rmapp);
>> -	}
>>
>>  	return kvm_handle_hva_range(kvm, start, end, 0, kvm_age_rmapp);
>>  }
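
For anyone following along, the lockless lookup pattern that the
sequence number protects looks roughly like this. It's a condensed
sketch of the tdp_page_fault / mmu_notifier_retry interplay (error
handling and most details elided), not the literal kernel code:

	unsigned long mmu_seq;
	kvm_pfn_t pfn;

	/*
	 * Fault side: snapshot the sequence number, then do the
	 * GPA->HPA translation without holding mmu_lock.
	 */
	mmu_seq = kvm->mmu_notifier_seq;
	smp_rmb();

	pfn = gfn_to_pfn(kvm, gfn);	/* may sleep; mmu_lock not held */

	spin_lock(&kvm->mmu_lock);
	/*
	 * Recheck under the lock: if an invalidation bumped the
	 * sequence number in the meantime, pfn may be stale, so drop
	 * it and retry the fault instead of installing an sPTE.
	 */
	if (mmu_notifier_retry(kvm, mmu_seq))
		goto out_unlock;
	/* ... install the sPTE for pfn ... */
	spin_unlock(&kvm->mmu_lock);

Anything that actually changes a GPA->HPA mapping increments
kvm->mmu_notifier_seq under mmu_lock, which forces concurrent faults to
retry. kvm_age_hva only clears accessed state (and, without
shadow_accessed_mask, zaps sPTEs that a later fault rebuilds with the
same translation), so skipping the increment is safe.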
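
To make the PML interaction above concrete: the vmcs02 EPTP is built by
construct_eptp() in arch/x86/kvm/vmx.c, which keys the A/D bit off the
module-wide enable_ept_ad_bits flag and never off what L1 put in the
vmcs12 EPTP. Quoting roughly from memory, so treat this as a sketch
rather than the exact source:

	static u64 construct_eptp(unsigned long root_hpa)
	{
		u64 eptp;

		eptp = VMX_EPT_DEFAULT_MT |
		       VMX_EPT_DEFAULT_GAW << VMX_EPT_GAW_EPTP_SHIFT;
		/* A/D is a global decision; vmcs12's EPTP isn't consulted. */
		if (enable_ept_ad_bits)
			eptp |= VMX_EPT_AD_ENABLE_BIT;
		eptp |= (root_hpa & PAGE_MASK);

		return eptp;
	}

With PML on, enable_ept_ad_bits is set, so L2 runs with A/D enabled
even when L1 cleared the bit in its EPTP, and L1 then observes L2's x86
page table walks as EPT writes.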