Re: [PATCH] KVM: x86/mmu: Fix some return value error in kvm_tdp_mmu_map()

On Fri, Apr 30, 2021 at 9:03 AM Kai Huang <kai.huang@xxxxxxxxx> wrote:
>
> There are a couple of issues in the current tdp_mmu_map_handle_target_level()
> regarding the return value, which reflects the page fault handler's behavior
> -- whether it truly fixed the page fault, or the fault was spurious, or the
> fault requires emulation, etc.:
>
> 1) Currently tdp_mmu_map_handle_target_level() return 0, which is
>    RET_PF_RETRY, when page fault is actually fixed.  This makes
>    kvm_tdp_mmu_map() also return RET_PF_RETRY in this case, instead of
>    RET_PF_FIXED.

Ooof, that was an oversight. Thank you for catching that.

>
> 2) When page fault is spurious, tdp_mmu_map_handle_target_level()
>    currently doesn't return immediately.  This is not correct, since it
>    may, for instance, lead to double emulation for a single instruction.

Could you please add an example of what would be required for this to
happen? What effect would it have?
I don't doubt you're correct on this point, just having a hard time
pinpointing where the issue is.

>
> 3) One case of spurious fault is missing: when iter->old_spte is not
>    REMOVED_SPTE, but tdp_mmu_set_spte_atomic() still fails the atomic
>    exchange.  This case means the page fault has already been handled
>    by another thread, and RET_PF_SPURIOUS should be returned.
>    Currently this case is not distinguished from the
>    iter->old_spte == REMOVED_SPTE case, and RET_PF_RETRY is returned.

See comment on this point in the code below.

>
> Fix 1) by initializing ret to RET_PF_FIXED at the beginning.  Fix 2) &
> 3) by explicitly adding an is_removed_spte() check at the beginning,
> and returning RET_PF_RETRY when it is true.  For the other two cases
> (the old spte equals the new spte, and tdp_mmu_set_spte_atomic()
> fails), return RET_PF_SPURIOUS immediately.
>
> Fixes: bb18842e2111 ("kvm: x86/mmu: Add TDP MMU PF handler")
> Signed-off-by: Kai Huang <kai.huang@xxxxxxxxx>
> ---
>  arch/x86/kvm/mmu/tdp_mmu.c | 12 +++++++-----
>  1 file changed, 7 insertions(+), 5 deletions(-)
>
> diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
> index 84ee1a76a79d..a4dc7c9a4ebb 100644
> --- a/arch/x86/kvm/mmu/tdp_mmu.c
> +++ b/arch/x86/kvm/mmu/tdp_mmu.c
> @@ -905,9 +905,12 @@ static int tdp_mmu_map_handle_target_level(struct kvm_vcpu *vcpu, int write,
>                                           kvm_pfn_t pfn, bool prefault)
>  {
>         u64 new_spte;
> -       int ret = 0;
> +       int ret = RET_PF_FIXED;
>         int make_spte_ret = 0;
>
> +       if (is_removed_spte(iter->old_spte))
> +               return RET_PF_RETRY;
> +
>         if (unlikely(is_noslot_pfn(pfn)))
>                 new_spte = make_mmio_spte(vcpu, iter->gfn, ACC_ALL);
>         else
> @@ -916,10 +919,9 @@ static int tdp_mmu_map_handle_target_level(struct kvm_vcpu *vcpu, int write,
>                                          map_writable, !shadow_accessed_mask,
>                                          &new_spte);
>
> -       if (new_spte == iter->old_spte)
> -               ret = RET_PF_SPURIOUS;
> -       else if (!tdp_mmu_set_spte_atomic(vcpu->kvm, iter, new_spte))
> -               return RET_PF_RETRY;
> +       if (new_spte == iter->old_spte ||
> +                       !tdp_mmu_set_spte_atomic(vcpu->kvm, iter, new_spte))
> +               return RET_PF_SPURIOUS;


I'm not sure this is quite right. In mmu_set_spte, I see the following comment:

/*
* The fault is fully spurious if and only if the new SPTE and old SPTE
* are identical, and emulation is not required.
*/

Based on that comment, I think the existing code is correct. Further,
if the cmpxchg fails, we have no guarantee that the value that was
inserted instead resolved the page fault. For example, suppose two
threads try to fault in the same large page, but one access is a write
and the other an instruction fetch. The write thread might lose the
race to the fetch thread, which installs a non-leaf SPTE to break up
the large page for NX hugepages. In that case the write fault is not
spurious and a retry is needed.

We could do what the fast PF handler does and check
is_access_allowed(error_code, spte) with whatever value we lost the
cmpxchg to, but I don't know if that's worth doing or not. There's not
really much control flow difference between the two return values, as
far as I can tell.

It looks like we might also be incorrectly incrementing pf_fixed on a
spurious PF.

It might be more interesting and accurate to introduce a separate
return value for cmpxchg failures, just to see how often vCPUs
actually collide like that.

>
>         /*
>          * If the page fault was caused by a write but the page is write
> --
> 2.30.2
>


