RE: [PATCH 1/2] KVM: Avoid illegal stage2 mapping on invalid memory slot

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Please ignore this email. It was sent as a mistake. 

> -----Original Message-----
> From: Ang, Tien Sung <tien.sung.ang@xxxxxxxxx>
> Sent: Friday, 14 July, 2023 11:33 AM
> To: Ang, Tien Sung <tien.sung.ang@xxxxxxxxx>
> Cc: Gavin Shan <gshan@xxxxxxxxxx>; stable@xxxxxxxxxxxxxxx; Shuai Hu
> <hshuai@xxxxxxxxxx>; Zhenyu Zhang <zhenyzha@xxxxxxxxxx>; David
> Hildenbrand <david@xxxxxxxxxx>; Oliver Upton <oliver.upton@xxxxxxxxx>;
> Peter Xu <peterx@xxxxxxxxxx>; Christopherson,, Sean
> <seanjc@xxxxxxxxxx>; Shaoqin Huang <shahuang@xxxxxxxxxx>; Paolo
> Bonzini <pbonzini@xxxxxxxxxx>
> Subject: [PATCH 1/2] KVM: Avoid illegal stage2 mapping on invalid memory
> slot
> 
> From: Gavin Shan <gshan@xxxxxxxxxx>
> 
> We run into guest hang in edk2 firmware when KSM is kept as running on the
> host. The edk2 firmware is waiting for status 0x80 from QEMU's pflash device
> (TYPE_PFLASH_CFI01) during the operation of sector erasing or buffered
> write. The status is returned by reading the memory region of the pflash
> device and the read request should have been forwarded to QEMU and
> emulated by it. Unfortunately, the read request is covered by an illegal
> stage2 mapping when the guest hang issue occurs. The read request is
> completed with QEMU bypassed and wrong status is fetched. The edk2
> firmware runs into an infinite loop with the wrong status.
> 
> The illegal stage2 mapping is populated due to same page sharing by KSM at
> (C) even the associated memory slot has been marked as invalid at (B) when
> the memory slot is requested to be deleted. It's notable that the active and
> inactive memory slots can't be swapped when we're in the middle of
> kvm_mmu_notifier_change_pte() because kvm-
> >mn_active_invalidate_count is elevated, and
> kvm_swap_active_memslots() will busy loop until it reaches to zero again.
> Besides, the swapping from the active to the inactive memory slots is also
> avoided by holding &kvm->srcu in __kvm_handle_hva_range(),
> corresponding to synchronize_srcu_expedited() in
> kvm_swap_active_memslots().
> 
>   CPU-A                    CPU-B
>   -----                    -----
>                            ioctl(kvm_fd, KVM_SET_USER_MEMORY_REGION)
>                            kvm_vm_ioctl_set_memory_region
>                            kvm_set_memory_region
>                            __kvm_set_memory_region
>                            kvm_set_memslot(kvm, old, NULL, KVM_MR_DELETE)
>                              kvm_invalidate_memslot
>                                kvm_copy_memslot
>                                kvm_replace_memslot
>                                kvm_swap_active_memslots        (A)
>                                kvm_arch_flush_shadow_memslot   (B)
>   same page sharing by KSM
>   kvm_mmu_notifier_invalidate_range_start
>         :
>   kvm_mmu_notifier_change_pte
>     kvm_handle_hva_range
>     __kvm_handle_hva_range
>     kvm_set_spte_gfn            (C)
>         :
>   kvm_mmu_notifier_invalidate_range_end
> 
> Fix the issue by skipping the invalid memory slot at (C) to avoid the illegal
> stage2 mapping so that the read request for the pflash's status is forwarded
> to QEMU and emulated by it. In this way, the correct pflash's status can be
> returned from QEMU to break the infinite loop in the edk2 firmware.
> 
> We tried a git-bisect and the first problematic commit is cd4c71835228 ("
> KVM: arm64: Convert to the gfn-based MMU notifier callbacks"). With this,
> clean_dcache_guest_page() is called after the memory slots are iterated in
> kvm_mmu_notifier_change_pte(). clean_dcache_guest_page() is called
> before the iteration on the memory slots before this commit. This change
> literally enlarges the racy window between
> kvm_mmu_notifier_change_pte() and memory slot removal so that we're
> able to reproduce the issue in a practical test case. However, the issue exists
> since commit d5d8184d35c9
> ("KVM: ARM: Memory virtualization setup").
> 
> Cc: stable@xxxxxxxxxxxxxxx # v3.9+
> Fixes: d5d8184d35c9 ("KVM: ARM: Memory virtualization setup")
> Reported-by: Shuai Hu <hshuai@xxxxxxxxxx>
> Reported-by: Zhenyu Zhang <zhenyzha@xxxxxxxxxx>
> Signed-off-by: Gavin Shan <gshan@xxxxxxxxxx>
> Reviewed-by: David Hildenbrand <david@xxxxxxxxxx>
> Reviewed-by: Oliver Upton <oliver.upton@xxxxxxxxx>
> Reviewed-by: Peter Xu <peterx@xxxxxxxxxx>
> Reviewed-by: Sean Christopherson <seanjc@xxxxxxxxxx>
> Reviewed-by: Shaoqin Huang <shahuang@xxxxxxxxxx>
> Message-Id: <20230615054259.14911-1-gshan@xxxxxxxxxx>
> Signed-off-by: Paolo Bonzini <pbonzini@xxxxxxxxxx>
> ---
>  virt/kvm/kvm_main.c | 20 +++++++++++++++++++-
>  1 file changed, 19 insertions(+), 1 deletion(-)
> 
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index
> 479802a892d4..65f94f592ff8 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -686,6 +686,24 @@ static __always_inline int
> kvm_handle_hva_range_no_flush(struct mmu_notifier *mn
> 
>  	return __kvm_handle_hva_range(kvm, &range);  }
> +
> +static bool kvm_change_spte_gfn(struct kvm *kvm, struct kvm_gfn_range
> +*range) {
> +	/*
> +	 * Skipping invalid memslots is correct if and only change_pte() is
> +	 * surrounded by invalidate_range_{start,end}(), which is currently
> +	 * guaranteed by the primary MMU.  If that ever changes, KVM
> needs to
> +	 * unmap the memslot instead of skipping the memslot to ensure
> that KVM
> +	 * doesn't hold references to the old PFN.
> +	 */
> +	WARN_ON_ONCE(!READ_ONCE(kvm-
> >mn_active_invalidate_count));
> +
> +	if (range->slot->flags & KVM_MEMSLOT_INVALID)
> +		return false;
> +
> +	return kvm_set_spte_gfn(kvm, range);
> +}
> +
>  static void kvm_mmu_notifier_change_pte(struct mmu_notifier *mn,
>  					struct mm_struct *mm,
>  					unsigned long address,
> @@ -707,7 +725,7 @@ static void kvm_mmu_notifier_change_pte(struct
> mmu_notifier *mn,
>  	if (!READ_ONCE(kvm->mmu_invalidate_in_progress))
>  		return;
> 
> -	kvm_handle_hva_range(mn, address, address + 1, pte,
> kvm_set_spte_gfn);
> +	kvm_handle_hva_range(mn, address, address + 1, pte,
> +kvm_change_spte_gfn);
>  }
> 
>  void kvm_mmu_invalidate_begin(struct kvm *kvm, unsigned long start,
> --
> 2.25.1





[Index of Archives]     [Linux Kernel]     [Kernel Development Newbies]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite Hiking]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux