Re: Potential bug in TDP MMU

On Sat, Dec 11, 2021 at 2:39 AM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
>
> On Fri, Dec 10, 2021, Ignat Korchagin wrote:
> > I've been trying to figure out the difference between "good" runs and
> > "bad" runs of gvisor. So, if I've been running the following bpftrace
> > one-liner:
>
> ...
>
> > That is, I never get a stack with
> > kvm_tdp_mmu_put_root->..->kvm_set_pfn_dirty with a "good" run.
> > Perhaps, this may shed some light onto what is going on.
>
> Hmm, a little?
>
> Based on the WARN backtrace, KVM encounters an entire chain of valid, present TDP
> MMU paging structures _after_ exit_mm() in the do_exit() path, as the call to
> task_work_run() in do_exit() occurs after exit_mm().
>
> That means that kvm_mmu_zap_all() is guaranteed to have been called before the
> fatal kvm_arch_destroy_vm(), as either:
>
>   a) exit_mm() put the last reference to mm_users and thus called __mmput ->
>      exit_mmap() -> mmu_notifier_release() -> ... -> kvm_mmu_zap_all().
>
>   b) Something else had a reference to mm_users, and so KVM's ->release hook was
>      invoked by kvm_destroy_vm() -> mmu_notifier_unregister().
>
> It's probably fairly safe to assume this is a TDP MMU bug, which rules out races
> or bad refcounts in other areas.

Most likely. Currently we're using the kvm.tdp_mmu=0 kernel command-line
parameter as a workaround and haven't encountered any issues with it.

> That means that KVM (a) is somehow losing track of a root, (b) isn't zapping all
> SPTEs in kvm_mmu_zap_all(), or (c) is installing a SPTE after the mm has been released.
>
> (a) is unlikely because kvm_tdp_mmu_get_vcpu_root_hpa() is the only way for a
> vCPU to get a reference, and it holds mmu_lock for write, doesn't yield, and
> either gets a root from the list or adds a root to the list.
>
> (b) is unlikely because I would expect the fallout to be much larger and not
> unique to your setup.
>
> That leaves (c), which isn't all that likely either.  I can think of a variety of
> ways KVM might write a defunct SPTE, but I can't concoct a scenario where an
> entire tree of present paging structures is written.
>
> Can you run with the below debug patch and see if you get a hit in the failure
> scenario?  Or possibly even a non-failure scenario?  This should either confirm
> or rule out (c).
>
>
> ---
>  arch/x86/kvm/mmu/mmu.c     | 2 ++
>  arch/x86/kvm/mmu/tdp_mmu.c | 5 +++++
>  include/linux/kvm_host.h   | 2 ++
>  3 files changed, 9 insertions(+)
>
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 1ccee4d17481..e4e283a38570 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -5939,6 +5939,8 @@ void kvm_mmu_zap_all(struct kvm *kvm)
>         LIST_HEAD(invalid_list);
>         int ign;
>
> +       atomic_set(&kvm->mm_released, 1);
> +
>         write_lock(&kvm->mmu_lock);
>  restart:
>         list_for_each_entry_safe(sp, node, &kvm->arch.active_mmu_pages, link) {
> diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
> index b69e47e68307..432ccf05f446 100644
> --- a/arch/x86/kvm/mmu/tdp_mmu.c
> +++ b/arch/x86/kvm/mmu/tdp_mmu.c
> @@ -504,6 +504,9 @@ static inline bool tdp_mmu_set_spte_atomic(struct kvm *kvm,
>  {
>         lockdep_assert_held_read(&kvm->mmu_lock);
>
> +       WARN_ON(atomic_read(&kvm->mm_released) &&
> +               new_spte && !is_removed_spte(new_spte));
> +
>         /*
>          * Do not change removed SPTEs. Only the thread that froze the SPTE
>          * may modify it.
> @@ -577,6 +580,8 @@ static inline void __tdp_mmu_set_spte(struct kvm *kvm, struct tdp_iter *iter,
>  {
>         lockdep_assert_held_write(&kvm->mmu_lock);
>
> +       WARN_ON(atomic_read(&kvm->mm_released) && new_spte);
> +
>         /*
>          * No thread should be using this function to set SPTEs to the
>          * temporary removed SPTE value.
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index e7bfcc3b6b0b..8e76e2f6c3be 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -569,6 +569,8 @@ struct kvm {
>
>         struct mutex slots_lock;
>
> +       atomic_t mm_released;
> +
>         /*
>          * Protects the arch-specific fields of struct kvm_memory_slots in
>          * use by the VM. To be used under the slots_lock (above) or in a
>
> base-commit: 1c10f4b4877ffaed602d12ff8cbbd5009e82c970
> --

Thanks. Applied the patch, but no warnings are triggered in either the
"good" case or the "bad" case.
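
For anyone following along, the check the debug patch adds boils down to a
"set a flag in the zap path, warn on any later write" pattern. Below is a
minimal userspace sketch of that logic, not kernel code. Everything in it
(struct vm_model, the model_* helpers, and the MODEL_REMOVED_SPTE value) is
a hypothetical stand-in invented for illustration; only the shape of the two
conditions is taken from the patch above.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for struct kvm; only the debug flag is modelled. */
struct vm_model {
	atomic_int mm_released;
};

/*
 * Hypothetical stand-in for is_removed_spte(). The marker value is made up
 * for this sketch and is NOT the kernel's real removed-SPTE encoding.
 */
#define MODEL_REMOVED_SPTE 0x5a0ULL

static bool model_is_removed_spte(uint64_t spte)
{
	return spte == MODEL_REMOVED_SPTE;
}

/* Models the atomic_set() the patch adds at the top of kvm_mmu_zap_all(). */
static void model_zap_all(struct vm_model *vm)
{
	atomic_store(&vm->mm_released, 1);
}

/*
 * Models the condition added to tdp_mmu_set_spte_atomic(): returns true
 * when the WARN_ON would fire, i.e. a non-zero SPTE that is not the
 * removed marker is written after the flag has been set.
 */
static bool model_warn_atomic(struct vm_model *vm, uint64_t new_spte)
{
	return atomic_load(&vm->mm_released) && new_spte &&
	       !model_is_removed_spte(new_spte);
}

/*
 * Models the condition added to __tdp_mmu_set_spte(): any non-zero SPTE
 * written after the flag has been set would trigger the warning.
 */
static bool model_warn_locked(struct vm_model *vm, uint64_t new_spte)
{
	return atomic_load(&vm->mm_released) && new_spte;
}
```

Since neither warning fired, these two paths did not write a non-zero SPTE
after kvm_mmu_zap_all() ran, at least in the runs I tested.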

Ignat


