On 2024-04-11 10:08 AM, David Matlack wrote:
> On 2024-04-01 11:29 PM, James Houghton wrote:
> > Only handle the TDP MMU case for now. In other cases, if a bitmap was
> > not provided, fall back to the slowpath that takes mmu_lock, or, if a
> > bitmap was provided, inform the caller that the bitmap is unreliable.
> >
> > Suggested-by: Yu Zhao <yuzhao@xxxxxxxxxx>
> > Signed-off-by: James Houghton <jthoughton@xxxxxxxxxx>
> > ---
> >  arch/x86/include/asm/kvm_host.h | 14 ++++++++++++++
> >  arch/x86/kvm/mmu/mmu.c          | 16 ++++++++++++++--
> >  arch/x86/kvm/mmu/tdp_mmu.c      | 10 +++++++++-
> >  3 files changed, 37 insertions(+), 3 deletions(-)
> >
> > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > index 3b58e2306621..c30918d0887e 100644
> > --- a/arch/x86/include/asm/kvm_host.h
> > +++ b/arch/x86/include/asm/kvm_host.h
> > @@ -2324,4 +2324,18 @@ int memslot_rmap_alloc(struct kvm_memory_slot *slot, unsigned long npages);
> >   */
> >  #define KVM_EXIT_HYPERCALL_MBZ GENMASK_ULL(31, 1)
> >
> > +#define kvm_arch_prepare_bitmap_age kvm_arch_prepare_bitmap_age
> > +static inline bool kvm_arch_prepare_bitmap_age(struct mmu_notifier *mn)
> > +{
> > +	/*
> > +	 * Indicate that we support bitmap-based aging when using the TDP MMU
> > +	 * and the accessed bit is available in the TDP page tables.
> > +	 *
> > +	 * We have no other preparatory work to do here, so we do not need to
> > +	 * redefine kvm_arch_finish_bitmap_age().
> > +	 */
> > +	return IS_ENABLED(CONFIG_X86_64) && tdp_mmu_enabled
> > +		&& shadow_accessed_mask;
> > +}
> > +
> >  #endif /* _ASM_X86_KVM_HOST_H */
> > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > index 992e651540e8..fae1a75750bb 100644
> > --- a/arch/x86/kvm/mmu/mmu.c
> > +++ b/arch/x86/kvm/mmu/mmu.c
> > @@ -1674,8 +1674,14 @@ bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
> >  {
> >  	bool young = false;
> >
> > -	if (kvm_memslots_have_rmaps(kvm))
> > +	if (kvm_memslots_have_rmaps(kvm)) {
> > +		if (range->lockless) {
> > +			kvm_age_set_unreliable(range);
> > +			return false;
> > +		}
>
> If a VM has the TDP MMU enabled, supports A/D bits, and is using nested
> virtualization, MGLRU will effectively be blind to all accesses made by
> the VM.
>
> kvm_arch_prepare_bitmap_age() will return true, indicating that the
> bitmap is supported. But then kvm_age_gfn() and kvm_test_age_gfn() will
> return false immediately and indicate the bitmap is unreliable because a
> shadow root is allocated. The notifier will then return
> MMU_NOTIFIER_YOUNG_BITMAP_UNRELIABLE.
>
> Looking at the callers, MMU_NOTIFIER_YOUNG_BITMAP_UNRELIABLE is never
> consumed or used. So I think MGLRU will assume all memory is
> unaccessed?
>
> One way to improve the situation would be to re-order the code so the
> TDP MMU function runs first and to return young instead of false, so
> that MGLRU at least has visibility into accesses made by L1 (and L2 if
> EPT is disabled in L2). But that still means MGLRU is blind to accesses
> made by L2.
>
> What about grabbing the mmu_lock if there's a shadow root allocated and
> getting rid of MMU_NOTIFIER_YOUNG_BITMAP_UNRELIABLE altogether?
>
> 	if (kvm_memslots_have_rmaps(kvm)) {
> 		write_lock(&kvm->mmu_lock);
> 		young |= kvm_handle_gfn_range(kvm, range, kvm_age_rmap);
> 		write_unlock(&kvm->mmu_lock);
> 	}
>
> The TDP MMU walk would still be lockless. KVM only has to take the
> mmu_lock to collect accesses made by L2.
>
> kvm_age_rmap() and kvm_test_age_rmap() will need to become bitmap-aware
> as well, but that seems relatively simple with the helper functions.

Wait, even simpler: just check kvm_memslots_have_rmaps() in
kvm_arch_prepare_bitmap_age() and skip the shadow MMU when processing a
bitmap request.

i.e.

static inline bool kvm_arch_prepare_bitmap_age(struct kvm *kvm, struct mmu_notifier *mn)
{
	/*
	 * Indicate that we support bitmap-based aging when using the TDP MMU
	 * and the accessed bit is available in the TDP page tables.
	 *
	 * We have no other preparatory work to do here, so we do not need to
	 * redefine kvm_arch_finish_bitmap_age().
	 */
	return IS_ENABLED(CONFIG_X86_64) && tdp_mmu_enabled
		&& shadow_accessed_mask && !kvm_memslots_have_rmaps(kvm);
}

bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
{
	bool young = false;

	if (!range->arg.metadata->bitmap && kvm_memslots_have_rmaps(kvm))
		young = kvm_handle_gfn_range(kvm, range, kvm_age_rmap);

	if (tdp_mmu_enabled)
		young |= kvm_tdp_mmu_age_gfn_range(kvm, range);

	return young;
}

bool kvm_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
{
	bool young = false;

	if (!range->arg.metadata->bitmap && kvm_memslots_have_rmaps(kvm))
		young = kvm_handle_gfn_range(kvm, range, kvm_test_age_rmap);

	if (tdp_mmu_enabled)
		young |= kvm_tdp_mmu_test_age_gfn(kvm, range);

	return young;
}

Sure, this could race with the creation of a shadow root, but so can the
non-bitmap code.
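To make the difference between the posted patch and this proposal concrete, here is a small userspace C model of the decision logic. It is a sketch, not KVM code: every name (`model_vm`, `v1_*`, `v2_*`) is hypothetical, each page-table walk is reduced to one boolean, and it assumes, as in the posted patch, that a bitmap request is the one handled locklessly.

```c
#include <stdbool.h>

/* Hypothetical userspace model; none of this is KVM's real code or layout. */
struct model_vm {
	bool tdp_mmu_enabled; /* models tdp_mmu_enabled */
	bool accessed_bit;    /* models shadow_accessed_mask != 0 */
	bool has_rmaps;       /* models kvm_memslots_have_rmaps(), e.g. a shadow root for L2 */
	bool tdp_young;       /* access visible in the TDP MMU page tables (L1) */
	bool shadow_young;    /* access visible only through the rmaps (L2) */
};

/* Posted patch: bitmap aging is offered whenever the TDP MMU has A/D bits. */
static bool v1_prepare_bitmap_age(const struct model_vm *vm)
{
	return vm->tdp_mmu_enabled && vm->accessed_bit;
}

/* Posted patch: a lockless request bails out as soon as rmaps exist. */
static bool v1_age_gfn(const struct model_vm *vm, bool lockless, bool *unreliable)
{
	bool young = false;

	if (vm->has_rmaps) {
		if (lockless) {
			*unreliable = true;
			return false; /* the TDP MMU is never consulted */
		}
		young = vm->shadow_young;
	}
	if (vm->tdp_mmu_enabled)
		young |= vm->tdp_young;
	return young;
}

/* Proposal: refuse bitmap aging up front if the shadow MMU is in use. */
static bool v2_prepare_bitmap_age(const struct model_vm *vm)
{
	return vm->tdp_mmu_enabled && vm->accessed_bit && !vm->has_rmaps;
}

/* Proposal: skip the shadow MMU only for bitmap requests. */
static bool v2_age_gfn(const struct model_vm *vm, bool bitmap)
{
	bool young = false;

	if (!bitmap && vm->has_rmaps)
		young = vm->shadow_young;
	if (vm->tdp_mmu_enabled)
		young |= vm->tdp_young;
	return young;
}

/* Models the notifier: use the bitmap (lockless) path only if the arch allows it. */
static bool v1_notifier_age(const struct model_vm *vm, bool *unreliable)
{
	*unreliable = false;
	return v1_age_gfn(vm, v1_prepare_bitmap_age(vm), unreliable);
}

static bool v2_notifier_age(const struct model_vm *vm)
{
	return v2_age_gfn(vm, v2_prepare_bitmap_age(vm));
}
```

For a nested VM where only L2 touched the page, the posted logic reports "not young" and flags the result unreliable, while the proposed logic declines the bitmap path, falls back to the non-bitmap walk, and reports the access. The race with shadow-root creation is not modeled.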