Hi Yu, On Fri, May 26, 2023 at 05:44:30PM -0600, Yu Zhao wrote: > Implement kvm_arch_test_clear_young() to support the fast path in > mmu_notifier_ops->test_clear_young(). > > It focuses on a simple case, i.e., hardware sets the accessed bit in > KVM PTEs and VMs are not protected, where it can rely on RCU and > cmpxchg to safely clear the accessed bit without taking > kvm->mmu_lock. Complex cases fall back to the existing slow path > where kvm->mmu_lock is then taken. > > Signed-off-by: Yu Zhao <yuzhao@xxxxxxxxxx> > --- > arch/arm64/include/asm/kvm_host.h | 6 ++++++ > arch/arm64/kvm/mmu.c | 36 +++++++++++++++++++++++++++++++ > 2 files changed, 42 insertions(+) > > diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h > index 7e7e19ef6993..da32b0890716 100644 > --- a/arch/arm64/include/asm/kvm_host.h > +++ b/arch/arm64/include/asm/kvm_host.h > @@ -1113,4 +1113,10 @@ static inline void kvm_hyp_reserve(void) { } > void kvm_arm_vcpu_power_off(struct kvm_vcpu *vcpu); > bool kvm_arm_vcpu_stopped(struct kvm_vcpu *vcpu); > > +#define kvm_arch_has_test_clear_young kvm_arch_has_test_clear_young > +static inline bool kvm_arch_has_test_clear_young(void) > +{ > + return cpu_has_hw_af() && !is_protected_kvm_enabled(); > +} I would *strongly* suggest you consider supporting test_clear_young on systems that do software Access Flag management. FEAT_HAFDBS is an *optional* extension to the architecture, so we're going to support software AF management for a very long time in KVM. It is also a valid fallback option in the case of hardware errata which render HAFDBS broken. So, we should expect (and support) systems of all shapes and sizes that do software AF. I'm sure we'll hear about more in the not-too-distant future... For future reference (even though I'm suggesting you support software AF), decisions such of these need an extremely verbose comment describing the rationale behind the decision. > + > #endif /* __ARM64_KVM_HOST_H__ */ > diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c > index c3b3e2afe26f..26a8d955b49c 100644 > --- a/arch/arm64/kvm/mmu.c > +++ b/arch/arm64/kvm/mmu.c Please do not implement page table walkers outside of hyp/pgtable.c > @@ -1678,6 +1678,42 @@ bool kvm_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range) > range->start << PAGE_SHIFT); > } > > +static int stage2_test_clear_young(const struct kvm_pgtable_visit_ctx *ctx, > + enum kvm_pgtable_walk_flags flags) > +{ > + kvm_pte_t new = ctx->old & ~KVM_PTE_LEAF_ATTR_LO_S2_AF; > + > + VM_WARN_ON_ONCE(!page_count(virt_to_page(ctx->ptep))); This sort of sanity checking is a bit excessive. Isn't there a risk of false negatives here too? IOW, if we tragically mess up RCU in the page table code, what's stopping a prematurely freed page from being allocated to another user? > + if (!kvm_pte_valid(new)) > + return 0; > + > + if (new == ctx->old) > + return 0; > + > + if (kvm_should_clear_young(ctx->arg, ctx->addr / PAGE_SIZE)) > + stage2_try_set_pte(ctx, new); > + > + return 0; > +} > + > +bool kvm_arch_test_clear_young(struct kvm *kvm, struct kvm_gfn_range *range) > +{ > + u64 start = range->start * PAGE_SIZE; > + u64 end = range->end * PAGE_SIZE; > + struct kvm_pgtable_walker walker = { > + .cb = stage2_test_clear_young, > + .arg = range, > + .flags = KVM_PGTABLE_WALK_LEAF | KVM_PGTABLE_WALK_SHARED, > + }; > + > + BUILD_BUG_ON(is_hyp_code()); Delete this assertion. > + kvm_pgtable_walk(kvm->arch.mmu.pgt, start, end - start, &walker); > + > + return false; > +} > + > phys_addr_t kvm_mmu_get_httbr(void) > { > return __pa(hyp_pgtable->pgd); > -- > 2.41.0.rc0.172.g3f132b7071-goog > -- Thanks, Oliver