On Wed, May 31, 2023 at 1:56 PM Oliver Upton <oliver.upton@xxxxxxxxx> wrote: > > Hi Yu, > > On Fri, May 26, 2023 at 05:44:30PM -0600, Yu Zhao wrote: > > Implement kvm_arch_test_clear_young() to support the fast path in > > mmu_notifier_ops->test_clear_young(). > > > > It focuses on a simple case, i.e., hardware sets the accessed bit in > > KVM PTEs and VMs are not protected, where it can rely on RCU and > > cmpxchg to safely clear the accessed bit without taking > > kvm->mmu_lock. Complex cases fall back to the existing slow path > > where kvm->mmu_lock is then taken. > > > > Signed-off-by: Yu Zhao <yuzhao@xxxxxxxxxx> > > --- > > arch/arm64/include/asm/kvm_host.h | 6 ++++++ > > arch/arm64/kvm/mmu.c | 36 +++++++++++++++++++++++++++++++ > > 2 files changed, 42 insertions(+) > > > > diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h > > index 7e7e19ef6993..da32b0890716 100644 > > --- a/arch/arm64/include/asm/kvm_host.h > > +++ b/arch/arm64/include/asm/kvm_host.h > > @@ -1113,4 +1113,10 @@ static inline void kvm_hyp_reserve(void) { } > > void kvm_arm_vcpu_power_off(struct kvm_vcpu *vcpu); > > bool kvm_arm_vcpu_stopped(struct kvm_vcpu *vcpu); > > > > +#define kvm_arch_has_test_clear_young kvm_arch_has_test_clear_young > > +static inline bool kvm_arch_has_test_clear_young(void) > > +{ > > + return cpu_has_hw_af() && !is_protected_kvm_enabled(); > > +} > > I would *strongly* suggest you consider supporting test_clear_young on > systems that do software Access Flag management. FEAT_HAFDBS is an > *optional* extension to the architecture, so we're going to support > software AF management for a very long time in KVM. It is also a valid > fallback option in the case of hardware errata which render HAFDBS > broken. Hi Oliver, It's not about willingness but resources. Ideally we want to make everything perfect, but in reality, we can only move forward one step a time. If I looked at your request from ARM's POV, I would agree with you. But my goal is to lay the foundation for all architectures that could benefit, so I may not be able to cover a lot for each architecture. Specifically, I don't have the bandwidth to test the !FEAT_HAFDBS case for ARM. So here are some options I could offer, ordered by my preferences: 1. We proceed as it is for now. I *will* find someone from my team (or yours) to follow up -- this way we can make sure !FEAT_HAFDBS is well tested. 2. I drop the cpu_has_hw_af() check above. Not that I think there is much risk, I'm just trying to be cautious. 3. I drop the entire ARM support (and include the RISC-V support which I previously deprioritized). We revisit after the test is done. Sounds reasonable? > So, we should expect (and support) systems of all shapes and sizes that > do software AF. I'm sure we'll hear about more in the not-too-distant > future... > > For future reference (even though I'm suggesting you support software > AF), decisions such of these need an extremely verbose comment > describing the rationale behind the decision. > > > + > > #endif /* __ARM64_KVM_HOST_H__ */ > > diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c > > index c3b3e2afe26f..26a8d955b49c 100644 > > --- a/arch/arm64/kvm/mmu.c > > +++ b/arch/arm64/kvm/mmu.c > > Please do not implement page table walkers outside of hyp/pgtable.c > > > @@ -1678,6 +1678,42 @@ bool kvm_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range) > > range->start << PAGE_SHIFT); > > } > > > > +static int stage2_test_clear_young(const struct kvm_pgtable_visit_ctx *ctx, > > + enum kvm_pgtable_walk_flags flags) > > +{ > > + kvm_pte_t new = ctx->old & ~KVM_PTE_LEAF_ATTR_LO_S2_AF; > > + > > + VM_WARN_ON_ONCE(!page_count(virt_to_page(ctx->ptep))); > > This sort of sanity checking is a bit excessive. Isn't there a risk of > false negatives here too? IOW, if we tragically mess up RCU in the page > table code, what's stopping a prematurely freed page from being > allocated to another user? Yes, but from my aforementioned POV (the breadth I'm focusing on), this is a good practice. I can live without this assertion if you feel strongly about it. > > + if (!kvm_pte_valid(new)) > > + return 0; > > + > > + if (new == ctx->old) > > + return 0; > > + > > + if (kvm_should_clear_young(ctx->arg, ctx->addr / PAGE_SIZE)) > > + stage2_try_set_pte(ctx, new); > > + > > + return 0; > > +} > > + > > +bool kvm_arch_test_clear_young(struct kvm *kvm, struct kvm_gfn_range *range) > > +{ > > + u64 start = range->start * PAGE_SIZE; > > + u64 end = range->end * PAGE_SIZE; > > + struct kvm_pgtable_walker walker = { > > + .cb = stage2_test_clear_young, > > + .arg = range, > > + .flags = KVM_PGTABLE_WALK_LEAF | KVM_PGTABLE_WALK_SHARED, > > + }; > > + > > + BUILD_BUG_ON(is_hyp_code()); > > Delete this assertion. Will do.