On Fri, Aug 16, 2024 at 07:03:27PM -0600, Yu Zhao wrote:
> On Fri, Aug 16, 2024 at 6:46 PM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:

[...]

> > Were you expecting vCPU runtime to improve (more)? If so, lack of movement could
> > be due to KVM arm64 taking mmap_lock for read when handling faults:
> >
> > https://lore.kernel.org/all/Zr0ZbPQHVNzmvwa6@xxxxxxxxxx
>
> For the above test, I don't think it's mmap_lock

Yeah, I don't think this is related to the mmap_lock. James is likely
using hardware that has FEAT_HAFDBS, so vCPUs won't fault for an Access
flag update. Even if he's on a machine w/o it, Access flag faults are
handled outside the mmap_lock.

Forcing SW management of the AF at stage-2 would be the best case for
demonstrating the locking improvement:

diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index a24a2a857456..a640e8a8c6ea 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -669,8 +669,6 @@ u64 kvm_get_vtcr(u64 mmfr0, u64 mmfr1, u32 phys_shift)
 	 * happen to be running on a design that has unadvertised support for
 	 * HAFDBS. Here be dragons.
 	 */
-	if (!cpus_have_final_cap(ARM64_WORKAROUND_AMPERE_AC03_CPU_38))
-		vtcr |= VTCR_EL2_HA;
 #endif /* CONFIG_ARM64_HW_AFDBM */
 
 	if (kvm_lpa2_is_enabled())

Changing the config option would work too, but I wasn't sure if
FEAT_HAFDBS on the primary MMU influenced MGLRU heuristics.

> -- the reclaim path,
> e.g., when zswapping guest memory, has two stages: aging (scanning
> PTEs) and eviction (unmapping PTEs). Only testing the former isn't
> realistic at all.

AIUI, the intention of this test data is to provide some justification
for why Marc + I should consider the locking change *outside* of any
MMU notifier changes. So from that POV, this is meant as a hacked-up
microbenchmark, not something realistic.

And really, the arm64 change has nothing to do with this series at this
point, which is disappointing.

In the interest of moving this feature along for both architectures,
would you be able to help James with:

 - Identifying a benchmark that you believe is realistic

 - Suggestions on how to run that benchmark on Google infrastructure

Asking since you had a setup / data earlier on when you were carrying
the series.

Hopefully with supportive data we can get arm64 to opt in to
HAVE_KVM_MMU_NOTIFIER_YOUNG_FAST_ONLY as well.

-- 
Thanks,
Oliver
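
P.S. For completeness, a rough (untested) sketch of the config-option route
mentioned above, assuming a defconfig-based build. Note this compiles hardware
AF/DBM support out entirely, so it also changes stage-1 behaviour for the host
(the MGLRU-heuristics concern above):

  # Software-manage the Access flag everywhere by dropping ARM64_HW_AFDBM,
  # then refresh dependent options.
  scripts/config --file .config --disable ARM64_HW_AFDBM
  make olddefconfig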