On Mon, Aug 19, 2024 at 1:42 PM Oliver Upton <oliver.upton@xxxxxxxxx> wrote: > > On Fri, Aug 16, 2024 at 07:03:27PM -0600, Yu Zhao wrote: > > On Fri, Aug 16, 2024 at 6:46 PM Sean Christopherson <seanjc@xxxxxxxxxx> wrote: > > [...] > > > > Were you expecting vCPU runtime to improve (more)? If so, lack of movement could > > > be due to KVM arm64 taking mmap_lock for read when handling faults: I had no real expectation. I was hoping that maybe there could be a vCPU runtime improvement, given that user_mem_abort() (being called because we're faulting memory in continuously in this test) has to take the KVM MMU lock for reading, and aging is taking it for reading vs. writing. I think that's why aging is a lot slower when using the write lock: it is waiting for the readers to drop the lock, but I guess the delay on the *readers* due to the pending writer seems to be pretty minimal. > > > > > > https://lore.kernel.org/all/Zr0ZbPQHVNzmvwa6@xxxxxxxxxx > > > > For the above test, I don't think it's mmap_lock > > Yeah, I don't think this is related to the mmap_lock. > > James is likely using hardware that has FEAT_HAFDBS, so vCPUs won't > fault for an Access flag update. Even if he's on a machine w/o it, > Access flag faults are handled outside the mmap_lock. Yeah I was running on Ampere Altra CPUs. > Forcing SW management of the AF at stage-2 would be the best case for > demonstrating the locking improvement: > > diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c > index a24a2a857456..a640e8a8c6ea 100644 > --- a/arch/arm64/kvm/hyp/pgtable.c > +++ b/arch/arm64/kvm/hyp/pgtable.c > @@ -669,8 +669,6 @@ u64 kvm_get_vtcr(u64 mmfr0, u64 mmfr1, u32 phys_shift) > * happen to be running on a design that has unadvertised support for > * HAFDBS. Here be dragons. > */ > - if (!cpus_have_final_cap(ARM64_WORKAROUND_AMPERE_AC03_CPU_38)) > - vtcr |= VTCR_EL2_HA; > #endif /* CONFIG_ARM64_HW_AFDBM */ > > if (kvm_lpa2_is_enabled()) Thanks! > Changing the config option would work too, but I wasn't sure if > FEAT_HAFDBS on the primary MMU influenced MGLRU heuristics. Indeed, disabling CONFIG_ARM64_HW_AFDBM will cause MGLRU not to do aging. > > -- the reclaim path, > > e.g., when zswapping guest memory, has two stages: aging (scanning > > PTEs) and eviction (unmapping PTEs). Only testing the former isn't > > realistic at all. > > AIUI, the intention of this test data is to provide some justification > for why Marc + I should consider the locking change *outside* of any > MMU notifier changes. So from that POV, this is meant as a hacked > up microbenchmark and not meant to be realistic. > > And really, the arm64 change has nothing to do with this series at > this point, which is disappointing. In the interest of moving this > feature along for both architectures, would you be able help James > with: > > - Identifying a benchmark that you believe is realistic > > - Suggestions on how to run that benchmark on Google infrastructure > > Asking since you had a setup / data earlier on when you were carrying > the series. Hopefully with supportive data we can get arm64 to opt-in > to HAVE_KVM_MMU_NOTIFIER_YOUNG_FAST_ONLY as well. I'll keep trying some other approaches I can take for getting similar testing that Yu had; it is somewhat difficult for me to reproduce those tests (and it really shouldn't be.... sorry). I think it makes most sense for me to drop the arm64 patch for now and re-propose it (or something stronger) alongside enabling aging. Does that sound ok?