On Fri, Apr 19, 2024 at 3:48 PM James Houghton <jthoughton@xxxxxxxxxx> wrote: > > On Fri, Apr 19, 2024 at 2:07 PM David Matlack <dmatlack@xxxxxxxxxx> wrote: > > > > On 2024-04-19 01:47 PM, James Houghton wrote: > > > On Thu, Apr 11, 2024 at 10:28 AM David Matlack <dmatlack@xxxxxxxxxx> wrote: > > > > On 2024-04-11 10:08 AM, David Matlack wrote: > > > > bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range) > > > > { > > > > bool young = false; > > > > > > > > if (!range->arg.metadata->bitmap && kvm_memslots_have_rmaps(kvm)) > > > > young = kvm_handle_gfn_range(kvm, range, kvm_age_rmap); > > > > > > > > if (tdp_mmu_enabled) > > > > young |= kvm_tdp_mmu_age_gfn_range(kvm, range); > > > > > > > > return young; > > > > } > > > > > > > > bool kvm_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range) > > > > { > > > > bool young = false; > > > > > > > > if (!range->arg.metadata->bitmap && kvm_memslots_have_rmaps(kvm)) > > > > young = kvm_handle_gfn_range(kvm, range, kvm_test_age_rmap); > > > > > > > > if (tdp_mmu_enabled) > > > > young |= kvm_tdp_mmu_test_age_gfn(kvm, range); > > > > > > > > return young; > > > > > > > > > Yeah I think this is the right thing to do. Given your other > > > suggestions (on patch 3), I think this will look something like this > > > -- let me know if I've misunderstood something: > > > > > > bool check_rmap = !bitmap && kvm_memslot_have_rmaps(kvm); > > > > > > if (check_rmap) > > > KVM_MMU_LOCK(kvm); > > > > > > rcu_read_lock(); // perhaps only do this when we don't take the MMU lock? > > > > > > if (check_rmap) > > > kvm_handle_gfn_range(/* ... */ kvm_test_age_rmap) > > > > > > if (tdp_mmu_enabled) > > > kvm_tdp_mmu_test_age_gfn() // modified to be RCU-safe > > > > > > rcu_read_unlock(); > > > if (check_rmap) > > > KVM_MMU_UNLOCK(kvm); > > > > I was thinking a little different. If you follow my suggestion to first > > make the TDP MMU aging lockless, you'll end up with something like this > > prior to adding bitmap support (note: the comments are just for > > demonstrative purposes): > > > > bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range) > > { > > bool young = false; > > > > /* Shadow MMU aging holds write-lock. */ > > if (kvm_memslots_have_rmaps(kvm)) { > > write_lock(&kvm->mmu_lock); > > young = kvm_handle_gfn_range(kvm, range, kvm_age_rmap); > > write_unlock(&kvm->mmu_lock); > > } > > > > /* TDM MMU aging is lockless. */ > > if (tdp_mmu_enabled) > > young |= kvm_tdp_mmu_age_gfn_range(kvm, range); > > > > return young; > > } > > > > Then when you add bitmap support it would look something like this: > > > > bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range) > > { > > unsigned long *bitmap = range->arg.metadata->bitmap; > > bool young = false; > > > > /* SHadow MMU aging holds write-lock and does not support bitmap. */ > > if (kvm_memslots_have_rmaps(kvm) && !bitmap) { > > write_lock(&kvm->mmu_lock); > > young = kvm_handle_gfn_range(kvm, range, kvm_age_rmap); > > write_unlock(&kvm->mmu_lock); > > } > > > > /* TDM MMU aging is lockless and supports bitmap. */ > > if (tdp_mmu_enabled) > > young |= kvm_tdp_mmu_age_gfn_range(kvm, range); > > > > return young; > > } > > > > rcu_read_lock/unlock() would be called in kvm_tdp_mmu_age_gfn_range(). > > Oh yes this is a lot better. I hope I would have seen this when it > came time to actually update this patch. Thanks. > > > > > That brings up a question I've been wondering about. If KVM only > > advertises support for the bitmap lookaround when shadow roots are not > > allocated, does that mean MGLRU will be blind to accesses made by L2 > > when nested virtualization is enabled? And does that mean the Linux MM > > will think all L2 memory is cold (i.e. good candidate for swapping) > > because it isn't seeing accesses made by L2? > > Yes, I think so (for both questions). That's better than KVM not > participating in MGLRU aging at all, which is the case today (IIUC -- > also ignoring the case where KVM accesses guest memory directly). We > could have MGLRU always invoke the mmu notifiers, but frequently > taking the MMU lock for writing might be worse than evicting when we > shouldn't. Maybe Yu tried this at some point, but I can't find any > results for this. No, in this case only the fast path (page table scanning) is disabled. MGLRU still sees the A-bit from L2 using the rmap, i.e., the slow path calling folio_check_references().