On Tue, Nov 30, 2021 at 5:01 PM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
>
> On Tue, Nov 30, 2021, David Matlack wrote:
> > On Fri, Nov 26, 2021 at 4:01 AM Peter Xu <peterx@xxxxxxxxxx> wrote:
> > >
> > > Hi, David,
> > >
> > > On Fri, Nov 19, 2021 at 11:57:56PM +0000, David Matlack wrote:
> > > > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > > > index 2a7564703ea6..432a4df817ec 100644
> > > > --- a/arch/x86/include/asm/kvm_host.h
> > > > +++ b/arch/x86/include/asm/kvm_host.h
> > > > @@ -1232,6 +1232,9 @@ struct kvm_arch {
> > > >  	hpa_t	hv_root_tdp;
> > > >  	spinlock_t hv_root_tdp_lock;
> > > >  #endif
> > > > +
> > > > +	/* MMU caches used when splitting large pages during VM-ioctls. */
> > > > +	struct kvm_mmu_memory_caches split_caches;
> > >
> > > Are mmu_gfn_array_cache and mmu_pte_list_desc_cache wasted here? I saw that
> > > "struct kvm_mmu_memory_cache" still takes up quite a few hundred bytes; I
> > > just want to make sure we won't waste them in vain.
> >
> > Yes, they are wasted right now. But there are a couple of things to keep in mind:
> >
> > 1. They are also wasted in every vCPU (in the per-vCPU caches) that
> > does not use the shadow MMU.
> > 2. They will (I think) be used eventually when I add Eager Page
> > Splitting support to the shadow MMU.
> > 3. split_caches is per-VM so it's only a few hundred bytes per VM.
> >
> > If we really want to save the memory, the right way forward might be to
> > make each kvm_mmu_memory_cache a pointer instead of an embedded
> > struct. Then we can allocate each dynamically only as needed. I can
> > add that to my TODO list, but I don't think it'd be worth blocking this
> > on it given the points above.
> >
> > >
> > > [...]
> > >
> > > > +int mmu_topup_split_caches(struct kvm *kvm)
> > > > +{
> > > > +	struct kvm_mmu_memory_caches *split_caches = &kvm->arch.split_caches;
> > > > +	int r;
> > > > +
> > > > +	assert_split_caches_invariants(kvm);
> > > > +
> > > > +	r = kvm_mmu_topup_memory_cache(&split_caches->page_header_cache, 1);
> > > > +	if (r)
> > > > +		goto out;
> > > > +
> > > > +	r = kvm_mmu_topup_memory_cache(&split_caches->shadow_page_cache, 1);
> > > > +	if (r)
> > > > +		goto out;
> > >
> > > Is it intended to only top up with one cache object? IIUC this means we'll try
> > > to proactively yield the cpu for each huge page split right after the
> > > object is consumed.
> > >
> > > Wondering whether it'd be more efficient to make it a slightly larger number, so
> > > we don't overload the memory but also make the loop a bit more efficient.
> >
> > IIUC, 1 here is just the min needed for kvm_mmu_topup_memory_cache to
> > return success. I chose 1 for each because it's the minimum necessary
> > to make forward progress (split one large page).
>
> The @min parameter is the minimum number of pages that _must_ be available in the
> cache, i.e. it's the maximum number of pages that can theoretically be used by
> whatever upcoming operation is going to be consuming pages from the cache.
>
> So '1' is technically correct, but I think it's the wrong choice given the behavior
> of this code. E.g. if there's 1 object in the cache, the initial top-up will do
> nothing,

This scenario will not happen though, since we free the caches after
splitting. So, the next time userspace enables dirty logging on a
memslot and we go to do the initial top-up, the caches will have 0
objects.

> and then tdp_mmu_split_large_pages_root() will almost immediately drop
> mmu_lock to topup the cache. Since the in-loop usage explicitly checks for an
> empty cache, i.e. any non-zero @min will have identical behavior, I think it makes
> sense to use KVM_ARCH_NR_OBJS_PER_MEMORY_CACHE _and_ add a comment explaining why.

If we set the min to KVM_ARCH_NR_OBJS_PER_MEMORY_CACHE,
kvm_mmu_topup_memory_cache() will return -ENOMEM if it can't allocate
at least KVM_ARCH_NR_OBJS_PER_MEMORY_CACHE objects, even though we
really only need 1 to make forward progress. It's a total edge case,
but there could be a scenario where userspace sets the cgroup memory
limits so tightly that we can't allocate
KVM_ARCH_NR_OBJS_PER_MEMORY_CACHE objects when splitting the last few
pages, even though we only needed 1 or 2 objects to finish splitting.
In that case we'd end up with a spurious pr_warn and might not split
the last few pages, depending on which cache failed to get topped up.

>
> > No matter what you pass for min, kvm_mmu_topup_memory_cache() will
> > still always try to allocate KVM_ARCH_NR_OBJS_PER_MEMORY_CACHE
> > objects.
>
> No, it will try to allocate KVM_ARCH_NR_OBJS_PER_MEMORY_CACHE if and only if there
> are fewer than @min objects in the cache.
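For reference, here's a minimal stand-alone sketch of the top-up
behavior we're describing. It's a userspace model, not the actual
kvm_mmu_topup_memory_cache(); topup_memory_cache() and
NR_OBJS_PER_CACHE below are illustrative stand-ins for the kernel
names, and malloc() stands in for the GFP_KERNEL_ACCOUNT allocation:

#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Stand-in for KVM_ARCH_NR_OBJS_PER_MEMORY_CACHE. */
#define NR_OBJS_PER_CACHE 40

struct memory_cache {
	int nobjs;
	void *objects[NR_OBJS_PER_CACHE];
};

static int topup_memory_cache(struct memory_cache *mc, int min)
{
	void *obj;

	/* Nothing to do if the cache already holds at least @min objects. */
	if (mc->nobjs >= min)
		return 0;

	/* Otherwise try to fill the cache all the way to capacity... */
	while (mc->nobjs < NR_OBJS_PER_CACHE) {
		obj = malloc(64);
		if (!obj)
			/* ...but only fail if we couldn't even reach @min. */
			return mc->nobjs >= min ? 0 : -ENOMEM;
		mc->objects[mc->nobjs++] = obj;
	}
	return 0;
}

int main(void)
{
	struct memory_cache mc = { 0 };
	int r;

	/* An empty cache is filled all the way to capacity, even with min=1. */
	r = topup_memory_cache(&mc, 1);
	printf("topup(min=1)  -> r=%d nobjs=%d\n", r, mc.nobjs);

	/* Simulate the split loop consuming objects until only one is left. */
	while (mc.nobjs > 1)
		free(mc.objects[--mc.nobjs]);

	/* With min=1 this is a no-op; min=capacity would trigger a refill. */
	r = topup_memory_cache(&mc, 1);
	printf("topup(min=1)  -> r=%d nobjs=%d\n", r, mc.nobjs);
	r = topup_memory_cache(&mc, NR_OBJS_PER_CACHE);
	printf("topup(min=%d) -> r=%d nobjs=%d\n", NR_OBJS_PER_CACHE, r, mc.nobjs);

	while (mc.nobjs > 0)
		free(mc.objects[--mc.nobjs]);
	return 0;
}

Running it shows both points: min=1 is enough for forward progress
(each refill still fills the cache to capacity), but a partially
drained cache is only refilled again once it drops below @min.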