On Tue, Jul 12, 2022, Peter Xu wrote:
> On Tue, Jul 12, 2022 at 10:53:48PM +0000, Sean Christopherson wrote:
> > On Tue, Jul 12, 2022, Peter Xu wrote:
> > > On Fri, Jun 24, 2022 at 11:27:34PM +0000, Sean Christopherson wrote:
> > > Sorry to start with asking questions, it's just that if we know that
> > > pte_list_desc is probably not gonna be used then could we simply skip
> > > the cache layer as a whole?  IOW, we don't make the "array size of pte
> > > list desc" dynamic, instead we make the whole "pte list desc cache
> > > layer" dynamic.  Is it possible?
> >
> > Not really?  It's theoretically possible, but it'd require pre-checking
> > that there aren't aliases, and to do that race free we'd have to do it
> > under mmu_lock, which means having to support bailing from the page
> > fault to top up the cache.  The memory overhead for the cache isn't so
> > significant that it's worth that level of complexity.
>
> Ah, okay.
>
> So the other question is I'm curious how fundamentally this extra
> complexity could help us save space.
>
> The thing is, IIUC slub works in page sizes, so at least one slub cache
> eats one page, which is 4096 bytes anyway.  In our case, if there were
> 40 objects allocated for the 14-entry array, are you sure it'll still be
> 40 objects, only smaller?

Definitely not 100% positive.

> I'd thought after the change each object is smaller, but slub could have
> cached more objects, since the minimum slub size is 4k for x86.
>
> I don't remember the details of the eager split work on having per-vcpu

The eager split logic uses a single per-VM cache, but it's large (513
entries).

> caches, but I'm also wondering, if we cannot drop the whole cache layer,
> whether we can selectively use slub in this case; then we can cache much
> less, assuming we will use just less too.
>
> Currently:
>
> 	r = kvm_mmu_topup_memory_cache(&vcpu->arch.mmu_pte_list_desc_cache,
> 				       1 + PT64_ROOT_MAX_LEVEL + PTE_PREFETCH_NUM);
>
> We could have the pte list desc cache layer be managed manually
> (e.g. using kmalloc()?) for tdp=1, then we'll at least be in control of
> how many objects we cache.  Then, with a limited number of objects, the
> wasted memory is much reduced too.

I suspect that, without implementing something that looks an awful lot
like the kmem caches, manually handling allocations would degrade
performance for shadow paging and nested MMUs.

> I think I'm fine with the current approach too, but only if it really
> helps reduce memory footprint as we expected.

Yeah, I'll get numbers before sending v2 (which will be quite some time at
this point).
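
To make the slab-packing question above concrete, here is a minimal
userspace sketch (not KVM code) of the arithmetic.  It assumes the struct
layout roughly mirrors KVM's pte_list_desc at the time, i.e. a 'more'
pointer and a u32 count (padded to 8 bytes on x86-64) followed by the
sptes[] array, and it ignores slab alignment, rounding, and per-slab
metadata.  The desc_size() helper and the PAGE_SIZE macro are local
stand-ins for illustration, not kernel APIs.

    #include <stdio.h>
    #include <stdint.h>
    #include <stddef.h>

    #define PAGE_SIZE 4096

    /*
     * Approximate bytes per pte_list_desc for an sptes[] array of 'ext'
     * entries: a 'more' pointer plus a u32 count (modeled as 8 bytes to
     * account for padding on x86-64), followed by 'ext' spte pointers.
     * Slab alignment and per-slab metadata are deliberately ignored.
     */
    static size_t desc_size(size_t ext)
    {
            return sizeof(void *) + sizeof(uint64_t) +
                   ext * sizeof(uint64_t *);
    }

    int main(void)
    {
            size_t ext;

            for (ext = 1; ext <= 14; ext++) {
                    size_t sz = desc_size(ext);

                    printf("PTE_LIST_EXT=%2zu: %3zu bytes/obj, "
                           "%3zu objs per 4KiB slab page\n",
                           ext, sz, (size_t)PAGE_SIZE / sz);
            }
            return 0;
    }

Under those assumptions, a 14-entry descriptor is 128 bytes (32 objects
per 4 KiB slab page) while a 1-entry descriptor is 24 bytes (170 per
page), so e.g. 40 live descriptors would drop from two slab pages to one:
the object count stays the same, but the page footprint shrinks.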