On Wed, May 18, 2022 at 8:24 AM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
>
> On Wed, May 18, 2022, Peter Xu wrote:
> > On Tue, May 17, 2022 at 04:20:31PM -0400, Peter Xu wrote:
> > > On Tue, May 17, 2022 at 07:05:24PM +0000, David Matlack wrote:
> > > > +uint64_t perf_test_nested_pages(int nr_vcpus)
> > > > +{
> > > > +	/*
> > > > +	 * 513 page tables to identity-map the L2 with 1G pages, plus a few
> > > > +	 * pages per-vCPU for data structures such as the VMCS.
> > > > +	 */
> > > > +	return 513 + 10 * nr_vcpus;
> > >
> > > Shouldn't that 513 magic value be related to vm->max_gfn instead (rather
> > > than assuming all hosts have 39 bits PA)?
> > >
> > > If my math is correct, it'll require 1GB here just for the l2->l1 pgtables
> > > on a 5-level host to run this test nested. So I had a feeling we'd better
> > > still consider >4 level hosts some day very soon.. No strong opinion, as
> > > long as this test is not run by default.
> >
> > I had a feeling that when I said N level I actually meant N-1 level in all
> > above, since 39 bits are for 3 level not 4 level?..
> >
> > Then it's ~512GB pgtables on 5 level? If so I do think we'd better have a
> > nicer way to do this identity mapping..
>
> Agreed, mapping all theoretically possible gfns into L2 is doomed to fail for
> larger MAXPHYADDR systems.

Peter, I think your original math was correct. For 4-level we need 1 L4 +
512 L3 tables (i.e. ~2MiB) to map the entire address space. Each of the L3
tables contains 512 PTEs, each pointing to a 1GiB page, mapping
512 * 512 * 1GiB = 256 TiB in total.

So for 5-level we need 1 L5 + 512 L4 + 262144 L3 tables (i.e. ~1GiB).

>
> Page table allocations are currently hardcoded to come from memslot0.
> memslot0 is required to be in lower DRAM, and thus tops out at ~3gb for all
> intents and purposes because we need to leave room for the xAPIC.
>
> And I would strongly prefer not to plumb back the ability to specify an
> alternative memslot for page table allocations, because except for truly
> pathological tests that functionality is unnecessary and pointless
> complexity.
>
> > I don't think it's very hard - walk the mem regions in kvm_vm.regions
> > should work for us?
>
> Yeah. Alternatively, the test can identity map all of memory <4gb and then
> also map "guest_test_phys_mem - guest_num_pages". I don't think there's any
> other memory to deal with, is there?

This isn't necessary for 4-level, but also wouldn't be too hard to implement.
I can take a stab at implementing it in v3 if we think 5-level selftests are
coming soon.
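
For reference, here is that table-count arithmetic in code form (a throwaway
sketch, not part of the series; "paging_levels" is just an illustrative
parameter, 4 or 5):

/*
 * Number of 4KiB page tables needed to identity map the entire
 * guest-physical address space with 1GiB pages. Each level above the
 * 1GiB-mapping level (L3) fans out by 512 entries.
 */
static uint64_t nr_identity_map_page_tables(int paging_levels)
{
	uint64_t total = 0;
	uint64_t tables_at_level = 1;	/* the root table */
	int level;

	/* With 1GiB pages, no L2 or L1 tables are ever allocated. */
	for (level = paging_levels; level >= 3; level--) {
		total += tables_at_level;
		tables_at_level *= 512;
	}
	return total;
}

/*
 * 4-level: 1 + 512          =    513 tables (~2MiB)
 * 5-level: 1 + 512 + 262144 = 262657 tables (~1GiB)
 */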
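And here is roughly what I have in mind for the "<4gb plus the test region"
approach (untested sketch; nested_identity_map_1g() is a hypothetical helper
that would identity map [addr, addr + size) into L2 with 1GiB pages, and the
test region start/size would come from the existing guest_test_phys_mem /
guest_num_pages values you mentioned):

static void identity_map_l2(struct vmx_pages *vmx, struct kvm_vm *vm,
			    uint64_t test_gpa, uint64_t test_size)
{
	const uint64_t gib = 1ull << 30;
	/* Round the test region boundaries out to 1GiB. */
	uint64_t start = test_gpa & ~(gib - 1);
	uint64_t end = (test_gpa + test_size + gib - 1) & ~(gib - 1);

	/* Everything below 4GiB: memslot0, the xAPIC, etc. */
	nested_identity_map_1g(vmx, vm, 0, 4 * gib);

	/* The test data memslot. */
	nested_identity_map_1g(vmx, vm, start, end - start);
}

That bounds the number of page tables by the size of the test region rather
than by the host MAXPHYADDR.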