Re: [PATCH] KVM: selftests: Create memslot 0 at GPA 0x100000000 on x86_64

On Thu, Mar 14, 2024 at 4:14 PM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
>
> On Thu, Mar 14, 2024, David Matlack wrote:
> > On Wed, Mar 13, 2024 at 7:31 AM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
> > >
> > > On Thu, Mar 07, 2024, David Matlack wrote:
> > > > On Thu, Mar 7, 2024 at 3:27 PM David Matlack <dmatlack@xxxxxxxxxx> wrote:
> > > > >
> > > > > On 2024-03-07 02:37 PM, Sean Christopherson wrote:
> > > > > > On Thu, Mar 07, 2024, David Matlack wrote:
> > > > > > > Create memslot 0 at 0x100000000 (4GiB) to avoid it overlapping with
> > > > > > > KVM's private memslot for the APIC-access page.
> > > > > >
> > > > > > Any chance we can solve this by using huge pages in the guest, and adjusting the
> > > > > > gorilla math in vm_nr_pages_required() accordingly?  There's really no reason to
> > > > > > use 4KiB pages for a VM with 256GiB of memory.  That'd also be more representative
> > > > > > of real world workloads (at least, I hope real world workloads are using 2MiB or
> > > > > > 1GiB pages in this case).
> > > > >
> > > > > There are real world workloads that use TiB of RAM with 4KiB mappings
> > > > > (looking at you SAP HANA).
> > > > >
> > > > > What about giving tests an explicit "start" GPA they can use? That would
> > > > > fix max_guest_memory_test and avoid tests making assumptions about 4GiB
> > > > > being a magically safe address to use.
> > >
> > > So, rather than more hardcoded addresses and/or a knob to control _all_ code
> > > allocations, I think we should provide a knob to say that MEM_REGION_PT should go
> > > to memory above 4GiB. And to make memslot handling maintainable in the long term:
> > >
> > >   1. Add a knob to place MEM_REGION_PT at 4GiB (and as of this initial patch,
> > >      conditionally in their own memslot).
> > >
> > >   2. Use the PT_AT_4GIB (not the real name) knob for the various memstress tests
> > >      that need it.
> >
> > Making tests pick when to place page tables at 4GiB seems unnecessary.
> > Tests that don't otherwise need a specific physical memory layout
> > should be able to create a VM with any amount of memory and have it
> > just work.
> >
> > It's also not impossible that a test has 4GiB+ .bss because the guest
> > needs a big array for something. In that case we'd need a knob to move
> > MEM_REGION_CODE above 4GiB on x86_64 as well.
>
> LOL, at that point, the test can darn well dynamically allocate its memory.
> Though I'd love to see a test that needs a 3GiB array :-)
>
> > For x86_64 (which is the only architecture AFAIK that has a private
> > memslot in KVM the framework can overlap with), what's the downside of
> > always putting all memslots above 4GiB?
>
> Divergence from other architectures, divergence from "real" VM configurations,
> and a more complicated programming environment for the vast majority of tests.
> E.g. a test that uses more than ~3GiB of memory would need to dynamically place
> its test specific memslots, whereas if the core library keeps everything under
> 4GiB by default, then on x86 every test knows it has 4GiB+ to play with.

Divergence from real VM configurations is a really good point. I was
thinking we _could_ make all architectures start at 4GiB and make
32GiB the new static address available for tests to solve the
architecture divergence and complexity problems. But that would mean
all KVM selftests don't use GPAs 0-4GiB, and that seems like a
terrible idea now that you point it out.

Thanks for the thoughtful feedback. I'll give your suggestions a try in v2.

>
> One could argue that dynamically placing test-specific memslots would be more elegant,
> but I'd prefer to avoid that because I can't think of any value it would add
> from a test coverage perspective, and static addresses are much easier when it
> comes to debugging.
>
> Having tests use e.g. 2GiB-3GiB or 1GiB-3GiB, would kinda sorta work, but that
> 2GiB limit isn't trivial, e.g. max_guest_memory_test creates TiBs of memslots.
>
> IMO, memstress is the odd one out, it should be the one that needs to do special
> things.

Fair point.

>
> > >   3. Formalize memslots 0..2 (CODE, DATA, and PT) as being owned by the library,
> > >      with memslots 3..MAX available for test usage.
> > >
> > >   4. Modify tests that assume memslots 1..MAX are available, i.e. force them to
> > >      start at MEM_REGION_TEST_DATA.
> >
> > I think MEM_REGION_TEST_DATA is just where the framework will satisfy
> > test-initiated dynamic memory allocations. That's different from which
> > slots are free for the test to use.
> >
> > But assuming I understand your intention, I agree in spirit... Tests
> > should be allowed to use slots TEST_SLOT..MAX and physical addresses
> > TEST_GPA..MAX. The framework should provide both TEST_SLOT and
> > TEST_GPA (names pending), and existing tests should use those instead
> > of random hard-coded values.
> >
> > >
> > >   5. Use separate memslots for CODE, DATA, and PT by default.  This will allow
> > >      for more precise sizing of the CODE and DATA slots.
> >
> > What do you mean by "[separate memslots] will allow for more precise sizing"?
>
> I suspect there is a _lot_ of slop in the arbitrary 512 pages that are tacked on
> by vm_nr_pages_required().  Those 2MiBs probably don't matter, I just don't like
> completely magical numbers.

That makes sense; we can probably tighten up those heuristics and
maybe even get rid of the magic numbers. But I wasn't following what
_separate memslots_ have to do with it?
