On 19.03.24 16:04, Sean Christopherson wrote:
On Tue, Mar 19, 2024, David Hildenbrand wrote:
On 19.03.24 01:10, Sean Christopherson wrote:
Performance is a secondary concern. If this were _just_ about guest performance,
I would unequivocally side with David: the guest gets to keep the pieces if it
fragments a 1GiB page.
The main problem we're trying to solve is that we want to provision a host such
that the host can serve 1GiB pages for non-CoCo VMs, and can also simultaneously
run CoCo VMs, with 100% fungibility. I.e. a host could run 100% non-CoCo VMs,
100% CoCo VMs, or more likely, some sliding mix of the two. Ideally, CoCo VMs
would also get the benefits of 1GiB mappings, but that's not the driving motivation
for this discussion.
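(For concreteness, "serve 1GiB pages for non-CoCo VMs" today means backing
guest RAM straight from the preallocated hugetlb pool, roughly like the sketch
below; the helper name is made up and error handling is omitted.)

#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>

/* Fallbacks for older userspace headers. */
#ifndef MAP_HUGE_SHIFT
#define MAP_HUGE_SHIFT	26
#endif
#ifndef MAP_HUGE_1GB
#define MAP_HUGE_1GB	(30 << MAP_HUGE_SHIFT)
#endif

/*
 * Back @size bytes of guest RAM with 1GiB hugetlb pages from the host's
 * preallocated pool (e.g. hugepagesz=1G hugepages=N on the kernel command
 * line).  @size must be a multiple of 1GiB; mmap() fails with ENOMEM if the
 * pool can't cover it.
 */
static void *alloc_guest_ram_1g(size_t size)
{
	void *mem = mmap(NULL, size, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB | MAP_HUGE_1GB,
			 -1, 0);

	return mem == MAP_FAILED ? NULL : mem;
}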
Supporting 1 GiB mappings there sounds like unnecessary complexity and
opening a big can of worms, especially if "it's not the driving motivation".
If I understand you correctly, the scenario is
(1) We have free 1 GiB hugetlb pages lying around
(2) We want to start a CoCo VM
(3) We don't care about 1 GiB mappings for that CoCo VM,
We care about 1GiB mappings for CoCo VMs. My comment about performance being a
secondary concern was specifically saying that it's the guest's responsibility
to play nice with huge mappings if the guest cares about its performance. For
guests that are well behaved, we most definitely want to provide a configuration
that performs as close to non-CoCo VMs as we can reasonably make it.
How does the guest know the granularity? I suspect it's just implicit
knowledge that "PUD granularity might be nice".
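(If it helps to make "play nice" concrete: about the best a guest can do today
is round its shared<->private conversions out to a guessed host mapping size,
along the lines of the sketch below.  The 2MiB guess and convert_range() are
purely illustrative; nothing actually tells the guest the real granularity.)

#include <stdbool.h>
#include <stdint.h>

/* Pure guesswork: the guest has no way to query the host mapping size. */
#define ASSUMED_HOST_GRANULARITY	(2ULL << 20)

static inline uint64_t align_down(uint64_t x, uint64_t a) { return x & ~(a - 1); }
static inline uint64_t align_up(uint64_t x, uint64_t a)   { return (x + a - 1) & ~(a - 1); }

/*
 * Architecture-specific conversion (hypercall/TDVMCALL/GHCB request),
 * not shown here.
 */
int convert_range(uint64_t gpa, uint64_t len, bool to_private);

/*
 * Convert a whole aligned block instead of just [gpa, gpa + len) so the
 * host's huge mapping isn't fragmented more than necessary.  Only safe if
 * the guest owns all of [start, end) and tracks shared/private state at
 * this granularity.
 */
static int convert_aligned(uint64_t gpa, uint64_t len, bool to_private)
{
	uint64_t start = align_down(gpa, ASSUMED_HOST_GRANULARITY);
	uint64_t end   = align_up(gpa + len, ASSUMED_HOST_GRANULARITY);

	return convert_range(start, end - start, to_private);
}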
And we can do that today, but it requires some amount of host memory to NOT be
in the HugeTLB pool, and instead be kept in reserve so that it can be used for
shared memory for CoCo VMs. That approach has many downsides, as the extra memory
overhead affects CoCo VM shapes, our ability to use a common pool for non-CoCo
and CoCo VMs, and so on and so forth.
Right. But as soon as hugetlb is involved (and we have two separate memfds for
private/shared memory), avoiding memory waste is not feasible.
but hugetlb pages are all we have.
(4) We want to be able to use the 1 GiB hugetlb page in the future.
...
The other big advantage that we should lean into is that we can make assumptions
about guest_memfd usage that would never fly for a general purpose backing store,
e.g. creating a dedicated memory pool for guest_memfd is acceptable, if not
desirable, for (almost?) all of the CoCo use cases.
I don't have any concrete ideas at this time, but my gut feeling is that this
won't be _that_ crazy hard to solve if we commit hard to guest_memfd _not_ being
general purpose, and if we account for conversion scenarios when designing
hugepage support for guest_memfd.
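(The conversion scenario in question is just the existing attribute flip on a
guest_memfd-backed slot; a condensed sketch of that uAPI flow, using the
<linux/kvm.h> structs from 6.8+ kernels, error handling omitted:)

#include <linux/kvm.h>
#include <sys/ioctl.h>

#define GiB	(1ULL << 30)

/*
 * Bind 1GiB of guest_memfd to a memslot, make it private, then convert a
 * single 4KiB chunk back to shared -- exactly the operation that would
 * fragment a 1GiB private mapping.
 */
static void guest_memfd_conversion_flow(int vm_fd, void *shared_backing)
{
	struct kvm_create_guest_memfd gmem = { .size = 1 * GiB };
	int gmem_fd = ioctl(vm_fd, KVM_CREATE_GUEST_MEMFD, &gmem);

	struct kvm_userspace_memory_region2 region = {
		.slot			= 0,
		.flags			= KVM_MEM_GUEST_MEMFD,
		.guest_phys_addr	= 0,
		.memory_size		= 1 * GiB,
		.userspace_addr		= (unsigned long)shared_backing,
		.guest_memfd		= gmem_fd,
		.guest_memfd_offset	= 0,
	};
	ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION2, &region);

	/* Start with the whole range private... */
	struct kvm_memory_attributes attr = {
		.address	= 0,
		.size		= 1 * GiB,
		.attributes	= KVM_MEMORY_ATTRIBUTE_PRIVATE,
	};
	ioctl(vm_fd, KVM_SET_MEMORY_ATTRIBUTES, &attr);

	/* ...then flip one 4KiB chunk in the middle to shared. */
	attr.address	= 512ULL << 20;
	attr.size	= 4096;
	attr.attributes	= 0;
	ioctl(vm_fd, KVM_SET_MEMORY_ATTRIBUTES, &attr);
}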
I'm hoping guest_memfd won't end up being the wild west of hacky MM ideas ;)
Quite the opposite, I'm saying we should be very deliberate in how we add hugepage
support and other features to guest_memfd, so that guest_memfd doesn't become a
hacky mess.
Good.
And I'm saying we should stand firm in what guest_memfd _won't_ support, e.g.
swap/reclaim and probably page migration should get a hard "no".
I thought people wanted to support at least page migration in the
future? (for example, see the reply from Will)
In other words, ditch the complexity for features that are well served by existing
general purpose solutions, so that guest_memfd can take on a bit of complexity to
serve use cases that are unique to KVM guests, without becoming an unmaintainable
mess due to cross-products.
And I believed that was true until people started wanting to mmap() this
thing and brought GUP into the picture ... and then talk about HGM and
all that. *shivers*
--
Cheers,
David / dhildenb