Re: Re: folio_mmapped

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]


Hi Elliot,

On Tue, Mar 19, 2024 at 04:54:10PM -0700, Elliot Berman wrote:
> On Tue, Mar 19, 2024 at 02:31:19PM +0000, Will Deacon wrote:
> > On Tue, Mar 19, 2024 at 11:26:05AM +0100, David Hildenbrand wrote:
> > > On 19.03.24 01:10, Sean Christopherson wrote:
> > > > +1.  I am not completely opposed to letting SNP and TDX effectively convert
> > > > pages between private and shared, but I also completely agree that letting
> > > > anything gup() guest_memfd memory is likely to end in tears.
> > > 
> > > Yes. Avoid it right from the start, if possible.
> > > 
> > > People wanted guest_memfd to *not* have to mmap guest memory ("even for
> > > ordinary VMs"). Now people are saying we have to be able to mmap it in order
> > > to GUP it. It's getting tiring, really.
> > 
> > From the pKVM side, we're working on guest_memfd primarily to avoid
> > diverging from what other CoCo solutions end up using, but if it gets
> > de-featured (e.g. no huge pages, no GUP, no mmap) compared to what we do
> > today with anonymous memory, then it's a really hard sell to switch over
> > from what we have in production. We're also hoping that, over time,
> > guest_memfd will become more closely integrated with the mm subsystem to
> > enable things like hypervisor-assisted page migration, which we would
> > love to have.
> > 
> > Today, we use the existing KVM interfaces (i.e. based on anonymous
> > memory) and it mostly works with the one significant exception that
> > accessing private memory via a GUP pin will crash the host kernel. If
> > all guest_memfd() can offer to solve that problem is preventing GUP
> > altogether, then I'd sooner just add that same restriction to what we
> > currently have instead of overhauling the user ABI in favour of
> > something which offers us very little in return.
> How would we add the restriction to anonymous memory?
> Thinking aloud -- do you mean like some sort of "exclusive GUP" flag
> where mm can ensure that the exclusive GUP pin is the only pin? If the
> refcount for the page is >1, then the exclusive GUP fails. Any future
> GUP pin attempts would fail if the refcount has the EXCLUSIVE_BIAS.

Yes, I think we'd want something like that, but I don't think using a
bias on its own is a good idea as false positives due to a large number
of page references will then actually lead to problems (i.e. rejecting
GUP spuriously), no? I suppose if you only considered the new bias in
conjunction with the AS_NOGUP flag you proposed then it might be ok
(i.e. when you see the bias, you then go check the address space to
confirm). What do you think?

> > On the mmap() side of things for guest_memfd, a simpler option for us
> > than what has currently been proposed might be to enforce that the VMM
> > has unmapped all private pages on vCPU run, failing the ioctl if that's
> > not the case. It needs a little more tracking in guest_memfd but I think
> > GUP will then fall out in the wash because only shared pages will be
> > mapped by userspace and so GUP will fail by construction for private
> > pages.
> We can prevent GUP after the pages are marked private, but the pages
> could be marked private after the pages were already GUP'd. I don't have
> a good way to detect this, so converting a page to private is difficult.

For anonymous memory, marking the page as private is going to involve an
exclusive GUP so that the page can safely be donated to the guest. In
that case, any existing GUP pin should cause that to fail gracefully.
What is the situation you are concerned about here?

> > We're happy to pursue alternative approaches using anonymous memory if
> > you'd prefer to keep guest_memfd limited in functionality (e.g.
> > preventing GUP of private pages by extending mapping_flags as per [1]),
> > but we're equally willing to contribute to guest_memfd if extensions are
> > welcome.
> > 
> > What do you prefer?
> > 
> I like this as a stepping stone. For the Android use cases, we don't
> need to be able to convert a private page to shared and then also be
> able to GUP it.

I wouldn't want to rule that out, though. The VMM should be able to use
shared pages just like it can with normal anonymous pages.

> I don't think this design prevents us from adding "sometimes you can
> GUP" to guest_memfd in the future.

Technically, I think we can add all the stuff we need to guest_memfd,
but there's a desire to keep that as simple as possible for now, which
is why I'm keen to explore alternatives to unblock the pKVM upstreaming.


[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux