Re: Re: folio_mmapped

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Mar 19, 2024 at 02:31:19PM +0000, Will Deacon wrote:
> On Tue, Mar 19, 2024 at 11:26:05AM +0100, David Hildenbrand wrote:
> > On 19.03.24 01:10, Sean Christopherson wrote:
> > > On Mon, Mar 18, 2024, Vishal Annapurve wrote:
> > > > On Mon, Mar 18, 2024 at 3:02 PM David Hildenbrand <david@xxxxxxxxxx> wrote:
> > > > > Second, we should find better ways to let an IOMMU map these pages,
> > > > > *not* using GUP. There were already discussions on providing a similar
> > > > > fd+offset-style interface instead. GUP really sounds like the wrong
> > > > > approach here. Maybe we should look into passing not only guest_memfd,
> > > > > but also "ordinary" memfds.
> > > 
> > > +1.  I am not completely opposed to letting SNP and TDX effectively convert
> > > pages between private and shared, but I also completely agree that letting
> > > anything gup() guest_memfd memory is likely to end in tears.
> > 
> > Yes. Avoid it right from the start, if possible.
> > 
> > People wanted guest_memfd to *not* have to mmap guest memory ("even for
> > ordinary VMs"). Now people are saying we have to be able to mmap it in order
> > to GUP it. It's getting tiring, really.
> 
> From the pKVM side, we're working on guest_memfd primarily to avoid
> diverging from what other CoCo solutions end up using, but if it gets
> de-featured (e.g. no huge pages, no GUP, no mmap) compared to what we do
> today with anonymous memory, then it's a really hard sell to switch over
> from what we have in production. We're also hoping that, over time,
> guest_memfd will become more closely integrated with the mm subsystem to
> enable things like hypervisor-assisted page migration, which we would
> love to have.
> 
> Today, we use the existing KVM interfaces (i.e. based on anonymous
> memory) and it mostly works with the one significant exception that
> accessing private memory via a GUP pin will crash the host kernel. If
> all guest_memfd() can offer to solve that problem is preventing GUP
> altogether, then I'd sooner just add that same restriction to what we
> currently have instead of overhauling the user ABI in favour of
> something which offers us very little in return.

How would we add the restriction to anonymous memory?

Thinking aloud -- do you mean like some sort of "exclusive GUP" flag
where mm can ensure that the exclusive GUP pin is the only pin? If the
refcount for the page is >1, then the exclusive GUP fails. Any future
GUP pin attempts would fail if the refcount has the EXCLUSIVE_BIAS.

> On the mmap() side of things for guest_memfd, a simpler option for us
> than what has currently been proposed might be to enforce that the VMM
> has unmapped all private pages on vCPU run, failing the ioctl if that's
> not the case. It needs a little more tracking in guest_memfd but I think
> GUP will then fall out in the wash because only shared pages will be
> mapped by userspace and so GUP will fail by construction for private
> pages.

We can prevent GUP after the pages are marked private, but the pages
could be marked private after the pages were already GUP'd. I don't have
a good way to detect this, so converting a page to private is difficult.

> We're happy to pursue alternative approaches using anonymous memory if
> you'd prefer to keep guest_memfd limited in functionality (e.g.
> preventing GUP of private pages by extending mapping_flags as per [1]),
> but we're equally willing to contribute to guest_memfd if extensions are
> welcome.
> 
> What do you prefer?
> 

I like this as a stepping stone. For the Android use cases, we don't
need to be able to convert a private page to shared and then also be
able to GUP it. If you want to GUP a page, use anonymous memory and that
memory will always be shared. If you don't care about GUP'ing (e.g. it's
going to be guest-private or you otherwise know you won't be GUP'ing),
you can use guest_memfd.

I don't think this design prevents us from adding "sometimes you can
GUP" to guest_memfd in the future. I don't think it creates extra
changes for KVM since anonymous memory is already supported; although
I'd have to add the support for Gunyah.

> [1] https://lore.kernel.org/r/4b0fd46a-cc4f-4cb7-9f6f-ce19a2d3064e@xxxxxxxxxx

Thanks,
Elliot





[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux