Re: Re: Re: folio_mmapped

On Fri, Mar 22, 2024 at 04:36:55PM +0000, Will Deacon wrote:
> Hi Elliot,
> 
> On Tue, Mar 19, 2024 at 04:54:10PM -0700, Elliot Berman wrote:
> > On Tue, Mar 19, 2024 at 02:31:19PM +0000, Will Deacon wrote:
> > > On Tue, Mar 19, 2024 at 11:26:05AM +0100, David Hildenbrand wrote:
> > > > On 19.03.24 01:10, Sean Christopherson wrote:
> > > > > +1.  I am not completely opposed to letting SNP and TDX effectively convert
> > > > > pages between private and shared, but I also completely agree that letting
> > > > > anything gup() guest_memfd memory is likely to end in tears.
> > > > 
> > > > Yes. Avoid it right from the start, if possible.
> > > > 
> > > > People wanted guest_memfd to *not* have to mmap guest memory ("even for
> > > > ordinary VMs"). Now people are saying we have to be able to mmap it in order
> > > > to GUP it. It's getting tiring, really.
> > > 
> > > From the pKVM side, we're working on guest_memfd primarily to avoid
> > > diverging from what other CoCo solutions end up using, but if it gets
> > > de-featured (e.g. no huge pages, no GUP, no mmap) compared to what we do
> > > today with anonymous memory, then it's a really hard sell to switch over
> > > from what we have in production. We're also hoping that, over time,
> > > guest_memfd will become more closely integrated with the mm subsystem to
> > > enable things like hypervisor-assisted page migration, which we would
> > > love to have.
> > > 
> > > Today, we use the existing KVM interfaces (i.e. based on anonymous
> > > memory) and it mostly works with the one significant exception that
> > > accessing private memory via a GUP pin will crash the host kernel. If
> > > all guest_memfd() can offer to solve that problem is preventing GUP
> > > altogether, then I'd sooner just add that same restriction to what we
> > > currently have instead of overhauling the user ABI in favour of
> > > something which offers us very little in return.
> > 
> > How would we add the restriction to anonymous memory?
> > 
> > Thinking aloud -- do you mean like some sort of "exclusive GUP" flag
> > where mm can ensure that the exclusive GUP pin is the only pin? If the
> > refcount for the page is >1, then the exclusive GUP fails. Any future
> > GUP pin attempts would fail if the refcount has the EXCLUSIVE_BIAS.
> 
> Yes, I think we'd want something like that, but I don't think using a
> bias on its own is a good idea as false positives due to a large number
> of page references will then actually lead to problems (i.e. rejecting
> GUP spuriously), no? I suppose if you only considered the new bias in
> conjunction with the AS_NOGUP flag you proposed then it might be ok
> (i.e. when you see the bias, you then go check the address space to
> confirm). What do you think?
> 

I think the AS_NOGUP flag would prevent GUP in the first place. If we set
the EXCLUSIVE_BIAS value to something like INT_MAX, do we need to worry
about there being INT_MAX-1 valid GUPs and someone wanting to add another?
From the GUPer's perspective, I don't think it would be much different from
overflowing the refcount.
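
Here's a rough sketch of the EXCLUSIVE_BIAS idea using C11 atomics on a
made-up page struct, not the real struct page/folio API. sketch_page,
as_nogup (standing in for the proposed AS_NOGUP mapping flag),
sketch_pin() and the other helpers are all hypothetical. I've used
1 << 30 instead of INT_MAX so that adding the bias to a small count can't
overflow an int.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical bias: 1 << 30 rather than INT_MAX so the sums below fit in an int. */
#define EXCLUSIVE_BIAS (1 << 30)

/* Made-up stand-in for a page; this is not struct page or struct folio. */
struct sketch_page {
	atomic_int refcount; /* 1 == only the owner's base reference */
	bool as_nogup;       /* stand-in for the proposed AS_NOGUP mapping flag */
};

/* Ordinary GUP pin. */
static bool sketch_pin(struct sketch_page *p)
{
	/* The mapping flag rejects GUP before the refcount is even looked at. */
	if (p->as_nogup)
		return false;

	int old = atomic_load(&p->refcount);
	do {
		/*
		 * Refuse if the bias is present (an exclusive pin is held) or if
		 * this pin would push the count up to the bias; the latter is
		 * handled like a refcount overflow.
		 */
		if (old >= EXCLUSIVE_BIAS - 1)
			return false;
	} while (!atomic_compare_exchange_weak(&p->refcount, &old, old + 1));
	return true;
}

static void sketch_unpin(struct sketch_page *p)
{
	int old = atomic_fetch_sub(&p->refcount, 1);

	assert(old > 1); /* must never drop the owner's base reference */
	(void)old;
}

/* Exclusive pin: succeeds only if nobody but the owner holds a reference. */
static bool sketch_pin_exclusive(struct sketch_page *p)
{
	int expected = 1;

	return atomic_compare_exchange_strong(&p->refcount, &expected,
					      1 + EXCLUSIVE_BIAS);
}

static void sketch_unpin_exclusive(struct sketch_page *p)
{
	int old = atomic_fetch_sub(&p->refcount, EXCLUSIVE_BIAS);

	assert(old == 1 + EXCLUSIVE_BIAS);
	(void)old;
}
```

Capping ordinary pins just below the bias means the bias can't show up by
accident from lots of references, which I think addresses the
false-positive concern; the cost is that a page with ~2^30 ordinary
references stops accepting new ones, just as it would on overflow.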

> > > On the mmap() side of things for guest_memfd, a simpler option for us
> > > than what has currently been proposed might be to enforce that the VMM
> > > has unmapped all private pages on vCPU run, failing the ioctl if that's
> > > not the case. It needs a little more tracking in guest_memfd but I think
> > > GUP will then fall out in the wash because only shared pages will be
> > > mapped by userspace and so GUP will fail by construction for private
> > > pages.
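
For the "unmap private pages before vCPU run" idea, the check itself looks
simple once guest_memfd tracks some per-page state. A rough sketch with
made-up types; sketch_gmem, sketch_gmem_page and
sketch_check_before_vcpu_run() are hypothetical, not existing KVM or
guest_memfd code:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-page tracking that guest_memfd would need to keep. */
struct sketch_gmem_page {
	bool is_private;               /* currently owned by the guest */
	unsigned int nr_user_mappings; /* live userspace mappings of this page */
};

struct sketch_gmem {
	struct sketch_gmem_page *pages;
	size_t nr_pages;
};

/*
 * Called before entering the guest: fail the run ioctl if the VMM still
 * maps any private page, so at this point userspace only maps shared pages.
 */
static int sketch_check_before_vcpu_run(const struct sketch_gmem *gmem)
{
	assert(gmem->pages != NULL || gmem->nr_pages == 0);

	for (size_t i = 0; i < gmem->nr_pages; i++) {
		const struct sketch_gmem_page *pg = &gmem->pages[i];

		if (pg->is_private && pg->nr_user_mappings > 0)
			return -EBUSY;
	}
	return 0;
}
```

This only looks at current userspace mappings; it does nothing about a pin
taken while the page was still shared.
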
> > 
> > We can prevent GUP after the pages are marked private, but the pages
> > could be marked private after the pages were already GUP'd. I don't have
> > a good way to detect this, so converting a page to private is difficult.
> 
> For anonymous memory, marking the page as private is going to involve an
> exclusive GUP so that the page can safely be donated to the guest. In
> that case, any existing GUP pin should cause that to fail gracefully.
> What is the situation you are concerned about here?
> 

I wasn't thinking about exclusive GUP here. An exclusive GUP should be
able to provide the guarantees we need.

I was thinking about making sure we gracefully handle a race to provide
the same page. The kernel should be able to tell "we're already providing
the page" apart from "somebody has an unexpected pin". If we fail to take
the exclusive pin, we can simply read the refcount to tell which case
we're in.
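
Roughly what I have in mind, as a sketch with made-up names (sketch_page,
sketch_try_donate() and the result enum are hypothetical, and
EXCLUSIVE_BIAS is an arbitrary 1 << 30). It assumes ordinary pins are
never allowed to push the refcount up to the bias, so seeing the bias
after a failed exclusive pin means someone is already providing the page:

```c
#include <assert.h>
#include <stdatomic.h>

#define EXCLUSIVE_BIAS (1 << 30) /* hypothetical bias value */

/* Made-up stand-in for a page; not struct page or struct folio. */
struct sketch_page {
	atomic_int refcount; /* 1 == only the owner's base reference */
};

enum sketch_donate_result {
	SKETCH_DONATED,          /* we took the exclusive pin */
	SKETCH_ALREADY_DONATING, /* a concurrent request holds it: benign race */
	SKETCH_UNEXPECTED_PIN,   /* someone else holds an ordinary pin */
};

static enum sketch_donate_result sketch_try_donate(struct sketch_page *p)
{
	int expected = 1;

	if (atomic_compare_exchange_strong(&p->refcount, &expected,
					   1 + EXCLUSIVE_BIAS))
		return SKETCH_DONATED;

	/*
	 * On failure, expected holds the value the CAS observed. Under the
	 * assumption above, the bias is only present while an exclusive pin
	 * is held, so it separates a duplicate request from a stray GUP pin.
	 */
	if (expected >= EXCLUSIVE_BIAS)
		return SKETCH_ALREADY_DONATING;
	return SKETCH_UNEXPECTED_PIN;
}
```

The observed value is only a snapshot, so the caller still has to decide
whether to retry or fail on SKETCH_ALREADY_DONATING.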

Thanks,
Elliot

> > > We're happy to pursue alternative approaches using anonymous memory if
> > > you'd prefer to keep guest_memfd limited in functionality (e.g.
> > > preventing GUP of private pages by extending mapping_flags as per [1]),
> > > but we're equally willing to contribute to guest_memfd if extensions are
> > > welcome.
> > > 
> > > What do you prefer?
> > > 
> > 
> > I like this as a stepping stone. For the Android use cases, we don't
> > need to be able to convert a private page to shared and then also be
> > able to GUP it.
> 
> I wouldn't want to rule that out, though. The VMM should be able to use
> shared pages just like it can with normal anonymous pages.
> 
> > I don't think this design prevents us from adding "sometimes you can
> > GUP" to guest_memfd in the future.
> 
> Technically, I think we can add all the stuff we need to guest_memfd,
> but there's a desire to keep that as simple as possible for now, which
> is why I'm keen to explore alternatives to unblock the pKVM upstreaming.
> 
> Will
> 
