On Tue, Mar 19, 2024 at 02:31:19PM +0000, Will Deacon wrote:
> On Tue, Mar 19, 2024 at 11:26:05AM +0100, David Hildenbrand wrote:
> > On 19.03.24 01:10, Sean Christopherson wrote:
> > > On Mon, Mar 18, 2024, Vishal Annapurve wrote:
> > > > On Mon, Mar 18, 2024 at 3:02 PM David Hildenbrand <david@xxxxxxxxxx> wrote:
> > > > > Second, we should find better ways to let an IOMMU map these pages,
> > > > > *not* using GUP. There were already discussions on providing a similar
> > > > > fd+offset-style interface instead. GUP really sounds like the wrong
> > > > > approach here. Maybe we should look into passing not only guest_memfd,
> > > > > but also "ordinary" memfds.
> > >
> > > +1. I am not completely opposed to letting SNP and TDX effectively
> > > convert pages between private and shared, but I also completely agree
> > > that letting anything gup() guest_memfd memory is likely to end in tears.
> >
> > Yes. Avoid it right from the start, if possible.
> >
> > People wanted guest_memfd to *not* have to mmap guest memory ("even for
> > ordinary VMs"). Now people are saying we have to be able to mmap it in
> > order to GUP it. It's getting tiring, really.
>
> From the pKVM side, we're working on guest_memfd primarily to avoid
> diverging from what other CoCo solutions end up using, but if it gets
> de-featured (e.g. no huge pages, no GUP, no mmap) compared to what we do
> today with anonymous memory, then it's a really hard sell to switch over
> from what we have in production. We're also hoping that, over time,
> guest_memfd will become more closely integrated with the mm subsystem to
> enable things like hypervisor-assisted page migration, which we would
> love to have.
>
> Today, we use the existing KVM interfaces (i.e. based on anonymous
> memory) and it mostly works with the one significant exception that
> accessing private memory via a GUP pin will crash the host kernel. If
> all guest_memfd() can offer to solve that problem is preventing GUP
> altogether, then I'd sooner just add that same restriction to what we
> currently have instead of overhauling the user ABI in favour of
> something which offers us very little in return.

How would we add the restriction to anonymous memory? Thinking aloud --
do you mean some sort of "exclusive GUP" flag where mm can ensure that
the exclusive GUP pin is the only pin? If the refcount for the page
is >1, then the exclusive GUP fails. Any future GUP pin attempts would
fail if the refcount has the EXCLUSIVE_BIAS (rough sketch further down).

> On the mmap() side of things for guest_memfd, a simpler option for us
> than what has currently been proposed might be to enforce that the VMM
> has unmapped all private pages on vCPU run, failing the ioctl if that's
> not the case. It needs a little more tracking in guest_memfd but I think
> GUP will then fall out in the wash because only shared pages will be
> mapped by userspace and so GUP will fail by construction for private
> pages.

We can prevent GUP after the pages are marked private, but the pages
could have been GUP'd before they were marked private. I don't have a
good way to detect this, so converting a page to private is difficult.

> We're happy to pursue alternative approaches using anonymous memory if
> you'd prefer to keep guest_memfd limited in functionality (e.g.
> preventing GUP of private pages by extending mapping_flags as per [1]),
> but we're equally willing to contribute to guest_memfd if extensions are
> welcome.
>
> What do you prefer?

I like this as a stepping stone.
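
To make the "exclusive GUP" idea above a bit more concrete, here's a
rough, untested sketch of what I'm imagining. The helper names and the
bias value are made up purely for illustration; nothing like this
exists in mm today:

#include <linux/mm.h>
#include <linux/page_ref.h>

/* Made-up bias; a real value would need to avoid refcount overflow. */
#define EXCLUSIVE_BIAS		(1 << 30)

/*
 * Take the "exclusive" pin only if we hold the sole reference to the
 * page. The cmpxchg makes the check-and-bias atomic, so a racing
 * get_page()/GUP either takes its reference first (and we fail) or
 * sees the biased refcount afterwards.
 */
static bool try_pin_page_exclusive(struct page *page)
{
	return atomic_cmpxchg(&page->_refcount, 1,
			      1 + EXCLUSIVE_BIAS) == 1;
}

/*
 * A later GUP pin attempt would bail out when the refcount carries
 * the bias, i.e. an exclusive pin already exists.
 */
static bool page_pinned_exclusive(struct page *page)
{
	return page_count(page) >= EXCLUSIVE_BIAS;
}

Something along these lines could also help with the
conversion-to-private problem: the conversion would only proceed once
the exclusive pin succeeds, which implies nobody else holds a GUP pin.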
For the Android use cases, we don't need to be able to convert a private
page to shared and then also be able to GUP it. If you want to GUP a
page, use anonymous memory and that memory will always be shared. If you
don't care about GUP'ing (e.g. it's going to be guest-private or you
otherwise know you won't be GUP'ing), you can use guest_memfd. I don't
think this design prevents us from adding "sometimes you can GUP" to
guest_memfd in the future. I don't think it creates extra changes for
KVM since anonymous memory is already supported; although I'd have to
add the support for Gunyah.

> [1] https://lore.kernel.org/r/4b0fd46a-cc4f-4cb7-9f6f-ce19a2d3064e@xxxxxxxxxx

Thanks,
Elliot