Hi Elliot, On Fri, Mar 22, 2024 at 11:46:10AM -0700, Elliot Berman wrote: > On Fri, Mar 22, 2024 at 04:36:55PM +0000, Will Deacon wrote: > > On Tue, Mar 19, 2024 at 04:54:10PM -0700, Elliot Berman wrote: > > > How would we add the restriction to anonymous memory? > > > > > > Thinking aloud -- do you mean like some sort of "exclusive GUP" flag > > > where mm can ensure that the exclusive GUP pin is the only pin? If the > > > refcount for the page is >1, then the exclusive GUP fails. Any future > > > GUP pin attempts would fail if the refcount has the EXCLUSIVE_BIAS. > > > > Yes, I think we'd want something like that, but I don't think using a > > bias on its own is a good idea as false positives due to a large number > > of page references will then actually lead to problems (i.e. rejecting > > GUP spuriously), no? I suppose if you only considered the new bias in > > conjunction with the AS_NOGUP flag you proposed then it might be ok > > (i.e. when you see the bias, you then go check the address space to > > confirm). What do you think? > > > > I think the AS_NOGUP would prevent GUPing the first place. If we set the > EXCLUSIVE_BIAS value to something like INT_MAX, do we need to be worried > about there being INT_MAX-1 valid GUPs and wanting to add another? From > the GUPer's perspective, I don't think it would be much different from > overflowing the refcount. I don't think we should end up in a position where a malicious user can take a tonne of references and cause functional problems. For example, look at page_maybe_dma_pinned(); the worst case is we end up treating heavily referenced pages as pinned and the soft-dirty logic leaves them perpetually dirtied. The side-effects of what we're talking about here seem to be much worse than that unless we can confirm that the page really is exclusive. > > > > On the mmap() side of things for guest_memfd, a simpler option for us > > > > than what has currently been proposed might be to enforce that the VMM > > > > has unmapped all private pages on vCPU run, failing the ioctl if that's > > > > not the case. It needs a little more tracking in guest_memfd but I think > > > > GUP will then fall out in the wash because only shared pages will be > > > > mapped by userspace and so GUP will fail by construction for private > > > > pages. > > > > > > We can prevent GUP after the pages are marked private, but the pages > > > could be marked private after the pages were already GUP'd. I don't have > > > a good way to detect this, so converting a page to private is difficult. > > > > For anonymous memory, marking the page as private is going to involve an > > exclusive GUP so that the page can safely be donated to the guest. In > > that case, any existing GUP pin should cause that to fail gracefully. > > What is the situation you are concerned about here? > > > > I wasn't thinking about exclusive GUP here. The exclusive GUP should be > able to get the guarantees we need. > > I was thinking about making sure we gracefully handle a race to provide > the same page. The kernel should detect the difference between "we're > already providing the page" and "somebody has an unexpected pin". We can > easily read the refcount if we couldn't take the exclusive pin to know. Thanks, that makes sense to me. For pKVM, the architecture code also tracks all the donated pages, so we should be able to provide additional metadata here if we shuffle things around a little. Will