On 17.12.21 22:36, Linus Torvalds wrote: > On Fri, Dec 17, 2021 at 12:55 PM David Hildenbrand <david@xxxxxxxxxx> wrote: >> >> If we have a shared anonymous page we cannot have GUP references, not >> even R/O ones. Because GUP would have unshared and copied the page, >> resulting in a R/O mapped anonymous page. > > Doing a GUP on an actual shared page is wrong to begin with. > > You even know that, you try to use "page_mapcount() > 1" to disallow it. GUP is incomaptible with shared anonymous pages, therefore it has to trigger unsharing, correct. > > My point is that it's wrong regardless, and that "mapcount" is > dubious, and that COW cannot - and must not - use mapcount, and that I > think your shared case should strive to avoid it for the exact same > reason. For now I have not heard a compelling argument why the mapcount is dubious, I repeat: * mapcount can only increase due to fork() * mapcount can decrease due to unmap / zap We can protect from the transtition == 1 -> >1 using the mmap_lock. For COW the mapcount is the only thing that matters *if we take GUP* out of the equation. And that's exactly what we OTOH, take a look which issues resulted from the page_count changes. That's what I call dubious, sorry to say. > > So, what I think should happen is: > > (a) GUP makes sure that it only ever looks up pages that can be > shared with this VM. This may in involve breaking COW early with any > past fork(). Is that unsharing as we propose it? > > (b) it marks such pages so that any future work will not cause them > to COW either Right, exactly. GUP before fork does not result in a page getting shared again. > > Note that (a) is not necessarily "always COW and have to allocate and > copy new page". In particular, if the page is already writable, you > know you already have exclusive access to it and don't need to COW. > > And if it isn't writable, then the other common case is "the cow has > only one user, and it's us" - that's the "refcount == 1" case. > > And (b) is what we do with that page_maybe_dma_pinned() logic for > fork(), but also for things like swap cache creation (eg see commit > feb889fb40fa: "mm: don't put pinned pages into the swap cache"). I fully agree with b). GUP before fork is a totally different set of problems than GUP after fork. > > Note that this code all already exists, and already works - even > without getting the (very expensive) mmap_sem. So it works with > fast-GUP and it can race with concurrent forking by another thread, > which is why we also have that seqcount thing. I know, I studied it intensively :) > > As far as I can tell, your "mapcount" logic fundamentally requires > mmap_sem for the fork() race avoidance, for example. Yes. Or any other more lightweight synchronization in the future. For now this is just perfect. > > So this is why I don't like the mapcount games - I think they are very > fragile, and not at all as logical as the two simple rules a/b above. I don't really see anything fragile, really. I'm happy to learn as always. > > I believe you can make mapcount games _work_ - we used to have > something like that. It was incredibly fragile, and it had its own set > of bugs, but with enough care it's doable. We made it work, and it was comparatively simple. -- Thanks, David / dhildenb