On 21.12.21 20:07, Jason Gunthorpe wrote: > On Tue, Dec 21, 2021 at 06:40:30PM +0100, David Hildenbrand wrote: > >> 2) is certainly the cherry on top. But it just means that R/O pins don't >> have to be the weird kid. And yes, achieving 2) would require >> FAULT_FLAG_EXCLUSIVE / FAULT_FLAG_UNSHARED, but it would really 99% do >> what existing COW logic does, just bypass the "map writable" and >> "trigger write fault" semantics. > > I still don't agree with this - when you come to patches can you have > this work at the end and under a good cover letter? Maybe it will make > more sense then. Yes. But really, I think it's the logical consequence of what Linus said [1]: "And then all GUP-fast would need to do is to refuse to look up a page that isn't exclusive to that VM. We already have the situation that GUP-fast can fail for non-writable pages etc, so it's just another test." We must not FOLL_PIN a page that is not exclusive (not only on gup-fast, but really, on any gup). If we special case R/O FOLL_PIN, we cannot enable the sanity check on unpin as suggested by Linus [2]: "If we only set the exclusive VM bit on pages that get mapped into user space, and we guarantee that GUP only looks up such pages, then we can also add a debug test to the "unpin" case that the bit is still set." There are really only two feasible options I see when we want to take a R/O FOLL_PIN on a !PageAnonExclusive() anon page (1) Fail the pinning completely. This implies that we'll have to fail O_DIRECT once converted to FOLL_PIN. (2) Request to mark the page PageAnonExclusive() via a FAULT_FLAG_UNSHARE and let it succeed. Anything else would require additional accounting that we already discussed in the past is hard -- for example, to differentiate R/O from R/W pins requiring two pin counters. The only impact would be that FOLL_PIN after fork() has to go via a FAULT_FLAG_UNSHARE once, to turn the page PageAnonExclusive. IMHO this is the right thing to do for FOLL_LONGTERM. For !FOLL_LONGTERM it would be nice to optimize this, to *not* do that, but again ... this would require even more counters I think, for example, to differentiate between "R/W short/long-term or R/O long-term pin" and "R/O short-term pin". So unless we discover a way to do additional accounting for ordinary 4k pages, I think we really can only do (1) or (2) to make sure we never ever pin a !PageAnonExclusive() page. [1] https://lkml.kernel.org/r/CAHk-=wgQq3H6wfkW7+MmduVgBOqHeiXQN97yCMd+m1mM-1xCLQ@xxxxxxxxxxxxxx [2] https://lkml.kernel.org/r/CAHk-=wiyxQ==vnHFHW99S_OPwA=u1Qrfg2OGr_6zPcBAuhQY2w@xxxxxxxxxxxxxx -- Thanks, David / dhildenb