On Mon, Sep 28, 2020 at 12:29:55PM -0700, Linus Torvalds wrote:

> So a read pin action would basically never work for the fast-path for
> a few cases, notably a shared read-only mapping - because we could
> never mark it in the page tables as "fast pin accessible"

Agree, I was assuming we'd lose more of the fast path to create this
thing. It would only still be fast if the pages are already writable.

I strongly suspect the case of DMA'ing actual read-only data is the
minority here. The usual case is probably filling a writable buffer
with something interesting and then triggering the DMA. The DMA just
happens to be a read from the driver's view, so the driver doesn't set
FOLL_WRITE.

Looking at the FOLL_LONGTERM users, which should be the banner use case
for this, there are very few that do a read pin and use fast.

> And it would basically have no advantages over a writable FOLL_PIN. It
> would break the association with any backing store for private pages,
> because otherwise it can't follow future writes.

Yes, I wasn't clear enough, I'm looking at this from a driver API
perspective. We have this API

    pin_user_pages(FOLL_LONGTERM | FOLL_WRITE)

Which now has no decoherence issues with the MM. If the driver
naturally wants to do read-only access it might be tempted to do:

    pin_user_pages(FOLL_LONGTERM)

Which is now NOT the same thing and brings all these really surprising
mm coherence issues back.

The driver author might discover this in testing, then be tempted to
hardwire 'FOLL_LONGTERM | FOLL_WRITE'. Now their uAPI is broken for
things that are actually read-only, like .rodata.

If they discover this then they add a FOLL_FORCE to the mix.

When someone comes along to read this later it is a big leap to see

    pin_user_pages(FOLL_LONGTERM | FOLL_FORCE | FOLL_WRITE)

and realize this is code for "read only mapping". At least it took me
a while to decipher it the first time I saw it.

I think this is really hard to use and ugly.

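To make the progression above concrete, here is a rough user-space
sketch of what a reader has to infer from each flag combination. It is
an illustration only: the bit values are placeholders rather than the
kernel's definitions, and `enum pin_meaning` and `classify_pin()` are
hypothetical names that do not exist in the kernel.

```c
#include <assert.h>

/* Placeholder bit values for illustration only; NOT the kernel's. */
#define FOLL_WRITE    0x1u
#define FOLL_FORCE    0x2u
#define FOLL_LONGTERM 0x4u

/* Hypothetical labels for what a pin request ends up meaning. */
enum pin_meaning {
    PIN_SHORT_TERM,      /* no FOLL_LONGTERM set */
    PIN_LT_READ,         /* long-term read pin: coherence surprises */
    PIN_LT_WRITE,        /* coherent, but fails on read-only mappings */
    PIN_LT_READ_ONLY_OK, /* coherent and accepts read-only mappings */
};

/* Hypothetical helper: decode the intent hidden in the flags. */
static enum pin_meaning classify_pin(unsigned int flags)
{
    /* Only the three placeholder flags are modelled in this sketch. */
    assert((flags & ~(FOLL_WRITE | FOLL_FORCE | FOLL_LONGTERM)) == 0);

    if (!(flags & FOLL_LONGTERM))
        return PIN_SHORT_TERM;
    if (!(flags & FOLL_WRITE))
        return PIN_LT_READ;
    if (flags & FOLL_FORCE)
        return PIN_LT_READ_ONLY_OK;
    return PIN_LT_WRITE;
}
```

In this sketch, `classify_pin(FOLL_LONGTERM | FOLL_FORCE | FOLL_WRITE)`
is the only long-term combination that maps to the "read-only mapping
works and stays coherent" case, which is the non-obvious spelling
described above.
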
My thinking has been to just stick:

    if (flags & FOLL_LONGTERM)
        flags |= FOLL_FORCE | FOLL_WRITE;

in pin_user_pages(). It would make the driver API cleaner. If we can do
a bit better somehow by not COW'ing for certain VMAs as you explained
then all the better, but that is not my primary goal.

Basically, I think if a driver is using FOLL_LONGTERM | FOLL_PIN we
should guarantee that driver a consistent MM and take the gup_fast
performance hit to do it.

AFAICT the giant whack of other cases not using FOLL_LONGTERM really
shouldn't care about read-decoherence. For those cases the user should
really not be racing writes with data under a read-only pin, and the
new COW logic looks like it solves the other issues with this.

I know Jann/John have been careful to not have special behaviors for
the DMA case, but I think it makes sense here. It is actually
different.

Jason