On 09.08.22 20:48, Jason Gunthorpe wrote: > On Tue, Aug 09, 2022 at 11:40:50AM -0700, Linus Torvalds wrote: >> On Mon, Aug 8, 2022 at 12:32 AM David Hildenbrand <david@xxxxxxxxxx> wrote: >>> >>> For example, a write() via /proc/self/mem to a uffd-wp-protected range has >>> to fail instead of silently granting write access and bypassing the >>> userspace fault handler. Note that FOLL_FORCE is not only used for debug >>> access, but also triggered by applications without debug intentions, for >>> example, when pinning pages via RDMA. >> >> So this made me go "Whaa?" >> >> I didn't even realize that the media drivers and rdma used FOLL_FORCE. >> >> That's just completely bogus. >> >> Why do they do that? >> >> It seems to be completely bogus, and seems to have no actual valid >> reason for it. Looking through the history, it goes back to the >> original code submission in 2006, and doesn't have a mention of why. > > It is because of all this madness with COW. > > Lets say an app does: > > buf = mmap(MAP_PRIVATE) > rdma_pin_for_read(buf) > buf[0] = 1 > > Then the store to buf[0] will COW the page and the pin will become > decoherent. > > The purpose of the FORCE is to force COW to happen early so this is > avoided. > > Sadly there are real apps that end up working this way, eg because > they are using buffer in .data or something. > > I've hoped David's new work on this provided some kind of path to a > proper solution, as the need is very similar to all the other places > where we need to ensure there is no possiblity of future COW. > > So, these usage can be interpreted as a FOLL flag we don't have - some > kind of (FOLL_EXCLUSIVE | FOLL_READ) to match the PG_anon_exclusive > sort of idea. Thanks Jason for the explanation. I have patches in the works to no longer use FOLL_FORCE|FOLL_WRITE for taking a reliable longerterm R/O pin in a MAP_PRIVATE mapping. The patches are mostly done (and comparably simple), I merely deferred sending them out because I stumbled over this issue first. Some ugly corner cases of MAP_SHARED remain, but for most prominent use cases, my upcoming patches should allow us to just have longterm R/O pins working as expected. -- Thanks, David / dhildenb