On 17.12.21 22:15, Nadav Amit wrote: > > >> On Dec 17, 2021, at 12:47 PM, Jason Gunthorpe <jgg@xxxxxxxxxx> wrote: >> >> On Fri, Dec 17, 2021 at 12:36:43PM -0800, Linus Torvalds wrote: >> >>>> 5. Take a R/O pin (RDMA, VFIO, ...) >>>> -> refcount > 1 >>>> >>>> 6. memset(mem, 0xff, pagesize); >>>> -> Write fault -> COW >>> >>> I do not believe this is actually a bug. >>> >>> You asked for a R/O pin, and you got one. >>> >>> Then somebody else modified that page, and you got exactly what you >>> asked for - a COW event. The original R/O pin has the original page >>> that it asked for, and can read it just fine. >> >> To remind all, the GUP users, like RDMA, VFIO use >> FOLL_FORCE|FOLL_WRITE to get a 'r/o pin' specifically because of the >> COW breaking the coherence. In these case 'r/o pin' does not mean >> "snapshot the data", but its only a promise not to write to the pages >> and still desires coherence with the memory map. >> >> Eg in RDMA we know of apps asking for a R/O pin of something in .bss >> then filling that something with data finally doing the actual >> DMA. Breaking COW after pin breaks those apps. >> >> The above #5 can occur for O_DIRECT read and in that case the >> 'snapshot the data' is perfectly fine as racing the COW with the >> O_DIRECT read just resolves the race toward the read() direction. >> >> IIRC there is some other scenario that motivated this patch? > > I think that there is an assumption that once a page is COW-broken, > it would never have another write-fault that might lead to COW > breaking later. > > AFAIK at least after userfaultfd-WP followed by > userfaultfd-write-unprotect a page might be write-protected and > go through do_wp_page() a second time to be COW-broken again. In > such case, I think the FOLL_FORCE|FOLL_WRITE would not help. > > I suspect (not sure) that this might even happen with mprotect() > since I do not see all code-paths preserving whether the page > was writable. > uffd-wp and mprotect() are broken as well, yes. But the easiest example is just swap + read fault. Section 2 and 3 in [1], along with reproducers. Note that I didn't mention uffd-wp and mprotect(), because these require "manual intervention". With swap, it's not your application doing something "special". [1] https://lore.kernel.org/r/3ae33b08-d9ef-f846-56fb-645e3b9b4c66@xxxxxxxxxx -- Thanks, David / dhildenb