Hello Mike,

Regarding your (old) patch:

> On May 23, 2018, at 12:42 AM, Mike Rapoport <rppt@xxxxxxxxxxxxxxxxxx> wrote:
>
> If a process monitored with userfaultfd changes it's memory mappings or
> forks() at the same time as uffd monitor fills the process memory with
> UFFDIO_COPY, the actual creation of page table entries and copying of the
> data in mcopy_atomic may happen either before of after the memory mapping
> modifications and there is no way for the uffd monitor to maintain
> consistent view of the process memory layout.
>
> For instance, let's consider fork() running in parallel with
> userfaultfd_copy():
>
> process                          | uffd monitor
> ---------------------------------+------------------------------
> fork()                           | userfaultfd_copy()
> ...                              | ...
> dup_mmap()                       | down_read(mmap_sem)
> down_write(mmap_sem)             | /* create PTEs, copy data */
> dup_uffd()                       | up_read(mmap_sem)
> copy_page_range()                |
> up_write(mmap_sem)               |
> dup_uffd_complete()              |
> /* notify monitor */             |
>
> If the userfaultfd_copy() takes the mmap_sem first, the new page(s) will be
> present by the time copy_page_range() is called and they will appear in the
> child's memory mappings. However, if the fork() is the first to take the
> mmap_sem, the new pages won't be mapped in the child's address space.
>
> Since userfaultfd monitor has no way to determine what was the order, let's
> disallow userfaultfd_copy in parallel with the non-cooperative events. In
> such case we return -EAGAIN and the uffd monitor can understand that
> userfaultfd_copy() clashed with a non-cooperative event and take an
> appropriate action.

I am struggling to understand this patch and would appreciate your
assistance. Specifically, I have two questions:

1. How can memory corruption occur? If the page is already mapped and the
   handler "mistakenly" calls userfaultfd_copy(), wouldn't
   mcopy_atomic_pte() return -EEXIST once it sees the PTE already exists?
   In such case, I would presume that the handler should be able to
   recover gracefully by waking the faulting thread.

2. How is memory ordering supposed to work here? IIUC, mmap_changing is
   not protected by any lock and there are no memory barriers that are
   associated with the assignment. Indeed, the code calls
   WRITE_ONCE()/READ_ONCE(), but AFAIK this does not guarantee ordering
   with non-volatile reads/writes.

Thanks,
Nadav
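
P.S. To make the second question concrete, here is a minimal user-space
sketch of the kind of ordering I have in mind. It is only an analogy written
with C11 atomics, not kernel code: the names `changing`, `layout_generation`,
`writer_publish()` and `reader_observe()` are hypothetical, and I am assuming
that a release store paired with an acquire load is roughly what
smp_store_release()/smp_load_acquire() would give in the kernel.

```c
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical stand-ins (not kernel symbols):
 *   layout_generation - ordinary, non-atomic state that changes first
 *   changing          - a flag that plays the role of mmap_changing
 */
static int layout_generation;
static atomic_bool changing;

/*
 * Writer side: update the plain data, then publish the flag.
 * The release store guarantees that the plain store above it is
 * visible to any thread that later observes the flag with acquire.
 */
static void writer_publish(int generation)
{
    layout_generation = generation;
    atomic_store_explicit(&changing, true, memory_order_release);
}

/*
 * Reader side: returns true and fills *generation only if the flag
 * was observed as set. The acquire load pairs with the release store,
 * so the plain read below cannot be satisfied with stale data.
 * With memory_order_relaxed on both sides this guarantee would be lost.
 */
static bool reader_observe(int *generation)
{
    if (!atomic_load_explicit(&changing, memory_order_acquire))
        return false;
    *generation = layout_generation;
    return true;
}
```

The two functions are meant to be called from different threads; nothing in
the snippet itself spawns them. If the lock taken around both sides already
provides this pairing, then the plain WRITE_ONCE()/READ_ONCE() would be
enough, and that is exactly what I would like to confirm.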