On 31.01.22 11:42, Mike Rapoport wrote:
> Hi Nadav,
>
> On Sat, Jan 29, 2022 at 10:23:55PM -0800, Nadav Amit wrote:
>> Using userfaultfd and looking at the kernel code, I encountered a usability
>> issue that complicates userspace UFFD-monitor implementation. I obviously
>> might be wrong, so I would appreciate (polite?) feedback. I do have a
>> userspace workaround, but I thought it is worthy to share and to hear your
>> opinion, as well as feedback from other UFFD users.
>>
>> The issue I encountered regards the ordering of UFFD events that might not
>> reflect the actual order in which events took place.
>>
>> In more detail, UFFD events (e.g., unmap, fork) are not ordered against
>> themselves [*]. The mm-lock is dropped before notifying the userspace
>> UFFD-monitor, and therefore there is no guarantee as to whether the order of
>> the events actually reflects the order in which the events took place.
>> This can prevent a UFFD-monitor from using the events to track which
>> ranges are mapped. Specifically, UFFD_EVENT_FORK message and a
>> UFFD_EVENT_UNMAP message (which reflects unmap in the parent process) can
>> be reordered, if the events are triggered by two different threads. In
>> this case the UFFD-monitor cannot figure from the events whether the
>> child process has the unmapped memory range still mapped (because fork
>> happened first) or not.
>
> Yeah, it seems that something like this is possible:
>
>
> fork()                          munmap()
>     mmap_write_unlock();
>                                     mmap_write_lock_killable();
>                                     do_things();
>                                     mmap_{read,write}_unlock();
>                                     userfaultfd_unmap_complete();
>     dup_userfaultfd_complete();
>

I was thinking about other possible races, e.g., MADV_DONTNEED/MADV_FREE
racing with UFFD_EVENT_PAGEFAULT -- where we only hold the mmap_lock in
read mode. But I'm not sure whether they apply.

The fork() vs. munmap() case is somewhat "obviously problematic" :)

-- 
Thanks,

David / dhildenb
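
For illustration only, the fork() vs. munmap() interleaving quoted above can
be modelled in userspace. This is a toy Python sketch, not kernel code and
not a real UFFD monitor: the replay() helper, the event tuples, and the range
names "A" and "B" are all made up for the example. It assumes a naive monitor
that applies events strictly in the order it receives them.

```python
def replay(events, parent_ranges):
    """Apply events in the given order and return the tracked mappings.

    Hypothetical monitor model: each process maps to a set of range names.
    """
    mappings = {"parent": set(parent_ranges)}
    for kind, arg in events:
        if kind == "UNMAP":
            # The parent dropped a range.
            mappings["parent"].discard(arg)
        elif kind == "FORK":
            # The child inherits whatever the monitor thinks the parent has.
            mappings[arg] = set(mappings["parent"])
    return mappings


# Order in which the operations took effect under mmap_lock:
# fork copied the mm first, then munmap removed range "A" in the parent.
actual_order = [("FORK", "child"), ("UNMAP", "A")]

# Order in which the notifications were queued after the lock was dropped:
# the unmap notification overtook the fork notification.
delivered_order = [("UNMAP", "A"), ("FORK", "child")]

truth = replay(actual_order, {"A", "B"})
seen = replay(delivered_order, {"A", "B"})

print("child really maps:     ", sorted(truth["child"]))
print("monitor thinks it maps:", sorted(seen["child"]))
```

In this model both replays agree on the parent's mappings, but the monitor's
view of the child is missing range "A", which the child still has mapped.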