Hi Nadav,

On Sat, Jan 29, 2022 at 10:23:55PM -0800, Nadav Amit wrote:
> Using userfaultfd and looking at the kernel code, I encountered a usability
> issue that complicates userspace UFFD-monitor implementation. I obviously
> might be wrong, so I would appreciate a (polite?) feedback. I do have a
> userspace workaround, but I thought it is worthy to share and to hear your
> opinion, as well as feedback from other UFFD users.
>
> The issue I encountered regards the ordering of UFFD events that might not
> reflect the actual order in which events took place.
>
> In more detail, UFFD events (e.g., unmap, fork) are not ordered against
> themselves [*]. The mm-lock is dropped before notifying the userspace
> UFFD-monitor, and therefore there is no guarantee as to whether the order of
> the events actually reflects the order in which the events took place.
> This can prevent a UFFD-monitor from using the events to track which
> ranges are mapped. Specifically, UFFD_EVENT_FORK message and a
> UFFD_EVENT_UNMAP message (which reflects unmap in the parent process) can
> be reordered, if the events are triggered by two different threads. In
> this case the UFFD-monitor cannot figure from the events whether the
> child process has the unmapped memory range still mapped (because fork
> happened first) or not.

Yeah, it seems that something like this is possible:

fork()                          munmap()
  mmap_write_unlock();
                                  mmap_write_lock_killable();
                                  do_things();
                                  mmap_{read,write}_unlock();
                                  userfaultfd_unmap_complete();
  dup_userfaultfd_complete();

A solution could be to split uffd_*_complete() into two parts: one that
queues up the event message and a second one that waits for it to be read
by the monitor. The first part can then run before the mm-lock is released,
so the queue order matches the order in which the operations took the lock.

If you can think of something nicer, it'll be really great!

> Obviously, it does not make sense to keep holding mm-lock while notifying the
> user, as it can even lead to deadlocks. Userspace UFFD-monitors can
> work around this issue by using seccomp+ptrace instead of UFFD-events to
> obtain order of the events or examine /proc/[pid]/smaps. Yet, this introduces
> overheads, is complicated, and I doubt anyone does so. I wonder if the API is
> reasonable, or whether I am missing something.
>
> Thanks,
> Nadav
>
> [*] Note that I do not discuss UFFD-monitor issued ioctl's, but the order
> between UFFD-events.
>

-- 
Sincerely yours,
Mike.
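
To make the "queue first, wait later" split a bit more concrete, here is a
rough userspace model of the idea. It is only a sketch and not kernel code:
every name in it (mm_op_publish(), event_enqueue(), event_wait_consumed(),
monitor_read()) is made up for illustration, and pthread mutexes stand in
for the mm-lock and for whatever lock would protect the uffd event queue.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* All names below are hypothetical; none of them exist in the kernel. */
enum ev_type { EV_FORK, EV_UNMAP };

struct event {
	enum ev_type type;
	bool consumed;		/* set once the monitor has read the event */
};

#define QUEUE_MAX 16

struct event_queue {
	pthread_mutex_t lock;		/* protects the fields below */
	pthread_cond_t consumed_cv;	/* signalled when an event is read */
	struct event *slots[QUEUE_MAX];
	size_t head;			/* index of the oldest queued event */
	size_t count;			/* number of queued events */
};

/* Stand-in for the mm-lock. */
static pthread_mutex_t mm_lock = PTHREAD_MUTEX_INITIALIZER;

static struct event_queue q = {
	.lock = PTHREAD_MUTEX_INITIALIZER,
	.consumed_cv = PTHREAD_COND_INITIALIZER,
};

/* Part one: append the event. Meant to be called with mm_lock held. */
int event_enqueue(struct event *ev)
{
	int ret = -1;

	assert(ev != NULL);
	pthread_mutex_lock(&q.lock);
	if (q.count < QUEUE_MAX) {
		ev->consumed = false;
		q.slots[(q.head + q.count) % QUEUE_MAX] = ev;
		q.count++;
		ret = 0;
	}
	pthread_mutex_unlock(&q.lock);
	return ret;
}

/*
 * Part two: block until the monitor has read the event. Meant to be
 * called after mm_lock is dropped, and only if event_enqueue() succeeded.
 */
void event_wait_consumed(struct event *ev)
{
	pthread_mutex_lock(&q.lock);
	while (!ev->consumed)
		pthread_cond_wait(&q.consumed_cv, &q.lock);
	pthread_mutex_unlock(&q.lock);
}

/* Monitor side: take the oldest event, or NULL if the queue is empty. */
struct event *monitor_read(void)
{
	struct event *ev = NULL;

	pthread_mutex_lock(&q.lock);
	if (q.count > 0) {
		ev = q.slots[q.head];
		q.head = (q.head + 1) % QUEUE_MAX;
		q.count--;
		ev->consumed = true;
		pthread_cond_broadcast(&q.consumed_cv);
	}
	pthread_mutex_unlock(&q.lock);
	return ev;
}

/* A fork- or munmap-like operation: do the work and queue under mm_lock. */
int mm_op_publish(struct event *ev, enum ev_type type)
{
	int ret;

	pthread_mutex_lock(&mm_lock);
	/* ... the actual fork or munmap work would happen here ... */
	ev->type = type;
	ret = event_enqueue(ev);	/* ordered by mm_lock */
	pthread_mutex_unlock(&mm_lock);
	return ret;
}
```

A caller would do mm_op_publish() followed by event_wait_consumed(). Since
the enqueue happens before mm_lock is released, two racing operations end up
in the queue in the same order in which they held the lock, and only the
blocking wait happens without it.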