Re: userfaultfd: usability issue due to lack of UFFD events ordering

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



> On Jan 31, 2022, at 2:42 AM, Mike Rapoport <rppt@xxxxxxxxxx> wrote:
> 
> Hi Nadav,
> 
> On Sat, Jan 29, 2022 at 10:23:55PM -0800, Nadav Amit wrote:
>> Using userfautlfd and looking at the kernel code, I encountered a usability
>> issue that complicates userspace UFFD-monitor implementation. I obviosuly
>> might be wrong, so I would appreciate a (polite?) feedback. I do have a
>> userspace workaround, but I thought it is worthy to share and to hear your
>> opinion, as well as feedback from other UFFD users.
>> 
>> The issue I encountered regards the ordering of UFFD events tbat might not
>> reflect the actual order in which events took place.
>> 
>> In more detail, UFFD events (e.g., unmap, fork) are not ordered against
>> themselves [*]. The mm-lock is dropped before notifying the userspace
>> UFFD-monitor, and therefore there is no guarantee as to whether the order of
>> the events actually reflects the order in which the events took place.
>> This can prevent a UFFD-monitor from using the events to track which
>> ranges are mapped. Specifically, UFFD_EVENT_FORK message and a
>> UFFD_EVENT_UNMAP message (which reflects unmap in the parent process) can
>> be reordered, if the events are triggered by two different threads. In
>> this case the UFFD-monitor cannot figure from the events whether the
>> child process has the unmapped memory range still mapped (because fork
>> happened first) or not.
> 
> Yeah, it seems that something like this is possible:
> 
> 
> fork()					munmap()
> 	mmap_write_unlock();
> 						mmap_write_lock_killable();
> 						do_things();
> 						mmap_{read,write}_unlock();
> 						userfaultfd_unmap_complete();
> 	dup_userfaultfd_complete();
> 
> A solution could be to split uffd_*_complete() to two parts: one that
> queues up the event message and the second one that waits for it to be read
> by the monitor. The first part then can run befor mm-lock is released.
> 
> If you can think of something nicer, it'll be really great!

Thanks for the quick response. Your solution is possible, but then the
order between events and page-faults is certainly not kept - as David
mentioned: regardless of mm-lock that is not always taken for write,
events and page-faults are on two separate lists, and queued page-faults
are reported before events.

I am also not sure how simple/performant it is, since it would require
an additional refcount for userfaultfd_wait_queue to prevent it from
disappearing between the time it is enqueued to the time it blocks.

Another option is to associate some “generation” or “sequence number”
with every event and change the PAI to include it. It still leaves the
problem of ordering MADV_DONTNEED and page-faults though.




[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux