On April 27, 2017 8:26:16 PM GMT+03:00, "Michael Kerrisk (man-pages)" <mtk.manpages@xxxxxxxxx> wrote:
>Hi Mike,
>
>I've applied this, but have some questions/points that I think
>need further clarification.
>
>On 04/27/2017 04:14 PM, Mike Rapoport wrote:
>> Signed-off-by: Mike Rapoport <rppt@xxxxxxxxxxxxxxxxxx>
>> ---
>>  man2/userfaultfd.2 | 135 ++++++++++++++++++++++++++++++++++++++++++++++++++---
>>  1 file changed, 128 insertions(+), 7 deletions(-)
>>
>> diff --git a/man2/userfaultfd.2 b/man2/userfaultfd.2
>> index cfea5cb..44af3e4 100644
>> --- a/man2/userfaultfd.2
>> +++ b/man2/userfaultfd.2
>> @@ -75,7 +75,7 @@ flag in
>>  .PP
>>  When the last file descriptor referring to a userfaultfd object is closed,
>>  all memory ranges that were registered with the object are unregistered
>> -and unread page-fault events are flushed.
>> +and unread events are flushed.
>>  .\"
>>  .SS Usage
>>  The userfaultfd mechanism is designed to allow a thread in a multithreaded
>> @@ -99,6 +99,20 @@ In such non-cooperative mode,
>>  the process that monitors userfaultfd and handles page faults
>>  needs to be aware of the changes in the virtual memory layout
>>  of the faulting process to avoid memory corruption.
>> +
>> +Starting from Linux 4.11,
>> +userfaultfd may notify the fault-handling threads about changes
>> +in the virtual memory layout of the faulting process.
>> +In addition, if the faulting process invokes
>> +.BR fork (2)
>> +system call,
>> +the userfaultfd objects associated with the parent may be duplicated
>> +into the child process and the userfaultfd monitor will be notified
>> +about the file descriptor associated with the userfault objects
>
>What does "notified about the file descriptor" mean?

Well, it seems that I've made this one really awkward :)
When the monitored process forks, all the userfault objects associated with it
are duplicated into the child process.
For each duplicated object, userfaultfd generates an event of type
UFFD_EVENT_FORK, and the uffd_msg for this event contains the file descriptor
that should be used to manipulate the duplicated userfault object.
Hope this clarifies.

>> +created for the child process,
>> +which allows userfaultfd monitor to perform user-space paging
>> +for the child process.
>> +
>>  .\" FIXME elaborate about non-cooperating mode, describe its limitations
>>  .\" for kernels before 4.11, features added in 4.11
>>  .\" and limitations remaining in 4.11
>> @@ -144,6 +158,10 @@ Details of the various
>>  operations can be found in
>>  .BR ioctl_userfaultfd (2).
>>
>> +Since Linux 4.11, events other than page-fault may be enabled during
>> +.B UFFDIO_API
>> +operation.
>> +
>>  Up to Linux 4.11,
>>  userfaultfd can be used only with anonymous private memory mappings.
>>
>> @@ -156,7 +174,8 @@ Each
>>  .BR read (2)
>>  from the userfaultfd file descriptor returns one or more
>>  .I uffd_msg
>> -structures, each of which describes a page-fault event:
>> +structures, each of which describes a page-fault event
>> +or an event required for the non-cooperative userfaultfd usage:
>>
>>  .nf
>>  .in +4n
>> @@ -168,6 +187,23 @@ struct uffd_msg {
>>              __u64 flags;    /* Flags describing fault */
>>              __u64 address;  /* Faulting address */
>>          } pagefault;
>> +        struct {
>> +            __u32 ufd;      /* userfault file descriptor
>> +                               of the child process */
>> +        } fork;             /* since Linux 4.11 */
>> +        struct {
>> +            __u64 from;     /* old address of the
>> +                               remapped area */
>> +            __u64 to;       /* new address of the
>> +                               remapped area */
>> +            __u64 len;      /* original mapping length */
>> +        } remap;            /* since Linux 4.11 */
>> +        struct {
>> +            __u64 start;    /* start address of the
>> +                               removed area */
>> +            __u64 end;      /* end address of the
>> +                               removed area */
>> +        } remove;           /* since Linux 4.11 */
>>          ...
>>      } arg;
>>
>> @@ -194,14 +230,73 @@ structure are as follows:
>>  .TP
>>  .I event
>>  The type of event.
>> -Currently, only one value can appear in this field:
>> -.BR UFFD_EVENT_PAGEFAULT ,
>> -which indicates a page-fault event.
>> +Depending on the event type,
>> +different fields of the
>> +.I arg
>> +union represent details required for the event processing.
>> +The non-page-fault events are generated only when appropriate feature
>> +is enabled during API handshake with
>> +.B UFFDIO_API
>> +.BR ioctl (2).
>> +
>> +The following values can appear in the
>> +.I event
>> +field:
>> +.RS
>> +.TP
>> +.B UFFD_EVENT_PAGEFAULT
>> +A page-fault event.
>> +The page-fault details are available in the
>> +.I pagefault
>> +field.
>>  .TP
>> -.I address
>> +.B UFFD_EVENT_FORK
>> +Generated when the faulting process invokes
>> +.BR fork (2)
>> +system call.
>> +The event details are available in the
>> +.I fork
>> +field.
>> +.\" FIXME describe duplication of userfault file descriptor during fork
>> +.TP
>> +.B UFFD_EVENT_REMAP
>> +Generated when the faulting process invokes
>> +.BR mremap (2)
>> +system call.
>> +The event details are available in the
>> +.I remap
>> +field.
>> +.TP
>> +.B UFFD_EVENT_REMOVE
>> +Generated when the faulting process invokes
>> +.BR madvise (2)
>> +system call with
>> +.BR MADV_DONTNEED
>> +or
>> +.BR MADV_REMOVE
>> +advice.
>> +The event details are available in the
>> +.I remove
>> +field.
>> +.TP
>> +.B UFFD_EVENT_UNMAP
>> +Generated when the faulting process unmaps a memory range,
>> +either explicitly using
>> +.BR munmap (2)
>> +system call or implicitly during
>> +.BR mmap (2)
>> +or
>> +.BR mremap (2)
>> +system calls.
>> +The event details are available in the
>> +.I remove
>> +field.
>> +.RE
>> +.TP
>> +.I pagefault.address
>>  The address that triggered the page fault.
>>  .TP
>> -.I flags
>> +.I pagefault.flags
>>  A bit mask of flags that describe the event.
>>  For
>>  .BR UFFD_EVENT_PAGEFAULT ,
>> @@ -218,6 +313,32 @@ otherwise it is a read fault.
>>  .\"
>>  .\" UFFD_PAGEFAULT_FLAG_WP is not yet supported.
>>  .RE
>> +.TP
>> +.I fork.ufd
>> +The file descriptor associated with the userfault object
>> +created for the child process
>> +.TP
>> +.I remap.from
>> +The original address of the memory range that was remapped using
>> +.BR mremap (2).
>> +.TP
>> +.I remap.to
>> +The new address of the memory range that was remapped using
>> +.BR mremap (2).
>> +.TP
>> +.I remap.len
>> +The original length of the memory range that was remapped using
>> +.BR mremap (2).
>> +.TP
>> +.I remove.start
>> +The start address of the memory range that was freed using
>> +.BR madvise (2)
>> +or unmapped
>> +.TP
>> +.I remove.end
>> +The end address of the memory range that was freed using
>> +.BR madvise (2)
>> +or unmapped
>>  .PP
>>  A
>>  .BR read (2)
>
>Cheers,
>
>Michael
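
To make the UFFD_EVENT_FORK explanation in the reply more concrete, here is a minimal C sketch of how a monitor might dispatch on the event types listed in the patch. It is only an illustration under stated assumptions: describe_event() is a hypothetical helper name, not part of any API, and the sketch assumes <linux/userfaultfd.h> from Linux 4.11 or later so that the non-page-fault event constants and union members are defined.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <linux/userfaultfd.h>

/*
 * Hypothetical helper (illustration only): format a one-line
 * description of a message obtained from a userfaultfd descriptor.
 * Returns whatever snprintf() returns.
 */
static int describe_event(const struct uffd_msg *msg, char *buf, size_t len)
{
    assert(msg != NULL && buf != NULL);

    switch (msg->event) {
    case UFFD_EVENT_PAGEFAULT:
        return snprintf(buf, len, "pagefault addr=0x%llx flags=0x%llx",
                        (unsigned long long) msg->arg.pagefault.address,
                        (unsigned long long) msg->arg.pagefault.flags);
    case UFFD_EVENT_FORK:
        /* fork.ufd is the descriptor for the userfaultfd object
           duplicated into the child; the monitor uses it from now on
           to handle the child's events. */
        return snprintf(buf, len, "fork child_ufd=%u",
                        (unsigned int) msg->arg.fork.ufd);
    case UFFD_EVENT_REMAP:
        return snprintf(buf, len, "remap from=0x%llx to=0x%llx len=%llu",
                        (unsigned long long) msg->arg.remap.from,
                        (unsigned long long) msg->arg.remap.to,
                        (unsigned long long) msg->arg.remap.len);
    case UFFD_EVENT_REMOVE:
    case UFFD_EVENT_UNMAP:
        /* Both event types report their range in the remove member. */
        return snprintf(buf, len, "%s start=0x%llx end=0x%llx",
                        msg->event == UFFD_EVENT_REMOVE ? "remove" : "unmap",
                        (unsigned long long) msg->arg.remove.start,
                        (unsigned long long) msg->arg.remove.end);
    default:
        return snprintf(buf, len, "unknown event 0x%x",
                        (unsigned int) msg->event);
    }
}
```

A monitor would typically call such a helper after a successful read(2) of a struct uffd_msg from the userfaultfd descriptor and, on UFFD_EVENT_FORK, add msg.arg.fork.ufd to the set of descriptors it polls.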