Hi Mike,

Thanks for working on this page. Just for background (since it helps me
for review), how did you get the info that is documented in the page?

Cheers,

Michael

On 21 December 2016 at 09:08, Mike Rapoport <rppt@xxxxxxxxxxxxxxxxxx> wrote:
> Signed-off-by: Mike Rapoport <rppt@xxxxxxxxxxxxxxxxxx>
> ---
> man2/userfaultfd.2 | 314 +++++++++++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 314 insertions(+)
> create mode 100644 man2/userfaultfd.2
>
> diff --git a/man2/userfaultfd.2 b/man2/userfaultfd.2
> new file mode 100644
> index 0000000..d2196cd
> --- /dev/null
> +++ b/man2/userfaultfd.2
> @@ -0,0 +1,314 @@
> +.\" Copyright (c) 2016, IBM Corporation.
> +.\" Written by Mike Rapoport <rppt@xxxxxxxxxxxxxxxxxx>
> +.\"
> +.\" %%%LICENSE_START(VERBATIM)
> +.\" Permission is granted to make and distribute verbatim copies of this
> +.\" manual provided the copyright notice and this permission notice are
> +.\" preserved on all copies.
> +.\"
> +.\" Permission is granted to copy and distribute modified versions of this
> +.\" manual under the conditions for verbatim copying, provided that the
> +.\" entire resulting derived work is distributed under the terms of a
> +.\" permission notice identical to this one.
> +.\"
> +.\" Since the Linux kernel and libraries are constantly changing, this
> +.\" manual page may be incorrect or out-of-date. The author(s) assume no
> +.\" responsibility for errors or omissions, or for damages resulting from
> +.\" the use of the information contained herein. The author(s) may not
> +.\" have taken the same level of care in the production of this manual,
> +.\" which is licensed free of charge, as they might when working
> +.\" professionally.
> +.\"
> +.\" Formatted or processed versions of this manual, if unaccompanied by
> +.\" the source, must acknowledge the copyright and authors of this work.
> +.\" %%%LICENSE_END
> +.\"
> +.TH USERFAULTFD 2 2016-12-12 "Linux" "Linux Programmer's Manual"
> +.SH NAME
> +userfaultfd \- create a file descriptor for handling page faults in user
> +space
> +.SH SYNOPSIS
> +.nf
> +.B #include <sys/types.h>
> +.sp
> +.BI "int userfaultfd(int " flags );
> +.fi
> +.PP
> +.IR Note :
> +There is no glibc wrapper for this system call; see NOTES.
> +.SH DESCRIPTION
> +.BR userfaultfd ()
> +creates a userfaultfd object that can be used to delegate page-fault
> +handling to a user-space application.
> +The userfaultfd should be configured using
> +.BR ioctl (2).
> +Once the userfaultfd is configured, the application can use
> +.BR read (2)
> +to receive userfaultfd notifications.
> +Reads from the userfaultfd may be blocking or non-blocking, depending on
> +the value of
> +.I flags
> +used for the creation of the userfaultfd or subsequent calls to
> +.BR fcntl (2).
> +
> +The following values may be bitwise ORed in
> +.I flags
> +to change the behavior of
> +.BR userfaultfd ():
> +.TP
> +.B O_CLOEXEC
> +Enable the close-on-exec flag for the new userfaultfd object.
> +See the description of the
> +.B O_CLOEXEC
> +flag in
> +.BR open (2).
> +.TP
> +.B O_NONBLOCK
> +Enable non-blocking operation for the userfaultfd
> +object.
> +See the description of the
> +.B O_NONBLOCK
> +flag in
> +.BR open (2).
> +.\"
> +.SS Userfaultfd operation
> +After the userfaultfd object is created with
> +.BR userfaultfd (),
> +the application must enable it using the
> +.B UFFDIO_API
> +ioctl to perform an API version and supported features handshake between
> +the kernel and user space.
> +If the
> +.B UFFDIO_API
> +operation succeeds, the application should register memory ranges using the
> +.B UFFDIO_REGISTER
> +ioctl.
> +After successful completion of the
> +.B UFFDIO_REGISTER
> +ioctl, a page fault occurring in the requested memory range, and satisfying
> +the mode defined at registration time, will be forwarded by the kernel to
> +the user-space application.
> +The application can then use the
> +.B UFFDIO_COPY
> +or
> +.B UFFDIO_ZEROPAGE
> +ioctls to resolve the page fault.
> +.PP
> +Currently, userfaultfd can only be used with anonymous private memory
> +mappings.
> +.\"
> +.SS API Ioctls
> +The API ioctls are used to configure userfaultfd behavior.
> +They allow choosing which features will be enabled and which kinds of
> +events will be delivered to the application.
> +.TP
> +.BI "UFFDIO_API struct uffdio_api *" api
> +Enable userfaultfd and perform the API handshake.
> +The
> +.I uffdio_api
> +structure is defined as:
> +.in +4n
> +.nf
> +
> +struct uffdio_api {
> +    __u64 api;
> +    __u64 features;
> +    __u64 ioctls;
> +};
> +
> +.fi
> +.in
> +The
> +.I api
> +field denotes the API version requested by the application.
> +The kernel verifies that it can support the requested API and sets the
> +.I features
> +and
> +.I ioctls
> +fields to bit masks representing all the available features and the
> +available generic ioctls.
> +.\"
> +.TP
> +.BI "UFFDIO_REGISTER struct uffdio_register *" arg
> +Register a memory range with userfaultfd.
> +The
> +.I uffdio_register
> +structure is defined as:
> +.in +4n
> +.nf
> +
> +struct uffdio_range {
> +    __u64 start;
> +    __u64 len;
> +};
> +
> +struct uffdio_register {
> +    struct uffdio_range range;
> +    __u64 mode;
> +    __u64 ioctls;
> +};
> +
> +.fi
> +.in
> +
> +The
> +.I range
> +field defines a memory range starting at
> +.I start
> +and continuing for
> +.I len
> +bytes that should be handled by the userfaultfd.
> +The
> +.I mode
> +field defines the mode of operation desired for this memory region.
> +The following values may be bitwise ORed to set the userfaultfd mode for
> +the particular range:
> +.RS
> +.sp
> +.PD 0
> +.TP 12
> +.B UFFDIO_REGISTER_MODE_MISSING
> +Track page faults on missing pages.
> +.TP 12
> +.B UFFDIO_REGISTER_MODE_WP
> +Track page faults on write-protected pages.
> +Currently the only supported mode is
> +.BR UFFDIO_REGISTER_MODE_MISSING .
> +.PD
> +.RE
> +.IP
> +The kernel reports which ioctl operations are available for the requested
> +range in the
> +.I ioctls
> +field.
> +.\"
> +.TP
> +.BI "UFFDIO_UNREGISTER struct uffdio_range *" arg
> +Unregister a memory range from userfaultfd.
> +.\"
> +.SS Range Ioctls
> +The range ioctls enable the calling application to resolve page-fault
> +events in a consistent way.
> +.TP
> +.BI "UFFDIO_COPY struct uffdio_copy *" arg
> +Atomically copy a contiguous memory chunk into the userfaultfd-registered
> +range and optionally wake up the blocked thread.
> +The source and destination addresses and the number of bytes to copy are
> +specified by the
> +.IR src ", " dst ", and " len
> +fields of
> +.I "struct uffdio_copy"
> +respectively:
> +
> +.in +4n
> +.nf
> +struct uffdio_copy {
> +    __u64 dst;
> +    __u64 src;
> +    __u64 len;
> +    __u64 mode;
> +    __s64 copy;
> +};
> +.fi
> +.in
> +
> +The following values may be bitwise ORed in
> +.I mode
> +to change the behavior of the
> +.B UFFDIO_COPY
> +operation:
> +.RS
> +.sp
> +.PD 0
> +.TP 12
> +.B UFFDIO_COPY_MODE_DONTWAKE
> +Do not wake up the thread that waits for page-fault resolution.
> +.PD
> +.RE
> +.IP
> +The
> +.I copy
> +field of the
> +.I uffdio_copy
> +structure is used by the kernel to return the number of bytes that were
> +actually copied.
> +.\"
> +.TP
> +.BI "UFFDIO_ZEROPAGE struct uffdio_zeropage *" arg
> +Zero out a memory range registered with userfaultfd.
> +The requested range is specified by the
> +.I range
> +field of the
> +.I uffdio_zeropage
> +structure:
> +
> +.in +4n
> +.nf
> +struct uffdio_zeropage {
> +    struct uffdio_range range;
> +    __u64 mode;
> +    __s64 zeropage;
> +};
> +.fi
> +.in
> +
> +The following values may be bitwise ORed in
> +.I mode
> +to change the behavior of the
> +.B UFFDIO_ZEROPAGE
> +operation:
> +.RS
> +.sp
> +.PD 0
> +.TP 12
> +.B UFFDIO_ZEROPAGE_MODE_DONTWAKE
> +Do not wake up the thread that waits for page-fault resolution.
> +.PD
> +.RE
> +.IP
> +The
> +.I zeropage
> +field of the
> +.I uffdio_zeropage
> +structure is used by the kernel to return the number of bytes that were
> +actually zeroed.
> +.\"
> +.TP
> +.BI "UFFDIO_WAKE struct uffdio_range *" arg
> +Wake up the thread waiting for page-fault resolution.
> +.SH RETURN VALUE
> +For a successful call, the
> +.BR userfaultfd ()
> +system call returns the new file descriptor for the userfaultfd object.
> +On error, \-1 is returned, and
> +.I errno
> +is set appropriately.
> +.SH ERRORS
> +.TP
> +.B EINVAL
> +An unsupported value was specified in
> +.IR flags .
> +.TP
> +.B EMFILE
> +The per-process limit on the number of open file descriptors has been
> +reached.
> +.TP
> +.B ENFILE
> +The system-wide limit on the total number of open files has been
> +reached.
> +.TP
> +.B ENOMEM
> +Insufficient kernel memory was available.
> +.SH CONFORMING TO
> +.BR userfaultfd ()
> +is Linux-specific and should not be used in programs intended to be
> +portable.
> +.SH NOTES
> +Glibc does not provide a wrapper for this system call; call it using
> +.BR syscall (2).
> +.SH SEE ALSO
> +.BR fcntl (2),
> +.BR ioctl (2)
> +
> +.I Documentation/vm/userfaultfd.txt
> +in the Linux kernel source tree.
> +
> --
> 1.9.1
>

--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/
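For trying out the sequence the draft page describes, here is a minimal C sketch; it is not part of the patch. It assumes Linux headers that provide <linux/userfaultfd.h> and SYS_userfaultfd. The helper names page_align_down(), uffd_open(), fault_handler(), uffd_demo_first_byte() and struct handler_args are hypothetical, and struct uffd_msg and UFFD_EVENT_PAGEFAULT come from the kernel header rather than from the page. It calls userfaultfd() through syscall(2), performs the UFFDIO_API handshake, registers one anonymous private page with UFFDIO_REGISTER_MODE_MISSING, and lets a second thread resolve the first fault with UFFDIO_COPY.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <linux/userfaultfd.h>
#include <poll.h>
#include <pthread.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: round addr down to a page boundary (power of two). */
static uint64_t page_align_down(uint64_t addr, uint64_t page)
{
    assert(page != 0 && (page & (page - 1)) == 0);
    return addr & ~(page - 1);
}

/* Hypothetical helper: no glibc wrapper, so use syscall(2); then UFFDIO_API. */
static int uffd_open(int flags)
{
    int uffd = (int) syscall(SYS_userfaultfd, flags);
    if (uffd == -1)
        return -1;          /* e.g. ENOSYS, or EPERM from policy/seccomp */
    struct uffdio_api api = { .api = UFFD_API, .features = 0 };
    if (ioctl(uffd, UFFDIO_API, &api) == -1) {
        close(uffd);
        return -1;
    }
    return uffd;
}

/* Hypothetical argument bundle for the fault-handling thread. */
struct handler_args {
    int uffd;
    uint64_t page;
    char fill;
};

/* Wait for one missing-page fault and resolve it with UFFDIO_COPY. */
static void *fault_handler(void *p)
{
    struct handler_args *a = p;
    struct pollfd pfd = { .fd = a->uffd, .events = POLLIN };
    struct uffd_msg msg;
    if (poll(&pfd, 1, -1) != 1 || !(pfd.revents & POLLIN))
        return NULL;
    if (read(a->uffd, &msg, sizeof msg) != (ssize_t) sizeof msg ||
        msg.event != UFFD_EVENT_PAGEFAULT)
        return NULL;
    /* Source page in this process, filled with the requested byte. */
    char *src = mmap(NULL, a->page, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (src == MAP_FAILED)
        return NULL;
    memset(src, a->fill, a->page);
    struct uffdio_copy copy = {
        .dst = page_align_down(msg.arg.pagefault.address, a->page),
        .src = (uint64_t) (uintptr_t) src,
        .len = a->page,
        .mode = 0,          /* no UFFDIO_COPY_MODE_DONTWAKE: wake the faulter */
    };
    ioctl(a->uffd, UFFDIO_COPY, &copy);
    munmap(src, a->page);
    return NULL;
}

/* Hypothetical demo: first byte of a handler-populated page, or -1. */
static int uffd_demo_first_byte(char fill)
{
    uint64_t page = (uint64_t) sysconf(_SC_PAGESIZE);
    int uffd = uffd_open(O_CLOEXEC | O_NONBLOCK);
    if (uffd == -1)
        return -1;          /* userfaultfd unavailable in this environment */
    char *region = mmap(NULL, page, PROT_READ | PROT_WRITE,
                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    struct uffdio_register reg = {
        .range = { .start = (uint64_t) (uintptr_t) region, .len = page },
        .mode = UFFDIO_REGISTER_MODE_MISSING,
    };
    struct handler_args args = { .uffd = uffd, .page = page, .fill = fill };
    pthread_t thr;
    int result = -1;
    if (region != MAP_FAILED && ioctl(uffd, UFFDIO_REGISTER, &reg) == 0 &&
        pthread_create(&thr, NULL, fault_handler, &args) == 0) {
        /* First touch faults; it blocks until the handler's UFFDIO_COPY. */
        result = ((volatile char *) region)[0];
        pthread_join(thr, NULL);
    }
    if (region != MAP_FAILED)
        munmap(region, page);   /* also drops the registration */
    close(uffd);
    return result;
}
```

A caller such as uffd_demo_first_byte('A') should get 'A' back once the handler thread's UFFDIO_COPY has filled the page, and -1 where the system call is refused, for example by a container's seccomp filter or by the vm.unprivileged_userfaultfd sysctl on newer kernels. Opening with O_NONBLOCK and waiting in poll(2) is just one choice here; a descriptor opened without O_NONBLOCK could use a blocking read(2) instead. Older glibc versions may need -pthread at link time.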