Andrea Arcangeli <aarcange <at> redhat.com> writes:

> Once an userfaultfd is created MADV_USERFAULT regions talks through
> the userfaultfd protocol with the thread responsible for doing the
> memory externalization of the process.
>
> The protocol starts by userland writing the requested/preferred
> USERFAULT_PROTOCOL version into the userfault fd (64bit write), if
> kernel knows it, it will ack it by allowing userland to read 64bit
> from the userfault fd that will contain the same 64bit
> USERFAULT_PROTOCOL version that userland asked. Otherwise userfault
> will read __u64 value -1ULL (aka USERFAULTFD_UNKNOWN_PROTOCOL) and it
> will have to try again by writing an older protocol version if
> suitable for its usage too, and read it back again until it stops
> reading -1ULL. After that the userfaultfd protocol starts.
>
> The protocol consists in the userfault fd reads 64bit in size
> providing userland the fault addresses. After a userfault address has
> been read and the fault is resolved by userland, the application must
> write back 128bits in the form of [ start, end ] range (64bit each)
> that will tell the kernel such a range has been mapped. Multiple read
> userfaults can be resolved in a single range write. poll() can be used
> to know when there are new userfaults to read (POLLIN) and when there
> are threads waiting a wakeup through a range write (POLLOUT).
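The version handshake and the fault/range exchange quoted above can be sketched from the userland side as below. This is a minimal sketch, not the patch's code: it assumes the fd was already obtained from the proposed syscall (not shown), that 64-bit words travel in host byte order, and that the range end is exclusive (the quoted text only says `[ start, end ]`). `uffd_handshake`, `uffd_read_fault`, `uffd_wake_range`, the `read_exact`/`write_exact` helpers and the `UFFD_DEMO` macro are hypothetical names. Because the described interface is an RFC, the demo uses a socketpair in place of the kernel.

```c
/* Userland side of the RFC userfaultfd protocol quoted above (sketch).
 * ASSUMPTIONS: fd comes from the RFC's proposed syscall (not shown);
 * 64-bit words use host byte order; all names here are hypothetical. */
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

/* Reply meaning "unknown protocol version" (-1ULL in the quoted text). */
#define USERFAULTFD_UNKNOWN_PROTOCOL ((uint64_t)-1)

/* Each message is one fixed-size read()/write(); a short transfer is EIO. */
static int read_exact(int fd, void *buf, size_t len)
{
    ssize_t r = read(fd, buf, len);
    if (r >= 0 && r != (ssize_t)len)
        errno = EIO;
    return r == (ssize_t)len ? 0 : -1;
}

static int write_exact(int fd, const void *buf, size_t len)
{
    ssize_t r = write(fd, buf, len);
    if (r >= 0 && r != (ssize_t)len)
        errno = EIO;
    return r == (ssize_t)len ? 0 : -1;
}

/* Offer versions[0..n-1], preferred first, until one is echoed back. */
int uffd_handshake(int ufd, const uint64_t *versions, size_t n, uint64_t *acked)
{
    for (size_t i = 0; i < n; i++) {
        uint64_t reply;
        if (write_exact(ufd, &versions[i], sizeof versions[i]) != 0 ||
            read_exact(ufd, &reply, sizeof reply) != 0)
            return -1;
        if (reply == versions[i]) {      /* kernel acked this version */
            *acked = reply;
            return 0;
        }
        if (reply != USERFAULTFD_UNKNOWN_PROTOCOL) {
            errno = EPROTO;              /* neither an ack nor a refusal */
            return -1;
        }
    }
    errno = EPROTONOSUPPORT;             /* every offered version refused */
    return -1;
}

/* Read one 64-bit fault address; blocks unless poll() reported POLLIN. */
int uffd_read_fault(int ufd, uint64_t *addr)
{
    return read_exact(ufd, addr, sizeof *addr);
}

/* Report [start, end] as mapped with one 128-bit write; per the quoted
 * text, a single range write can resolve several faults already read. */
int uffd_wake_range(int ufd, uint64_t start, uint64_t end)
{
    const uint64_t range[2] = { start, end };
    return write_exact(ufd, range, sizeof range);
}

#ifdef UFFD_DEMO
#define DEMO_PAGE_SIZE 4096ULL /* assumed page size */
/* A socketpair stands in for the kernel: sv[0] is the "userfault fd". */
int main(void)
{
    int sv[2];
    /* Simulated kernel: refuse v2, ack v1, then report one fault address. */
    const uint64_t replies[3] = { USERFAULTFD_UNKNOWN_PROTOCOL, 1, 0x7f0000001234ULL };
    const uint64_t versions[2] = { 2, 1 }; /* preferred version first */
    uint64_t acked, addr;
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0 ||
        write_exact(sv[1], replies, sizeof replies) != 0 ||
        uffd_handshake(sv[0], versions, 2, &acked) != 0 ||
        uffd_read_fault(sv[0], &addr) != 0) {
        perror("userfaultfd demo");
        return 1;
    }
    assert(acked == 1);
    /* ...resolve the fault here, e.g. fetch the page contents remotely... */
    uint64_t start = addr & ~(DEMO_PAGE_SIZE - 1);
    /* Exclusive end assumed: the quoted text only says [ start, end ]. */
    if (uffd_wake_range(sv[0], start, start + DEMO_PAGE_SIZE) != 0) {
        perror("range write");
        return 1;
    }
    printf("protocol %llu, fault 0x%llx, woke [0x%llx, 0x%llx]\n",
           (unsigned long long)acked, (unsigned long long)addr,
           (unsigned long long)start,
           (unsigned long long)(start + DEMO_PAGE_SIZE));
    return 0;
}
#endif
```

Compiled with `-DUFFD_DEMO`, the demo prints `protocol 1, fault 0x7f0000001234, woke [0x7f0000001000, 0x7f0000002000]`. The simulated replies are queued up front because a stream socket preserves ordering, which keeps the example single-threaded; a real fault handler would instead wait in poll() for POLLIN before reading new faults, as the quoted text describes.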
>
> Signed-off-by: Andrea Arcangeli <aarcange <at> redhat.com>
> ---
>  arch/x86/syscalls/syscall_32.tbl |   1 +
>  arch/x86/syscalls/syscall_64.tbl |   1 +
>  fs/Makefile                      |   1 +
>  fs/userfaultfd.c                 | 643 +++++++++++++++++++++++++++++++++++++++
>  include/linux/syscalls.h         |   1 +
>  include/linux/userfaultfd.h      |  42 +++
>  init/Kconfig                     |  11 +
>  kernel/sys_ni.c                  |   1 +
>  mm/huge_memory.c                 |  24 +-
>  mm/memory.c                      |   5 +-
>  10 files changed, 720 insertions(+), 10 deletions(-)
>  create mode 100644 fs/userfaultfd.c
>  create mode 100644 include/linux/userfaultfd.h

Hello,

I am wondering if, instead of a new syscall, a suitable fd could be
obtained by opening a special file (say /dev/userfault, analogous to
/dev/shm). This has the added bonus that system admins can tweak access
to this feature via normal file permissions. And if the file doesn't
exist, then the kernel simply has no support for it.

I was wondering the same for memfd_create() when it was added to the
kernel, but this time I decided to actually ask :)

Best regards