From: Nadav Amit <namit@xxxxxxxxxx>

While the overhead of userfaultfd is usually reasonable, it can still
be prohibitive for low-latency backing storage, such as RDMA, persistent
memory or in-memory compression. In such cases the overhead of
scheduling and entering/exiting the kernel becomes dominant.

The natural solution for this problem is to use iouring with
userfaultfd. But besides one bug, this does not provide sufficient
performance improvement, and the use of ioctls for zero/copy limits the
use of iouring for synchronous "reads" (reporting of faults/events).

This patch-set provides four solutions for this overhead:

1. Userfaultfd "polling" mode, in which the faulting thread polls after
   reporting the fault instead of being de-scheduled. This fits cases in
   which the handler is expected to poll for page-faults on a different
   thread.

2. Asynchronous-reads, in which the faulting thread reports page-faults
   (and other events) directly to the userspace handler thread. For this
   purpose, asynchronous read completions are introduced.

3. Write interface, which provides similar services to the zero/copy
   ioctls. This allows the use of iouring for zero/copy without changing
   the iouring code or making it userfaultfd-aware. The low bits of the
   "position" are used to encode the requested operation
   (zero/copy/wp/etc).

4. Async-writes, in which the zero/copy is performed by the faulting
   thread instead of the iouring thread. This reduces caching effects,
   as the data is likely to be used by the faulting thread and
   find_vma() cannot use its cache on the iouring worker.

I will provide some benchmark results later, but some initial results
show that these patches reduce the overhead of handling a user
page-fault by over 50%.

The patches require a bit more cleanup but seem to pass the tests.

Note that the first three patches are bug fixes. I did not Cc them to
stable yet.

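As an illustration of item 3, here is a minimal sketch of how an
operation could be packed into the low bits of a page-aligned write
position. The opcode values, the 4 KiB page size and all the sketch_*
names are hypothetical and are not taken from the patches; the series
defines its own encoding in include/uapi/linux/userfaultfd.h.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical layout, NOT the one defined by the patches: the cover
 * letter only states that the low bits of the position carry the
 * operation. A 4 KiB page size is assumed here.
 */
#define SKETCH_PAGE_SIZE 4096ULL
#define SKETCH_OP_MASK   (SKETCH_PAGE_SIZE - 1)

/* Hypothetical opcodes for illustration only. */
enum sketch_op {
	SKETCH_OP_COPY = 0,
	SKETCH_OP_ZERO = 1,
	SKETCH_OP_WP   = 2,
};

/* Pack a page-aligned destination address and an opcode into one value. */
static inline uint64_t sketch_encode_pos(uint64_t dst_addr, enum sketch_op op)
{
	/* The low bits must be free, so the address has to be page aligned. */
	assert((dst_addr & SKETCH_OP_MASK) == 0);
	return dst_addr | (uint64_t)op;
}

/* Recover the destination address from an encoded position. */
static inline uint64_t sketch_pos_addr(uint64_t pos)
{
	return pos & ~SKETCH_OP_MASK;
}

/* Recover the opcode from an encoded position. */
static inline enum sketch_op sketch_pos_op(uint64_t pos)
{
	return (enum sketch_op)(pos & SKETCH_OP_MASK);
}
```

Under this assumed layout, a handler would pass the encoded value as the
file position of a write on the userfaultfd, and the kernel side would
split it back into address and operation.
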
Cc: Mike Kravetz <mike.kravetz@xxxxxxxxxx>
Cc: Jens Axboe <axboe@xxxxxxxxx>
Cc: Andrea Arcangeli <aarcange@xxxxxxxxxx>
Cc: Peter Xu <peterx@xxxxxxxxxx>
Cc: Alexander Viro <viro@xxxxxxxxxxxxxxxxxx>
Cc: io-uring@xxxxxxxxxxxxxxx
Cc: linux-fsdevel@xxxxxxxxxxxxxxx
Cc: linux-kernel@xxxxxxxxxxxxxxx
Cc: linux-mm@xxxxxxxxx

Nadav Amit (13):
  fs/userfaultfd: fix wrong error code on WP & !VM_MAYWRITE
  fs/userfaultfd: fix wrong file usage with iouring
  selftests/vm/userfaultfd: wake after copy failure
  fs/userfaultfd: simplify locks in userfaultfd_ctx_read
  fs/userfaultfd: introduce UFFD_FEATURE_POLL
  iov_iter: support atomic copy_page_from_iter_iovec()
  fs/userfaultfd: support read_iter to use io_uring
  fs/userfaultfd: complete reads asynchronously
  fs/userfaultfd: use iov_iter for copy/zero
  fs/userfaultfd: add write_iter() interface
  fs/userfaultfd: complete write asynchronously
  fs/userfaultfd: kmem-cache for wait-queue objects
  selftests/vm/userfaultfd: iouring and polling tests

 fs/userfaultfd.c                         | 740 ++++++++++++++++----
 include/linux/hugetlb.h                  |   4 +-
 include/linux/mm.h                       |   6 +-
 include/linux/shmem_fs.h                 |   2 +-
 include/linux/uio.h                      |   3 +
 include/linux/userfaultfd_k.h            |  10 +-
 include/uapi/linux/userfaultfd.h         |  21 +-
 lib/iov_iter.c                           |  23 +-
 mm/hugetlb.c                             |  12 +-
 mm/memory.c                              |  36 +-
 mm/shmem.c                               |  17 +-
 mm/userfaultfd.c                         |  96 ++-
 tools/testing/selftests/vm/Makefile      |   2 +-
 tools/testing/selftests/vm/userfaultfd.c | 835 +++++++++++++++++++++--
 14 files changed, 1506 insertions(+), 301 deletions(-)

--
2.25.1