[PATCH 0/4] RFC: userfaultfd remap

Hello,

Since userfaultfd remap functionality was first proposed by Andrea
Arcangeli [1], a new use case has been demonstrated for removing pages
from the userfaultfd registered region. FluidMem [2] is a system for
expanding or limiting the resident size of a VM using a remote key-value
store as backing storage instead of swap space. It runs on the hypervisor
and uses userfaultfd to manage the memory regions malloc'd by qemu.
Since FluidMem maintains a constant resident size using an LRU list, it
must evict pages to the remote key-value store to make room for pages that
were just faulted in. Eviction relies on UFFDIO_REMAP, which lets the
non-cooperative userspace page fault handler remove pages from the
registered region.
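
The eviction policy described above can be sketched in a few lines of
userspace C. This is only an illustration of a fixed-size LRU list, not
FluidMem's actual code: struct lru, lru_insert(), LRU_CAPACITY and
NO_EVICTION are all hypothetical names, and the list is ordered by fault
time only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LRU_CAPACITY 4            /* hypothetical resident-page budget */
#define NO_EVICTION  UINT64_MAX   /* returned when nothing was evicted */

/* Hypothetical LRU of resident page addresses; slot 0 is the oldest. */
struct lru {
    uint64_t page[LRU_CAPACITY];
    size_t count;
};

/*
 * Record a newly faulted-in page. If the list is already full, drop the
 * least recently used page and return its address so that the caller can
 * evict it (write it to the key-value store and remove it from the VM).
 * Returns NO_EVICTION while there is still room.
 */
static uint64_t lru_insert(struct lru *l, uint64_t addr)
{
    uint64_t victim = NO_EVICTION;

    assert(l->count <= LRU_CAPACITY);
    if (l->count == LRU_CAPACITY) {
        victim = l->page[0];
        /* Shift the remaining entries down by one slot. */
        for (size_t i = 1; i < l->count; i++)
            l->page[i - 1] = l->page[i];
        l->count--;
    }
    l->page[l->count++] = addr;
    return victim;
}
```

With this shape the resident set never exceeds LRU_CAPACITY pages: every
fault beyond the budget yields exactly one victim address to remove.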

The VM shadow page tables must be kept in sync after a remapping, so
mmu_notifier_invalidate_range_(start/end) calls are made as necessary.

FluidMem achieves page fault latencies to a remote key-value store that
are as fast as swap backed by DRAM (/dev/pmem0) and 77% faster than swap
on an SSD. pmbench [3] was used to measure page fault latencies with a 4 GB
working set size, within a VM using 1 GB DRAM (20% local):

  FluidMem (RAMCloud): 24.87 microseconds
  Swap (pmem DRAM): 26.34 microseconds
  Swap (NVMe over Fabrics): 41.73 microseconds
  Swap (SSD): 106.56 microseconds
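
The 77% figure is consistent with reading "faster" as the relative
reduction in latency. A small helper (hypothetical, written only to check
the arithmetic against the numbers above) makes that explicit:

```c
#include <assert.h>

/* Percentage by which latency fast_us is lower than latency slow_us. */
static double latency_reduction_pct(double fast_us, double slow_us)
{
    assert(slow_us > 0.0);
    return 100.0 * (slow_us - fast_us) / slow_us;
}
```

Under that reading, FluidMem's 24.87 microseconds is roughly 5.6% below
swap on pmem, 40.4% below swap on NVMe over Fabrics, and 76.7% below swap
on SSD.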

For real applications FluidMem has an additional benefit of allowing
unused kernel pages to be removed from DRAM and stored in backend storage,
making room for additional application pages to be kept in local DRAM.
The useful memory capacity for the VM is increased.

The main complexity of this code is in rmap: moving a page to a different
vma with a different vma->vm_pgoff means overwriting page->index.
Overwriting page->index requires the rmap change, and it is only possible
when the page_mapcount is 1.
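
To illustrate why the index has to change, here is a userspace model of
how a page's index is derived from its address and its vma. It mirrors
the calculation done by the kernel's linear_page_index() for base pages;
struct vma_model and page_index() are hypothetical stand-ins, and 4 KiB
pages are assumed.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12   /* assumes 4 KiB pages */

/* Hypothetical stand-in for the two vm_area_struct fields used here. */
struct vma_model {
    uint64_t vm_start;  /* first virtual address covered by the vma */
    uint64_t vm_pgoff;  /* offset of vm_start, in pages */
};

/* Index of the page mapped at addr: page offset into the vma plus vm_pgoff. */
static uint64_t page_index(const struct vma_model *vma, uint64_t addr)
{
    assert(addr >= vma->vm_start);
    return ((addr - vma->vm_start) >> PAGE_SHIFT) + vma->vm_pgoff;
}
```

A page at the same offset into two vmas gets different indexes whenever
the vm_pgoff values differ, so the stored index is stale once the page
has moved and rmap would no longer find the mapping at the expected
address.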

Changes since [1]:
 - Changed the direction supported by UFFDIO_REMAP to the OUT direction 
   needed by FluidMem. The IN direction is not necessary, as UFFDIO_COPY
   should be used instead because it doesn't require a TLB flush.
 - Code has been kept up-to-date by Andrea in branch userfault from [4].

[1] https://lkml.org/lkml/2015/3/5/576
[2] Caldwell, Blake, Youngbin Im, Sangtae Ha, Richard Han, and
    Eric Keller. "FluidMem: Memory as a Service for the Datacenter."
    arXiv preprint arXiv:1707.07780 (2017).
    https://github.com/blakecaldwell/fluidmem
[3] Yang, Jisoo, and Julian Seymour. "Pmbench: A Micro-Benchmark for
    Profiling Paging Performance on a System with Low-Latency SSDs."
    Information Technology-New Generations. Springer, Cham, 2018. 627-633.
    https://bitbucket.org/jisooy/pmbench/src
[4] https://git.kernel.org/pub/scm/linux/kernel/git/andrea/aa.git

Andrea Arcangeli (3):
  userfaultfd: UFFDIO_REMAP: rmap preparation
  userfaultfd: UFFDIO_REMAP uABI
  userfaultfd: UFFDIO_REMAP

Blake Caldwell (1):
  userfaultfd: change the direction for UFFDIO_REMAP to out

 Documentation/admin-guide/mm/userfaultfd.rst |  10 +
 fs/userfaultfd.c                             |  49 +++
 include/linux/userfaultfd_k.h                |  17 +
 include/uapi/linux/userfaultfd.h             |  25 +-
 mm/huge_memory.c                             | 117 ++++++
 mm/khugepaged.c                              |   3 +
 mm/rmap.c                                    |  13 +
 mm/userfaultfd.c                             | 536 +++++++++++++++++++++++++++
 8 files changed, 769 insertions(+), 1 deletion(-)

-- 
1.8.3.1



