[-mm PATCH v4 00/18] get_user_pages() for dax pte and pmd mappings

Changes since v3 [1]:

1/ Minimize the impact of the modifications to get_page() by moving
   zone_device manipulations out of line and marking them unlikely().  In
   v3 a simple function like:

		get_page(page);
		do_something_with_page(page);
		put_page(page);

   ...had a text size of 672 bytes.  That is now down to 289 bytes,
   compared to the pre-patch baseline size of 267 bytes.  Disassembly shows
   that, aside from a conditional branch on the page zone number (data
   that should already be dcache hot), there is no icache impact in the
   typical path.  (Andrew, Dave Hansen)

2/ Minimize the impact to mm.h by moving ~200 lines of definitions to
   pfn_t.h and memremap.h.  (Andrew)

3/ Move struct vmem_altmap helper routines to the only C file that
   consumes them. (Andrew)

4/ Clean up definitions of pfn_pte, pfn_pmd, pte_devmap, and pmd_devmap
   to have proper dependencies on CONFIG_MMU and
   CONFIG_TRANSPARENT_HUGEPAGE to avoid the need to touch arch headers
   outside of x86.

5/ Skip registering 'memory block' sysfs devices for zone_device ranges
   since they are not normal memory and are not eligible to be 'onlined'.

6/ Improve the diagnostic debug messages in fs/dax.c to include
   buffer_head details.  (Willy)
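
As an illustration of item 1/, here is a minimal userspace sketch of the
"inline fast path, out-of-line unlikely() slow path" shape.  It is a model
under stated assumptions, not the code from the patches: struct zpage, the
model_* names, and the dev_refs counter are hypothetical, and unlikely()
is re-created with the GCC/Clang __builtin_expect() builtin.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's unlikely() hint (GCC/Clang builtin). */
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Hypothetical zone ids; the kernel derives the zone from page flags. */
enum model_zone { MODEL_ZONE_NORMAL, MODEL_ZONE_DEVICE };

/* Hypothetical, heavily reduced stand-in for struct page. */
struct zpage {
	enum model_zone zone;
	long refcount;
	long *dev_refs;	/* device mapping refcount, ZONE_DEVICE pages only */
};

/* Slow paths are kept out of line so callers only pay for a branch. */
__attribute__((noinline))
void model_get_zone_device_page(struct zpage *page)
{
	assert(page->dev_refs != NULL);
	(*page->dev_refs)++;
}

__attribute__((noinline))
void model_put_zone_device_page(struct zpage *page)
{
	assert(page->dev_refs != NULL && *page->dev_refs > 0);
	(*page->dev_refs)--;
}

/* Fast path: one predicted-not-taken branch, then the usual increment. */
static inline void model_get_page(struct zpage *page)
{
	if (unlikely(page->zone == MODEL_ZONE_DEVICE))
		model_get_zone_device_page(page);
	page->refcount++;
}

static inline void model_put_page(struct zpage *page)
{
	assert(page->refcount > 0);
	page->refcount--;
	if (unlikely(page->zone == MODEL_ZONE_DEVICE))
		model_put_zone_device_page(page);
}
```

For a normal page only the zone test is added to the inline sequence,
which is consistent with the small text-size delta quoted in item 1/.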

These replace the following 18 patches:

    kvm-rename-pfn_t-to-kvm_pfn_t.patch..dax-re-enable-dax-pmd-mappings.patch

...in the current -mm series; the other 7 patches from v3 are
unmodified.  They have received a build success notification from the
kbuild robot over 108 configs.

[1]: https://lists.01.org/pipermail/linux-nvdimm/2015-December/003370.html

---
Original summary:

To date, we have implemented two I/O usage models for persistent memory,
PMEM (a persistent "ram disk") and DAX (mmap persistent memory into
userspace).  This series adds a third, DAX-GUP, that allows DAX mappings
to be the target of direct-i/o.  It allows userspace to coordinate
DMA/RDMA from/to persistent memory.

The implementation leverages the ZONE_DEVICE mm-zone that went into
4.3-rc1 (also discussed at kernel summit) to flag pages that are owned
and dynamically mapped by a device driver.  The pmem driver, after
mapping a persistent memory range into the system memmap via
devm_memremap_pages(), arranges for DAX to distinguish pfn-only versus
page-backed pmem-pfns via flags in the new pfn_t type.
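
The idea of carrying flags alongside a pfn can be sketched in userspace
as below.  This is only a model: the model_* names and the bit positions
are assumptions chosen for illustration, and the authoritative
definitions are the ones in the series' pfn_t.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative flag bits in the otherwise unused high bits of a pfn. */
#define MODEL_PFN_DEV		(1ULL << 61)	/* device memory, not RAM */
#define MODEL_PFN_MAP		(1ULL << 60)	/* has a memmap entry */
#define MODEL_PFN_FLAGS_MASK	(MODEL_PFN_DEV | MODEL_PFN_MAP)

/* Wrapping the value in a struct keeps raw pfns and flagged pfns apart. */
typedef struct { uint64_t val; } model_pfn_t;

static inline model_pfn_t model_pfn_to_pfn_t(uint64_t pfn, uint64_t flags)
{
	model_pfn_t p;

	assert((pfn & MODEL_PFN_FLAGS_MASK) == 0);
	assert((flags & ~MODEL_PFN_FLAGS_MASK) == 0);
	p.val = pfn | flags;
	return p;
}

static inline uint64_t model_pfn_t_to_pfn(model_pfn_t p)
{
	return p.val & ~MODEL_PFN_FLAGS_MASK;
}

/* Device pfn that is also page-backed: the case DAX-GUP cares about. */
static inline bool model_pfn_is_devmap(model_pfn_t p)
{
	const uint64_t flags = MODEL_PFN_DEV | MODEL_PFN_MAP;

	return (p.val & flags) == flags;
}

/*
 * Assumption of this model: RAM pfns (no DEV flag) always have a struct
 * page, device pfns only when MAP is set.
 */
static inline bool model_pfn_has_page(model_pfn_t p)
{
	return (p.val & MODEL_PFN_MAP) != 0 || (p.val & MODEL_PFN_DEV) == 0;
}
```

In this model a pfn-only pmem range would be tagged with MODEL_PFN_DEV
alone, while a range that went through devm_memremap_pages() would carry
both flags.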

The DAX code, upon seeing a PFN_DEV+PFN_MAP flagged pfn, flags the
resulting pte(s) inserted into the process page tables with a new
_PAGE_DEVMAP flag.  Later, when get_user_pages() is walking ptes it keys
off _PAGE_DEVMAP to pin the device hosting the page range active.
Finally, get_page() and put_page() are modified to take references
against the device driver established page mapping.
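
The pte walk described above can be pictured with the following
userspace sketch.  It is a simplified model, not the x86 gup code: the
pte bit positions, struct model_pagemap, struct gup_page, and every
model_* function are hypothetical, and the real walk handles many cases
(write permission, special ptes, huge pages) that are omitted here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_PAGE_PRESENT	(1ULL << 0)
#define MODEL_PAGE_DEVMAP	(1ULL << 58)	/* illustrative bit position */

/* Hypothetical stand-in for the driver-established page mapping. */
struct model_pagemap {
	bool alive;	/* false once the driver starts tearing down */
	long pins;	/* transient pins held by in-flight walks */
	long page_refs;	/* long-lived references taken via pages */
};

struct gup_page { long refcount; };

static bool model_get_dev_pagemap(struct model_pagemap *pgmap)
{
	if (pgmap == NULL || !pgmap->alive)
		return false;
	pgmap->pins++;
	return true;
}

static void model_put_dev_pagemap(struct model_pagemap *pgmap)
{
	assert(pgmap->pins > 0);
	pgmap->pins--;
}

/*
 * Walk nr ptes and take a reference on each backing page.  Returns the
 * number of pages pinned; stops at the first pte it cannot handle.
 */
static size_t model_gup_pte_range(const uint64_t *ptes, size_t nr,
				  struct gup_page *pages,
				  struct model_pagemap *pgmap,
				  struct gup_page **out)
{
	size_t i;

	for (i = 0; i < nr; i++) {
		bool devmap = (ptes[i] & MODEL_PAGE_DEVMAP) != 0;

		if ((ptes[i] & MODEL_PAGE_PRESENT) == 0)
			break;
		/* Keep the device active while the page ref is taken. */
		if (devmap && !model_get_dev_pagemap(pgmap))
			break;
		pages[i].refcount++;
		if (devmap) {
			/* The page reference now holds the mapping... */
			pgmap->page_refs++;
			/* ...so the transient pin can be dropped. */
			model_put_dev_pagemap(pgmap);
		}
		out[i] = &pages[i];
	}
	return i;
}
```

The point of the two counters is that the walk only needs the device
pinned for the short window in which it converts a pte into a page
reference; afterwards the page reference itself keeps the mapping alive.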

This need for "struct page" for persistent memory requires memory
capacity to store the memmap array.  Given that the memmap array for a
large pool of persistent memory may exhaust available DRAM, introduce a
mechanism to allocate the memmap from persistent memory.  The new
"struct vmem_altmap *" parameter to devm_memremap_pages() enables
arch_add_memory() to use reserved pmem capacity rather than the page
allocator.
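
To make the capacity concern concrete, the following userspace sketch
models carving the memmap out of the device's own pfn range.  It is an
assumption-laden illustration: struct model_altmap and the model_*
helpers are hypothetical names, and the 64-byte "struct page" and 4 KiB
page size are typical x86_64 values assumed for the arithmetic rather
than figures taken from this series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_NO_PFN	(~0ULL)

/* Hypothetical bookkeeping for a memmap reservation inside a device. */
struct model_altmap {
	uint64_t base_pfn;	/* first pfn of the device range */
	uint64_t reserve;	/* pfns at the base kept for driver metadata */
	uint64_t free;		/* pfns set aside for the memmap */
	uint64_t alloc;		/* pfns handed out so far */
};

/* Bump-allocate nr_pfns from the reservation instead of from DRAM. */
static uint64_t model_altmap_alloc(struct model_altmap *altmap,
				   uint64_t nr_pfns)
{
	uint64_t pfn;

	assert(altmap != NULL);
	assert(altmap->alloc <= altmap->free);
	if (nr_pfns > altmap->free - altmap->alloc)
		return MODEL_NO_PFN;
	pfn = altmap->base_pfn + altmap->reserve + altmap->alloc;
	altmap->alloc += nr_pfns;
	return pfn;
}

/* Number of pfns needed to hold the memmap for nr_pages pages. */
static uint64_t model_memmap_pfns(uint64_t nr_pages,
				  uint64_t page_struct_size,
				  uint64_t page_size)
{
	assert(page_size > 0);
	return (nr_pages * page_struct_size + page_size - 1) / page_size;
}
```

Under those assumptions, 1 TiB of pmem is 2^28 pages and needs 2^22 pfns,
i.e. 16 GiB, of memmap, which is the kind of DRAM cost the altmap is
meant to avoid.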

---

Dan Williams (18):
      kvm: rename pfn_t to kvm_pfn_t
      mm, dax, pmem: introduce pfn_t
      mm: skip memory block registration for ZONE_DEVICE
      mm: introduce find_dev_pagemap()
      x86, mm: introduce vmem_altmap to augment vmemmap_populate()
      libnvdimm, pfn, pmem: allocate memmap array in persistent memory
      avr32: convert to asm-generic/memory_model.h
      hugetlb: fix compile error on tile
      frv: fix compiler warning from definition of __pmd()
      x86, mm: introduce _PAGE_DEVMAP
      mm, dax, gpu: convert vm_insert_mixed to pfn_t
      mm, dax: convert vmf_insert_pfn_pmd() to pfn_t
      libnvdimm, pmem: move request_queue allocation earlier in probe
      mm, dax, pmem: introduce {get|put}_dev_pagemap() for dax-gup
      mm, dax: dax-pmd vs thp-pmd vs hugetlbfs-pmd
      mm, x86: get_user_pages() for dax mappings
      dax: provide diagnostics for pmd mapping failures
      dax: re-enable dax pmd mappings


 arch/arm/include/asm/kvm_mmu.h          |    5 -
 arch/arm/kvm/mmu.c                      |   10 +
 arch/arm64/include/asm/kvm_mmu.h        |    3 
 arch/avr32/include/asm/page.h           |    8 +
 arch/frv/include/asm/page.h             |    2 
 arch/ia64/include/asm/page.h            |    1 
 arch/mips/include/asm/kvm_host.h        |    6 -
 arch/mips/kvm/emulate.c                 |    2 
 arch/mips/kvm/tlb.c                     |   14 +-
 arch/powerpc/include/asm/kvm_book3s.h   |    4 -
 arch/powerpc/include/asm/kvm_ppc.h      |    2 
 arch/powerpc/kvm/book3s.c               |    6 -
 arch/powerpc/kvm/book3s_32_mmu_host.c   |    2 
 arch/powerpc/kvm/book3s_64_mmu_host.c   |    2 
 arch/powerpc/kvm/e500.h                 |    2 
 arch/powerpc/kvm/e500_mmu_host.c        |    8 +
 arch/powerpc/kvm/trace_pr.h             |    2 
 arch/powerpc/sysdev/axonram.c           |    9 +
 arch/x86/include/asm/pgtable.h          |   26 +++-
 arch/x86/include/asm/pgtable_types.h    |    7 +
 arch/x86/kvm/iommu.c                    |   11 +-
 arch/x86/kvm/mmu.c                      |   37 +++--
 arch/x86/kvm/mmu_audit.c                |    2 
 arch/x86/kvm/paging_tmpl.h              |    6 -
 arch/x86/kvm/vmx.c                      |    2 
 arch/x86/kvm/x86.c                      |    2 
 arch/x86/mm/gup.c                       |   57 +++++++-
 arch/x86/mm/init_64.c                   |   33 ++++-
 arch/x86/mm/pat.c                       |    5 -
 drivers/base/memory.c                   |   13 ++
 drivers/block/brd.c                     |    7 +
 drivers/gpu/drm/exynos/exynos_drm_gem.c |    4 -
 drivers/gpu/drm/gma500/framebuffer.c    |    4 -
 drivers/gpu/drm/msm/msm_gem.c           |    4 -
 drivers/gpu/drm/omapdrm/omap_gem.c      |    7 +
 drivers/gpu/drm/ttm/ttm_bo_vm.c         |    4 -
 drivers/nvdimm/pfn_devs.c               |    3 
 drivers/nvdimm/pmem.c                   |   73 +++++++---
 drivers/s390/block/dcssblk.c            |   11 +-
 fs/Kconfig                              |    3 
 fs/dax.c                                |   76 ++++++++--
 include/asm-generic/pgtable.h           |    6 +
 include/linux/blkdev.h                  |    5 -
 include/linux/huge_mm.h                 |   15 ++
 include/linux/hugetlb.h                 |    1 
 include/linux/io.h                      |   15 --
 include/linux/kvm_host.h                |   37 +++--
 include/linux/kvm_types.h               |    2 
 include/linux/list.h                    |   12 ++
 include/linux/memory_hotplug.h          |    3 
 include/linux/memremap.h                |  114 ++++++++++++++++
 include/linux/mm.h                      |   72 ++++++++--
 include/linux/mm_types.h                |    5 +
 include/linux/pfn.h                     |    9 +
 include/linux/pfn_t.h                   |  102 ++++++++++++++
 kernel/memremap.c                       |  227 ++++++++++++++++++++++++++++++-
 lib/list_debug.c                        |    9 +
 mm/gup.c                                |   19 ++-
 mm/huge_memory.c                        |  119 ++++++++++++----
 mm/memory.c                             |   26 ++--
 mm/memory_hotplug.c                     |   67 +++++++--
 mm/mprotect.c                           |    5 -
 mm/page_alloc.c                         |   11 +-
 mm/pgtable-generic.c                    |    2 
 mm/sparse-vmemmap.c                     |   76 ++++++++++
 mm/sparse.c                             |    8 +
 mm/swap.c                               |    3 
 virt/kvm/kvm_main.c                     |   47 +++---
 68 files changed, 1204 insertions(+), 298 deletions(-)
 create mode 100644 include/linux/memremap.h
 create mode 100644 include/linux/pfn_t.h
