Pinning user pages out of nvdimm DAX memory is significantly slower than
pinning pages from system RAM. Analysis points to software overhead
incurred from a radix tree lookup: the relatively costly dev_pagemap
lookup was repeated for each page, which significantly increased gup
time. This patch series fixes that by caching the dev_pagemap while
pinning pages, so the lookup is no longer repeated per page.

The first 5 patches only update the benchmark to help test and
demonstrate the value of the last 2 patches.

The results were compared using the following benchmark command for
device DAX memory:

  # gup_benchmark -m $((12*1024)) -n 512 -L -f /dev/dax0.0

  Before: 1037581 usec
  After:   375786 usec

Not bad; with this patch set applied, the device DAX time matches the
baseline for anonymous system RAM, whereas before it was nearly 3x
longer.

Keith Busch (7):
  mm/gup_benchmark: Time put_page
  mm/gup_benchmark: Add additional pinning methods
  tools/gup_benchmark: Fix 'write' flag usage
  tools/gup_benchmark: Allow user specified file
  tools/gup_benchmark: Add parameter for hugetlb
  mm/gup: Combine parameters into struct
  mm/gup: Cache dev_pagemap while pinning pages

 include/linux/huge_mm.h                    |  12 +-
 include/linux/hugetlb.h                    |   2 +-
 include/linux/mm.h                         |  27 ++-
 mm/gup.c                                   | 279 ++++++++++++++---------------
 mm/gup_benchmark.c                         |  36 +++-
 mm/huge_memory.c                           |  67 ++++---
 mm/nommu.c                                 |   6 +-
 tools/testing/selftests/vm/gup_benchmark.c |  40 ++++-
 8 files changed, 262 insertions(+), 207 deletions(-)

-- 
2.14.4
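
For readers unfamiliar with the approach, the caching idea described in
the cover letter can be illustrated with a minimal user-space sketch. It
is not the kernel code from this series: struct pagemap, slow_lookup(),
cached_lookup() and walk_pfns() are hypothetical names, and a linear
scan over a small table stands in for the radix tree lookup. The only
point shown is that a walk over consecutive pfns can reuse the last map
it found for as long as the pfn stays inside that map's range.

```c
#include <stddef.h>

/* Hypothetical stand-in for a device page map: one contiguous pfn range. */
struct pagemap {
	unsigned long start_pfn;
	unsigned long end_pfn;	/* exclusive */
};

/* Made-up ranges, for illustration only. */
static const struct pagemap maps[] = {
	{ 0x1000, 0x2000 },
	{ 0x8000, 0x9000 },
};

/* Counts how often the expensive path is taken. */
static unsigned long slow_lookups;

/* Stand-in for the costly lookup (a linear scan here, not a radix tree). */
static const struct pagemap *slow_lookup(unsigned long pfn)
{
	slow_lookups++;
	for (size_t i = 0; i < sizeof(maps) / sizeof(maps[0]); i++)
		if (pfn >= maps[i].start_pfn && pfn < maps[i].end_pfn)
			return &maps[i];
	return NULL;
}

/* Reuse the cached map when it covers pfn; otherwise fall back. */
static const struct pagemap *cached_lookup(unsigned long pfn,
					   const struct pagemap *cached)
{
	if (cached && pfn >= cached->start_pfn && pfn < cached->end_pfn)
		return cached;
	return slow_lookup(pfn);
}

/*
 * Walk nr consecutive pfns starting at first_pfn and return the number
 * of slow lookups that were needed. Stops at the first unmapped pfn.
 */
unsigned long walk_pfns(unsigned long first_pfn, unsigned long nr,
			int use_cache)
{
	const struct pagemap *map = NULL;
	unsigned long before = slow_lookups;

	for (unsigned long i = 0; i < nr; i++) {
		unsigned long pfn = first_pfn + i;

		map = use_cache ? cached_lookup(pfn, map) : slow_lookup(pfn);
		if (!map)
			break;	/* pfn is not backed by any map */
	}
	return slow_lookups - before;
}
```

In this sketch, walking 512 pfns inside a single range takes 512 slow
lookups without the cache and one with it. The sketch says nothing about
the absolute timings above, which depend on the real lookup cost.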