We have a 3x throughput improvement reported by Intel's kernel test robot:
https://lore.kernel.org/all/202404261055.c5e24608-oliver.sang@xxxxxxxxx/

This is from delaying taking the mmap_lock for page faults until we
actually need the mmap_lock in order to assign an anon_vma to the vma.
It cleans up the page fault path a little by making the anon fault
handler more similar to the file fault handler.

Matthew Wilcox (Oracle) (4):
  mm: Assert the mmap_lock is held in __anon_vma_prepare()
  mm: Delay the check for a NULL anon_vma
  mm: Fix some minor per-VMA lock issues in userfaultfd
  mm: Optimise vmf_anon_prepare() for VMAs without an anon_vma

 mm/huge_memory.c |  6 ++++--
 mm/memory.c      | 42 +++++++++++++++++++++++++++---------------
 mm/rmap.c        |  3 +--
 mm/userfaultfd.c | 20 +++++++++-----------
 4 files changed, 41 insertions(+), 30 deletions(-)

-- 
2.43.0
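
To illustrate the locking order this series describes, here is a minimal userspace sketch in C. It is a toy model and not the kernel code: every `toy_` type and function is hypothetical and only loosely named after its kernel counterpart, locks are modelled as plain counters, and the trylock-and-retry fallback is an assumption of the sketch, not something this cover letter states. The point it tries to show is that a fault on a VMA which already has an anon_vma never touches the mmap_lock, and that the lock is taken only around the anon_vma assignment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy userspace model. All names below are hypothetical. */

enum fault_result { FAULT_OK, FAULT_RETRY, FAULT_OOM };

struct toy_mm {
	int mmap_readers;     /* current read holders of the modelled mmap_lock */
	bool mmap_write_held; /* a writer makes the read trylock fail */
	int mmap_lock_taken;  /* how often the fault path took the lock */
};

struct toy_vma {
	struct toy_mm *mm;
	void *anon_vma;       /* NULL until the first anonymous fault */
	bool vma_read_locked; /* modelled per-VMA lock held by the fault */
};

int toy_anon_vma_storage; /* stands in for an allocated anon_vma */

bool toy_mmap_read_trylock(struct toy_mm *mm)
{
	if (mm->mmap_write_held)
		return false;
	mm->mmap_readers++;
	mm->mmap_lock_taken++;
	return true;
}

void toy_mmap_read_unlock(struct toy_mm *mm)
{
	mm->mmap_readers--;
}

/* Assigning an anon_vma requires the mmap_lock; assert that it is held. */
int toy_anon_vma_prepare(struct toy_vma *vma)
{
	assert(vma->mm->mmap_readers > 0);
	if (!vma->anon_vma)
		vma->anon_vma = &toy_anon_vma_storage;
	return 0; /* allocation never fails in this toy model */
}

/*
 * under_vma_lock is true when the fault runs with only the per-VMA lock.
 * When it is false, the caller is expected to hold the mmap_lock already.
 */
enum fault_result toy_vmf_anon_prepare(struct toy_vma *vma, bool under_vma_lock)
{
	enum fault_result ret = FAULT_OK;

	/* Fast path: anon_vma already assigned, no mmap_lock needed. */
	if (vma->anon_vma)
		return FAULT_OK;

	if (under_vma_lock) {
		/* Take the mmap_lock only now that it is actually needed. */
		if (!toy_mmap_read_trylock(vma->mm)) {
			vma->vma_read_locked = false; /* drop the per-VMA lock */
			return FAULT_RETRY;           /* caller retries the fault */
		}
	}

	if (toy_anon_vma_prepare(vma))
		ret = FAULT_OOM;

	if (under_vma_lock)
		toy_mmap_read_unlock(vma->mm);
	return ret;
}
```

In this model the first fault on a VMA takes the modelled mmap_lock once to assign the anon_vma, and every later fault on the same VMA returns from the fast path without taking it. If a writer holds the lock, the fault drops its per-VMA lock and reports that it should be retried.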