So far, we have not supported reliable R/O long-term pinning in COW mappings. That means that if we trigger R/O long-term pinning in a MAP_PRIVATE mapping, we could end up pinning the (R/O-mapped) shared zeropage or a pagecache page.

The next write access would trigger a write fault and replace the pinned page by an exclusive anonymous page in the process page table; whatever the process writes to that private page copy would not be visible to the owner of the previous page pin: for example, RDMA could read stale data. The end result is essentially unexpected and hard-to-debug memory corruption.

Some drivers tried working around that limitation by using "FOLL_FORCE|FOLL_WRITE|FOLL_LONGTERM" for R/O long-term pinning for now. FOLL_WRITE triggers a write fault, if required, and breaks COW before pinning the page. FOLL_FORCE is required because the VMA might lack write permissions, and drivers wanted to make that work as well, just like one would expect (no write access, but still triggering a write fault to break COW).

However, that is not a practical solution, because
(1) Drivers that don't stick to that undocumented and debatable pattern would still run into that issue. For example, VFIO only uses FOLL_LONGTERM for R/O long-term pinning.
(2) Using FOLL_WRITE just to work around a COW mapping + page pinning limitation is unintuitive. FOLL_WRITE would, for example, mark the page softdirty or trigger uffd-wp, even though there actually isn't going to be any write access.
(3) The purpose of FOLL_FORCE is debug access, not access by arbitrary drivers to mappings that lack the required VMA permissions.

So instead, make R/O long-term pinning work as expected, by breaking COW in a COW mapping early, such that we can remove any FOLL_FORCE usage from drivers. More details in patch #8.

Patches #1--#3 add COW tests for non-anonymous pages.
Patches #4--#7 prepare core MM for extended FAULT_FLAG_UNSHARE support in COW mappings.
Patch #8 implements reliable R/O long-term pinning in COW mappings.
Patches #9--#19 remove any FOLL_FORCE usage from drivers.

I'm refraining from CCing all driver maintainers on the whole patch set.

Cc: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
Cc: Jason Gunthorpe <jgg@xxxxxxxx>
Cc: John Hubbard <jhubbard@xxxxxxxxxx>
Cc: Peter Xu <peterx@xxxxxxxxxx>
Cc: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx>
Cc: Andrea Arcangeli <aarcange@xxxxxxxxxx>
Cc: Hugh Dickins <hughd@xxxxxxxxxx>
Cc: Nadav Amit <namit@xxxxxxxxxx>
Cc: Vlastimil Babka <vbabka@xxxxxxx>
Cc: Matthew Wilcox (Oracle) <willy@xxxxxxxxxxxxx>
Cc: Mike Kravetz <mike.kravetz@xxxxxxxxxx>
Cc: Muchun Song <songmuchun@xxxxxxxxxxxxx>
Cc: Shuah Khan <shuah@xxxxxxxxxx>
Cc: Lucas Stach <l.stach@xxxxxxxxxxxxxx>
Cc: David Airlie <airlied@xxxxxxxxx>
Cc: Oded Gabbay <ogabbay@xxxxxxxxxx>
Cc: Arnd Bergmann <arnd@xxxxxxxx>

David Hildenbrand (19):
  selftests/vm: anon_cow: prepare for non-anonymous COW tests
  selftests/vm: cow: basic COW tests for non-anonymous pages
  selftests/vm: cow: R/O long-term pinning reliability tests for non-anon pages
  mm: add early FAULT_FLAG_UNSHARE consistency checks
  mm: add early FAULT_FLAG_WRITE consistency checks
  mm: rework handling in do_wp_page() based on private vs. shared mappings
  mm: don't call vm_ops->huge_fault() in wp_huge_pmd()/wp_huge_pud() for private mappings
  mm: extend FAULT_FLAG_UNSHARE support to anything in a COW mapping
  mm/gup: reliable R/O long-term pinning in COW mappings
  RDMA/umem: remove FOLL_FORCE usage
  RDMA/usnic: remove FOLL_FORCE usage
  RDMA/siw: remove FOLL_FORCE usage
  media: videobuf-dma-sg: remove FOLL_FORCE usage
  drm/etnaviv: remove FOLL_FORCE usage
  media: pci/ivtv: remove FOLL_FORCE usage
  mm/frame-vector: remove FOLL_FORCE usage
  drm/exynos: remove FOLL_FORCE usage
  RDMA/hw/qib/qib_user_pages: remove FOLL_FORCE usage
  habanalabs: remove FOLL_FORCE usage

 drivers/gpu/drm/etnaviv/etnaviv_gem.c         |   8 +-
 drivers/gpu/drm/exynos/exynos_drm_g2d.c       |   2 +-
 drivers/infiniband/core/umem.c                |   8 +-
 drivers/infiniband/hw/qib/qib_user_pages.c    |   2 +-
 drivers/infiniband/hw/usnic/usnic_uiom.c      |   9 +-
 drivers/infiniband/sw/siw/siw_mem.c           |   9 +-
 drivers/media/common/videobuf2/frame_vector.c |   2 +-
 drivers/media/pci/ivtv/ivtv-udma.c            |   2 +-
 drivers/media/pci/ivtv/ivtv-yuv.c             |   5 +-
 drivers/media/v4l2-core/videobuf-dma-sg.c     |  14 +-
 drivers/misc/habanalabs/common/memory.c       |   3 +-
 include/linux/mm.h                            |  27 +-
 include/linux/mm_types.h                      |   8 +-
 mm/gup.c                                      |  10 +-
 mm/huge_memory.c                              |   5 +-
 mm/hugetlb.c                                  |  12 +-
 mm/memory.c                                   |  97 +++--
 tools/testing/selftests/vm/.gitignore         |   2 +-
 tools/testing/selftests/vm/Makefile           |  10 +-
 tools/testing/selftests/vm/check_config.sh    |   4 +-
 .../selftests/vm/{anon_cow.c => cow.c}        | 387 +++++++++++++++++-
 tools/testing/selftests/vm/run_vmtests.sh     |   8 +-
 22 files changed, 516 insertions(+), 118 deletions(-)
 rename tools/testing/selftests/vm/{anon_cow.c => cow.c} (74%)

--
2.38.1