The patch titled
     Subject: mm/rmap: convert make_device_exclusive_range() to make_device_exclusive()
has been added to the -mm mm-unstable branch.  Its filename is
     mm-rmap-convert-make_device_exclusive_range-to-make_device_exclusive.patch

This patch will shortly appear at
     https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patches/mm-rmap-convert-make_device_exclusive_range-to-make_device_exclusive.patch

This patch will later appear in the mm-unstable branch at
    git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days

------------------------------------------------------
From: David Hildenbrand <david@xxxxxxxxxx>
Subject: mm/rmap: convert make_device_exclusive_range() to make_device_exclusive()
Date: Mon, 10 Feb 2025 20:37:45 +0100

The single "real" user in the tree of make_device_exclusive_range() always
requests making only a single address exclusive.  The current implementation
is hard to fix for properly supporting anonymous THP / large folios and for
avoiding messing with rmap walks in weird ways.

So let's always process a single address/page and return folio + page to
minimize page -> folio lookups.  This is a preparation for further changes.

Reject any non-anonymous or hugetlb folios early, directly after GUP.

While at it, extend the documentation of make_device_exclusive() to clarify
some things.

Link: https://lkml.kernel.org/r/20250210193801.781278-4-david@xxxxxxxxxx
Signed-off-by: David Hildenbrand <david@xxxxxxxxxx>
Acked-by: Simona Vetter <simona.vetter@xxxxxxxx>
Reviewed-by: Alistair Popple <apopple@xxxxxxxxxx>
Cc: Alex Shi <alexs@xxxxxxxxxx>
Cc: Danilo Krummrich <dakr@xxxxxxxxxx>
Cc: Dave Airlie <airlied@xxxxxxxxx>
Cc: Jann Horn <jannh@xxxxxxxxxx>
Cc: Jason Gunthorpe <jgg@xxxxxxxxxx>
Cc: Jerome Glisse <jglisse@xxxxxxxxxx>
Cc: John Hubbard <jhubbard@xxxxxxxxxx>
Cc: Jonathan Corbet <corbet@xxxxxxx>
Cc: Karol Herbst <kherbst@xxxxxxxxxx>
Cc: Liam Howlett <liam.howlett@xxxxxxxxxx>
Cc: Lorenzo Stoakes <lorenzo.stoakes@xxxxxxxxxx>
Cc: Lyude <lyude@xxxxxxxxxx>
Cc: "Masami Hiramatsu (Google)" <mhiramat@xxxxxxxxxx>
Cc: Oleg Nesterov <oleg@xxxxxxxxxx>
Cc: Pasha Tatashin <pasha.tatashin@xxxxxxxxxx>
Cc: Peter Xu <peterx@xxxxxxxxxx>
Cc: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx>
Cc: SeongJae Park <sj@xxxxxxxxxx>
Cc: Vlastimil Babka <vbabka@xxxxxxx>
Cc: Yanteng Si <si.yanteng@xxxxxxxxx>
Cc: Barry Song <v-songbaohua@xxxxxxxx>
Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
---

 Documentation/mm/hmm.rst                    |    2 
 Documentation/translations/zh_CN/mm/hmm.rst |    2 
 drivers/gpu/drm/nouveau/nouveau_svm.c       |    5 
 include/linux/mmu_notifier.h                |    2 
 include/linux/rmap.h                        |    5 
 lib/test_hmm.c                              |   43 ++-----
 mm/rmap.c                                   |  105 ++++++++++--------
 7 files changed, 85 insertions(+), 79 deletions(-)

--- a/Documentation/mm/hmm.rst~mm-rmap-convert-make_device_exclusive_range-to-make_device_exclusive
+++ a/Documentation/mm/hmm.rst
@@ -400,7 +400,7 @@ Exclusive access memory
 Some devices have features such as atomic PTE bits that can be used to implement
 atomic access to system memory. To support atomic operations to a shared virtual
 memory page such a device needs access to that page which is exclusive of any
-userspace access from the CPU. The ``make_device_exclusive_range()`` function
+userspace access from the CPU. The ``make_device_exclusive()`` function
 can be used to make a memory range inaccessible from userspace.
 
 This replaces all mappings for pages in the given range with special swap
--- a/Documentation/translations/zh_CN/mm/hmm.rst~mm-rmap-convert-make_device_exclusive_range-to-make_device_exclusive
+++ a/Documentation/translations/zh_CN/mm/hmm.rst
@@ -326,7 +326,7 @@ 独占访问存储器
 
 一些设备具有诸如原子PTE位的功能，可以用来实现对系统内存的原子访问。为了支持对一
 个共享的虚拟内存页的原子操作，这样的设备需要对该页的访问是排他的，而不是来自CPU
-的任何用户空间访问。 ``make_device_exclusive_range()`` 函数可以用来使一
+的任何用户空间访问。 ``make_device_exclusive()`` 函数可以用来使一
 个内存范围不能从用户空间访问。
 
 这将用特殊的交换条目替换给定范围内的所有页的映射。任何试图访问交换条目的行为都会
--- a/drivers/gpu/drm/nouveau/nouveau_svm.c~mm-rmap-convert-make_device_exclusive_range-to-make_device_exclusive
+++ a/drivers/gpu/drm/nouveau/nouveau_svm.c
@@ -609,10 +609,9 @@ static int nouveau_atomic_range_fault(st
 
 		notifier_seq = mmu_interval_read_begin(&notifier->notifier);
 		mmap_read_lock(mm);
-		ret = make_device_exclusive_range(mm, start, start + PAGE_SIZE,
-						  &page, drm->dev);
+		page = make_device_exclusive(mm, start, drm->dev, &folio);
 		mmap_read_unlock(mm);
-		if (ret <= 0 || !page) {
+		if (IS_ERR(page)) {
 			ret = -EINVAL;
 			goto out;
 		}
--- a/include/linux/mmu_notifier.h~mm-rmap-convert-make_device_exclusive_range-to-make_device_exclusive
+++ a/include/linux/mmu_notifier.h
@@ -46,7 +46,7 @@ struct mmu_interval_notifier;
  * @MMU_NOTIFY_EXCLUSIVE: to signal a device driver that the device will no
  * longer have exclusive access to the page. When sent during creation of an
  * exclusive range the owner will be initialised to the value provided by the
- * caller of make_device_exclusive_range(), otherwise the owner will be NULL.
+ * caller of make_device_exclusive(), otherwise the owner will be NULL.
  */
 enum mmu_notifier_event {
 	MMU_NOTIFY_UNMAP = 0,
--- a/include/linux/rmap.h~mm-rmap-convert-make_device_exclusive_range-to-make_device_exclusive
+++ a/include/linux/rmap.h
@@ -663,9 +663,8 @@ int folio_referenced(struct folio *, int
 void try_to_migrate(struct folio *folio, enum ttu_flags flags);
 void try_to_unmap(struct folio *, enum ttu_flags flags);
 
-int make_device_exclusive_range(struct mm_struct *mm, unsigned long start,
-				unsigned long end, struct page **pages,
-				void *arg);
+struct page *make_device_exclusive(struct mm_struct *mm, unsigned long addr,
+		void *owner, struct folio **foliop);
 
 /* Avoid racy checks */
 #define PVMW_SYNC		(1 << 0)
--- a/lib/test_hmm.c~mm-rmap-convert-make_device_exclusive_range-to-make_device_exclusive
+++ a/lib/test_hmm.c
@@ -780,10 +780,8 @@ static int dmirror_exclusive(struct dmir
 	unsigned long start, end, addr;
 	unsigned long size = cmd->npages << PAGE_SHIFT;
 	struct mm_struct *mm = dmirror->notifier.mm;
-	struct page *pages[64];
 	struct dmirror_bounce bounce;
-	unsigned long next;
-	int ret;
+	int ret = 0;
 
 	start = cmd->addr;
 	end = start + size;
@@ -795,36 +793,27 @@ static int dmirror_exclusive(struct dmir
 		return -EINVAL;
 
 	mmap_read_lock(mm);
-	for (addr = start; addr < end; addr = next) {
-		unsigned long mapped = 0;
-		int i;
-
-		next = min(end, addr + (ARRAY_SIZE(pages) << PAGE_SHIFT));
-
-		ret = make_device_exclusive_range(mm, addr, next, pages, NULL);
-		/*
-		 * Do dmirror_atomic_map() iff all pages are marked for
-		 * exclusive access to avoid accessing uninitialized
-		 * fields of pages.
-		 */
-		if (ret == (next - addr) >> PAGE_SHIFT)
-			mapped = dmirror_atomic_map(addr, next, pages, dmirror);
-		for (i = 0; i < ret; i++) {
-			if (pages[i]) {
-				unlock_page(pages[i]);
-				put_page(pages[i]);
-			}
+	for (addr = start; !ret && addr < end; addr += PAGE_SIZE) {
+		struct folio *folio;
+		struct page *page;
+
+		page = make_device_exclusive(mm, addr, NULL, &folio);
+		if (IS_ERR(page)) {
+			ret = PTR_ERR(page);
+			break;
 		}
-		if (addr + (mapped << PAGE_SHIFT) < next) {
-			mmap_read_unlock(mm);
-			mmput(mm);
-			return -EBUSY;
-		}
+		ret = dmirror_atomic_map(addr, addr + PAGE_SIZE, &page, dmirror);
+		ret = ret == 1 ? 0 : -EBUSY;
+		folio_unlock(folio);
+		folio_put(folio);
 	}
 	mmap_read_unlock(mm);
 	mmput(mm);
 
+	if (ret)
+		return ret;
+
 	/* Return the migrated data for verification. */
 	ret = dmirror_bounce_init(&bounce, start, size);
 	if (ret)
--- a/mm/rmap.c~mm-rmap-convert-make_device_exclusive_range-to-make_device_exclusive
+++ a/mm/rmap.c
@@ -2495,70 +2495,89 @@ static bool folio_make_device_exclusive(
 		.arg = &args,
 	};
 
-	/*
-	 * Restrict to anonymous folios for now to avoid potential writeback
-	 * issues.
-	 */
-	if (!folio_test_anon(folio) || folio_test_hugetlb(folio))
-		return false;
-
 	rmap_walk(folio, &rwc);
 
 	return args.valid && !folio_mapcount(folio);
 }
 
 /**
- * make_device_exclusive_range() - Mark a range for exclusive use by a device
+ * make_device_exclusive() - Mark a page for exclusive use by a device
  * @mm: mm_struct of associated target process
- * @start: start of the region to mark for exclusive device access
- * @end: end address of region
- * @pages: returns the pages which were successfully marked for exclusive access
+ * @addr: the virtual address to mark for exclusive device access
  * @owner: passed to MMU_NOTIFY_EXCLUSIVE range notifier to allow filtering
+ * @foliop: folio pointer will be stored here on success.
  *
- * Returns: number of pages found in the range by GUP. A page is marked for
- * exclusive access only if the page pointer is non-NULL.
+ * This function looks up the page mapped at the given address, grabs a
+ * folio reference, locks the folio and replaces the PTE with special
+ * device-exclusive PFN swap entry, preventing access through the process
+ * page tables. The function will return with the folio locked and referenced.
  *
- * This function finds ptes mapping page(s) to the given address range, locks
- * them and replaces mappings with special swap entries preventing userspace CPU
- * access. On fault these entries are replaced with the original mapping after
- * calling MMU notifiers.
+ * On fault, the device-exclusive entries are replaced with the original PTE
+ * under folio lock, after calling MMU notifiers.
+ *
+ * Only anonymous non-hugetlb folios are supported and the VMA must have
+ * write permissions such that we can fault in the anonymous page writable
+ * in order to mark it exclusive. The caller must hold the mmap_lock in read
+ * mode.
  *
  * A driver using this to program access from a device must use a mmu notifier
  * critical section to hold a device specific lock during programming. Once
- * programming is complete it should drop the page lock and reference after
+ * programming is complete it should drop the folio lock and reference after
  * which point CPU access to the page will revoke the exclusive access.
+ *
+ * Notes:
+ * #. This function always operates on individual PTEs mapping individual
+ *    pages. PMD-sized THPs are first remapped to be mapped by PTEs before
+ *    the conversion happens on a single PTE corresponding to @addr.
+ * #. While concurrent access through the process page tables is prevented,
+ *    concurrent access through other page references (e.g., earlier GUP
+ *    invocation) is not handled and not supported.
+ * #. device-exclusive entries are considered "clean" and "old" by core-mm.
+ *    Device drivers must update the folio state when informed by MMU
+ *    notifiers.
+ *
+ * Returns: pointer to mapped page on success, otherwise a negative error.
  */
-int make_device_exclusive_range(struct mm_struct *mm, unsigned long start,
-				unsigned long end, struct page **pages,
-				void *owner)
+struct page *make_device_exclusive(struct mm_struct *mm, unsigned long addr,
+		void *owner, struct folio **foliop)
 {
-	long npages = (end - start) >> PAGE_SHIFT;
-	long i;
+	struct folio *folio;
+	struct page *page;
+	long npages;
 
-	npages = get_user_pages_remote(mm, start, npages,
+	mmap_assert_locked(mm);
+
+	/*
+	 * Fault in the page writable and try to lock it; note that if the
+	 * address would already be marked for exclusive use by a device,
+	 * the GUP call would undo that first by triggering a fault.
+	 */
+	npages = get_user_pages_remote(mm, addr, 1,
 				       FOLL_GET | FOLL_WRITE | FOLL_SPLIT_PMD,
-				       pages, NULL);
-	if (npages < 0)
-		return npages;
-
-	for (i = 0; i < npages; i++, start += PAGE_SIZE) {
-		struct folio *folio = page_folio(pages[i]);
-		if (PageTail(pages[i]) || !folio_trylock(folio)) {
-			folio_put(folio);
-			pages[i] = NULL;
-			continue;
-		}
-
-		if (!folio_make_device_exclusive(folio, mm, start, owner)) {
-			folio_unlock(folio);
-			folio_put(folio);
-			pages[i] = NULL;
-		}
+				       &page, NULL);
+	if (npages != 1)
+		return ERR_PTR(npages);
+	folio = page_folio(page);
+
+	if (!folio_test_anon(folio) || folio_test_hugetlb(folio)) {
+		folio_put(folio);
+		return ERR_PTR(-EOPNOTSUPP);
 	}
-	return npages;
+	if (!folio_trylock(folio)) {
+		folio_put(folio);
+		return ERR_PTR(-EBUSY);
+	}
+
+	if (!folio_make_device_exclusive(folio, mm, addr, owner)) {
+		folio_unlock(folio);
+		folio_put(folio);
+		return ERR_PTR(-EBUSY);
+	}
+	*foliop = folio;
+	return page;
 }
-EXPORT_SYMBOL_GPL(make_device_exclusive_range);
+EXPORT_SYMBOL_GPL(make_device_exclusive);
 #endif
 
 void __put_anon_vma(struct anon_vma *anon_vma)
_

Patches currently in -mm which might be from david@xxxxxxxxxx are

mm-gup-reject-foll_split_pmd-with-hugetlb-vmas.patch
mm-rmap-reject-hugetlb-folios-in-folio_make_device_exclusive.patch
mm-rmap-convert-make_device_exclusive_range-to-make_device_exclusive.patch
mm-rmap-implement-make_device_exclusive-using-folio_walk-instead-of-rmap-walk.patch
mm-memory-detect-writability-in-restore_exclusive_pte-through-can_change_pte_writable.patch
mm-use-single-swp_device_exclusive-entry-type.patch
mm-page_vma_mapped-device-exclusive-entries-are-not-migration-entries.patch
kernel-events-uprobes-handle-device-exclusive-entries-correctly-in-__replace_page.patch
mm-ksm-handle-device-exclusive-entries-correctly-in-write_protect_page.patch
mm-rmap-handle-device-exclusive-entries-correctly-in-try_to_unmap_one.patch
mm-rmap-handle-device-exclusive-entries-correctly-in-try_to_migrate_one.patch
mm-rmap-handle-device-exclusive-entries-correctly-in-page_vma_mkclean_one.patch
mm-page_idle-handle-device-exclusive-entries-correctly-in-page_idle_clear_pte_refs_one.patch
mm-damon-handle-device-exclusive-entries-correctly-in-damon_folio_young_one.patch
mm-damon-handle-device-exclusive-entries-correctly-in-damon_folio_mkold_one.patch
mm-rmap-keep-mapcount-untouched-for-device-exclusive-entries.patch
mm-rmap-avoid-ebusy-from-make_device_exclusive.patch
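
For readers converting their own callers, the following is a minimal,
illustrative sketch (not part of the patch) of the driver-side pattern under
the new interface, modeled on the nouveau hunk above.  The owner cookie
my_dev and the helper my_program_atomic_access() are hypothetical
placeholders for whatever the driver already uses.

/*
 * Illustrative sketch only -- not part of the patch.  my_dev and
 * my_program_atomic_access() are hypothetical placeholders.
 */
static int my_make_addr_device_exclusive(struct mm_struct *mm,
					 unsigned long addr, void *my_dev)
{
	struct folio *folio;
	struct page *page;
	int ret;

	mmap_read_lock(mm);
	/*
	 * Old interface:
	 *   ret = make_device_exclusive_range(mm, addr, addr + PAGE_SIZE,
	 *                                     &page, my_dev);
	 * New interface: a single address, returning page + folio, with the
	 * folio locked and referenced on success, or an ERR_PTR() on failure.
	 */
	page = make_device_exclusive(mm, addr, my_dev, &folio);
	mmap_read_unlock(mm);
	if (IS_ERR(page))
		return PTR_ERR(page);

	/*
	 * Program device access under the driver's MMU-notifier-serialized
	 * lock (hypothetical helper), then drop the folio lock and reference;
	 * a later CPU fault on the address revokes the exclusive access.
	 */
	ret = my_program_atomic_access(my_dev, addr, page);

	folio_unlock(folio);
	folio_put(folio);
	return ret;
}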