On Tuesday, 2 March 2021 11:05:59 AM AEDT Jason Gunthorpe wrote: > On Fri, Feb 26, 2021 at 06:18:29PM +1100, Alistair Popple wrote: > > > +/** > > + * make_device_exclusive_range() - Mark a range for exclusive use by a device > > + * @mm: mm_struct of assoicated target process > > + * @start: start of the region to mark for exclusive device access > > + * @end: end address of region > > + * @pages: returns the pages which were successfully mark for exclusive acces > > + * > > + * Returns: number of pages successfully marked for exclusive access > > + * > > + * This function finds the ptes mapping page(s) to the given address range and > > + * replaces them with special swap entries preventing userspace CPU access. On > > + * fault these entries are replaced with the original mapping after calling MMU > > + * notifiers. > > + */ > > +int make_device_exclusive_range(struct mm_struct *mm, unsigned long start, > > + unsigned long end, struct page **pages) > > +{ > > + long npages = (end - start) >> PAGE_SHIFT; > > + long i; > > + > > + npages = get_user_pages_remote(mm, start, npages, > > + FOLL_GET | FOLL_WRITE | FOLL_SPLIT_PMD, > > + pages, NULL, NULL); > > + for (i = 0; i < npages; i++) { > > + if (!trylock_page(pages[i])) { > > + put_page(pages[i]); > > + pages[i] = NULL; > > + continue; > > + } > > + > > + if (!try_to_protect(pages[i])) { > > Isn't this racy? get_user_pages returns the ptes at an instant in > time, they could have already been changed to something else? Right. On it's own this does not guarantee that the page is mapped at the given location, only that a mapping won't get established without an mmu notifier callback to clear the swap entry. The intent was a driver could use HMM or some other mechanism to keep PTEs synchronised if required. However I just looked at patch 8 in the series again and it appears I got this wrong when converting from the old migration approach: + mutex_unlock(&svmm->mutex); + ret = nouveau_atomic_range_fault(svmm, drm, args, + size, hmm_flags, mm); The mutex needs to be unlocked after the range fault to ensure the PTE hasn't changed. But this ends up being a problem because try_to_protect() calls notifiers which need to take that mutex and hence deadlocks. > I would think you'd want to switch to the swap entry atomically under > th PTLs? That is one approach, but the reuse of get_user_pages() to walk the page tables and fault/gather the pages is a nice simplification and adding a new FOLL flag/mode to atomically swap entries doesn't seem right. However try_to_protect() scans the PTEs again under the PTL so checking the mapping of interest actually gets replaced during the rmap walk seems like a reasonable solution. Thanks for the comments. - Alistair > Jason >