On Mon, 24 May 2021 23:27:22 +1000 Alistair Popple <apopple@xxxxxxxxxx> wrote:

> Some devices require exclusive write access to shared virtual
> memory (SVM) ranges to perform atomic operations on that memory. This
> requires CPU page tables to be updated to deny access whilst atomic
> operations are occurring.
>
> In order to do this introduce a new swap entry
> type (SWP_DEVICE_EXCLUSIVE). When a SVM range needs to be marked for
> exclusive access by a device all page table mappings for the particular
> range are replaced with device exclusive swap entries. This causes any
> CPU access to the page to result in a fault.
>
> Faults are resolved by replacing the faulting entry with the original
> mapping. This results in MMU notifiers being called which a driver uses
> to update access permissions such as revoking atomic access. After
> notifiers have been called the device will no longer have exclusive
> access to the region.
>
> Walking of the page tables to find the target pages is handled by
> get_user_pages() rather than a direct page table walk. A direct page
> table walk similar to what migrate_vma_collect()/unmap() does could also
> have been utilised. However this resulted in more code similar in
> functionality to what get_user_pages() provides as page faulting is
> required to make the PTEs present and to break COW.
>
> ...
>
>  Documentation/vm/hmm.rst     |  17 ++++
>  include/linux/mmu_notifier.h |   6 ++
>  include/linux/rmap.h         |   4 +
>  include/linux/swap.h         |   7 +-
>  include/linux/swapops.h      |  44 ++++++++-
>  mm/hmm.c                     |   5 +
>  mm/memory.c                  | 128 +++++++++++++++++++++++-
>  mm/mprotect.c                |   8 ++
>  mm/page_vma_mapped.c         |   9 +-
>  mm/rmap.c                    | 186 +++++++++++++++++++++++++++++++++++
>  10 files changed, 405 insertions(+), 9 deletions(-)
>

This is quite a lot of code added to core MM for a single driver. Is
there any expectation that other drivers will use this code? Is there a
way of reducing the impact (code size, at least) for systems which don't
need this code?

How beneficial is this code to nouveau users? I see that it permits a
part of OpenCL to be implemented, but how useful/important is this in
the real world? Thanks.