On Wed, Oct 07, 2020 at 06:44:20PM +0200, Daniel Vetter wrote:
> Way back it was a reasonable assumption that iomem mappings never
> change the pfn range they point at. But this has changed:
>
> - gpu drivers dynamically manage their memory nowadays, invalidating
>   ptes with unmap_mapping_range when buffers get moved
>
> - contiguous dma allocations have moved from dedicated carveouts to
>   cma regions. This means if we miss the unmap the pfn might contain
>   pagecache or anon memory (well anything allocated with GFP_MOVABLE)
>
> - even /dev/mem now invalidates mappings when the kernel requests that
>   iomem region when CONFIG_IO_STRICT_DEVMEM is set, see 3234ac664a87
>   ("/dev/mem: Revoke mappings when a driver claims the region")
>
> Accessing pfns obtained from ptes without holding all the locks is
> therefore no longer a good idea. Fix this.
>
> Since ioremap might need to manipulate pagetables too we need to drop
> the pt lock and have a retry loop if we raced.
>
> While at it, also add kerneldoc and improve the comment for the
> vm_ops->access function. It's for accessing, not for moving the
> memory from iomem to system memory, as the old comment seemed to
> suggest.
>
> References: 28b2ee20c7cb ("access_process_vm device memory infrastructure")
> Cc: Jason Gunthorpe <jgg@xxxxxxxx>
> Cc: Dan Williams <dan.j.williams@xxxxxxxxx>
> Cc: Kees Cook <keescook@xxxxxxxxxxxx>
> Cc: Rik van Riel <riel@xxxxxxxxxx>
> Cc: Benjamin Herrenschmidt <benh@xxxxxxxxxxxxxxxxxxx>
> Cc: Dave Airlie <airlied@xxxxxxxx>
> Cc: Hugh Dickins <hugh@xxxxxxxxxxx>
> Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
> Cc: John Hubbard <jhubbard@xxxxxxxxxx>
> Cc: Jérôme Glisse <jglisse@xxxxxxxxxx>
> Cc: Jan Kara <jack@xxxxxxx>
> Cc: linux-mm@xxxxxxxxx
> Cc: linux-arm-kernel@xxxxxxxxxxxxxxxxxxx
> Cc: linux-samsung-soc@xxxxxxxxxxxxxxx
> Cc: linux-media@xxxxxxxxxxxxxxx
> Signed-off-by: Daniel Vetter <daniel.vetter@xxxxxxxxx>
> ---
>  include/linux/mm.h |  3 ++-
>  mm/memory.c        | 44 ++++++++++++++++++++++++++++++++++++++++++--
>  2 files changed, 44 insertions(+), 3 deletions(-)

This does seem to solve the race with revoke_devmem(), but it is really
ugly. It would be much nicer to wrap a rwsem around this access and the
unmap.

Any place using it has a nice linear translation from vm_off to pfn, so
I don't think there is such a good reason to use follow_pte in the
first place.

ie why not have the helper be this:

  int generic_access_phys(unsigned long pfn, unsigned long pgprot,
			  void *buf, size_t len, bool write)

Then something like dev/mem would compute the pfn and obtain the lock:

  static int dev_access(struct vm_area_struct *vma, unsigned long addr,
			void *buf, int len, int write)
  {
	phys_addr_t cpu_addr = vma->vm_pgoff * PAGE_SIZE +
			       (addr - vma->vm_start);

	/* FIXME: Has to be over each page of len */
	if (!devmem_is_allowed_access(PHYS_PFN(cpu_addr)))
		return -EPERM;

	down_read(&mem_sem);
	generic_access_phys(PHYS_PFN(cpu_addr),
			    pgprot_val(vma->vm_page_prot),
			    buf, len, write);
	up_read(&mem_sem);
	return len;
  }

The other cases looked simpler because they don't revoke; there the
mmap_sem alone should be enough protection, they would just need to
provide the linear translation to pfn.
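To flesh out the helper above: it wouldn't need follow_pte at all, just
ioremap under the caller's lock. Rough sketch only, assuming
CONFIG_HAVE_IOREMAP_PROT, and glossing over the same per-page FIXME as
in dev_access():

  int generic_access_phys(unsigned long pfn, unsigned long pgprot,
			  void *buf, size_t len, bool write)
  {
	void __iomem *maddr;

	/* Caller holds mem_sem (or mmap_sem for non-revoking users) */
	maddr = ioremap_prot(PFN_PHYS(pfn), PAGE_SIZE, pgprot);
	if (!maddr)
		return -ENOMEM;

	/* FIXME: offset into the page and len crossing a page
	 * boundary need handling, as above */
	if (write)
		memcpy_toio(maddr, buf, len);
	else
		memcpy_fromio(buf, maddr, len);

	iounmap(maddr);
	return 0;
  }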
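And the revoke side takes the same rwsem for write around the zap,
something like this (dev_revoke() is a made-up name, the real call site
would be the claim path from 3234ac664a87):

  static DECLARE_RWSEM(mem_sem);

  static void dev_revoke(struct address_space *mapping)
  {
	down_write(&mem_sem);
	/*
	 * Zap all userspace ptes; any dev_access() in flight holds
	 * mem_sem for read and finishes first, later accesses fail
	 * the devmem_is_allowed_access() check.
	 */
	unmap_mapping_range(mapping, 0, 0, 1);
	up_write(&mem_sem);
  }

What do you think?

Jason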