On Wed, 7 Oct 2020 18:44:21 +0200
Daniel Vetter <daniel.vetter@xxxxxxxx> wrote:

> Way back it was a reasonable assumptions that iomem mappings never
> change the pfn range they point at. But this has changed:
>
> - gpu drivers dynamically manage their memory nowadays, invalidating
>   ptes with unmap_mapping_range when buffers get moved
>
> - contiguous dma allocations have moved from dedicated carvetouts to
>   cma regions. This means if we miss the unmap the pfn might contain
>   pagecache or anon memory (well anything allocated with GFP_MOVEABLE)
>
> - even /dev/mem now invalidates mappings when the kernel requests that
>   iomem region when CONFIG_IO_STRICT_DEVMEM is set, see 3234ac664a87
>   ("/dev/mem: Revoke mappings when a driver claims the region")
>
> Accessing pfns obtained from ptes without holding all the locks is
> therefore no longer a good idea. Fix this.
>
> Since zpci_memcpy_from|toio seems to not do anything nefarious with
> locks we just need to open code get_pfn and follow_pfn and make sure
> we drop the locks only after we've done. The write function also needs
> the copy_from_user move, since we can't take userspace faults while
> holding the mmap sem.
>
> Signed-off-by: Daniel Vetter <daniel.vetter@xxxxxxxxx>
> Cc: Jason Gunthorpe <jgg@xxxxxxxx>
> Cc: Dan Williams <dan.j.williams@xxxxxxxxx>
> Cc: Kees Cook <keescook@xxxxxxxxxxxx>
> Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
> Cc: John Hubbard <jhubbard@xxxxxxxxxx>
> Cc: Jérôme Glisse <jglisse@xxxxxxxxxx>
> Cc: Jan Kara <jack@xxxxxxx>
> Cc: Dan Williams <dan.j.williams@xxxxxxxxx>
> Cc: linux-mm@xxxxxxxxx
> Cc: linux-arm-kernel@xxxxxxxxxxxxxxxxxxx
> Cc: linux-samsung-soc@xxxxxxxxxxxxxxx
> Cc: linux-media@xxxxxxxxxxxxxxx
> Cc: Niklas Schnelle <schnelle@xxxxxxxxxxxxx>
> Cc: Gerald Schaefer <gerald.schaefer@xxxxxxxxxxxxx>
> Cc: linux-s390@xxxxxxxxxxxxxxx
> ---
>  arch/s390/pci/pci_mmio.c | 98 +++++++++++++++++++++++-----------------
>  1 file changed, 57 insertions(+), 41 deletions(-)

Looks good, thanks.
Also survived some basic function testing. Only one minor nitpick, see below.

Reviewed-by: Gerald Schaefer <gerald.schaefer@xxxxxxxxxxxxx>

>
> diff --git a/arch/s390/pci/pci_mmio.c b/arch/s390/pci/pci_mmio.c
> index 401cf670a243..4d194cb09372 100644
> --- a/arch/s390/pci/pci_mmio.c
> +++ b/arch/s390/pci/pci_mmio.c
> @@ -119,33 +119,15 @@ static inline int __memcpy_toio_inuser(void __iomem *dst,
>  	return rc;
>  }
>
> -static long get_pfn(unsigned long user_addr, unsigned long access,
> -		    unsigned long *pfn)
> -{
> -	struct vm_area_struct *vma;
> -	long ret;
> -
> -	mmap_read_lock(current->mm);
> -	ret = -EINVAL;
> -	vma = find_vma(current->mm, user_addr);
> -	if (!vma)
> -		goto out;
> -	ret = -EACCES;
> -	if (!(vma->vm_flags & access))
> -		goto out;
> -	ret = follow_pfn(vma, user_addr, pfn);
> -out:
> -	mmap_read_unlock(current->mm);
> -	return ret;
> -}
> -
>  SYSCALL_DEFINE3(s390_pci_mmio_write, unsigned long, mmio_addr,
>  		const void __user *, user_buffer, size_t, length)
>  {
>  	u8 local_buf[64];
>  	void __iomem *io_addr;
>  	void *buf;
> -	unsigned long pfn;
> +	struct vm_area_struct *vma;
> +	pte_t *ptep;
> +	spinlock_t *ptl;
>  	long ret;
>
>  	if (!zpci_is_enabled())
> @@ -158,7 +140,7 @@ SYSCALL_DEFINE3(s390_pci_mmio_write, unsigned long, mmio_addr,
>  	 * We only support write access to MIO capable devices if we are on
>  	 * a MIO enabled system. Otherwise we would have to check for every
>  	 * address if it is a special ZPCI_ADDR and would have to do
> -	 * a get_pfn() which we don't need for MIO capable devices. Currently
> +	 * a pfn lookup which we don't need for MIO capable devices. Currently
>  	 * ISM devices are the only devices without MIO support and there is no
>  	 * known need for accessing these from userspace.
>  	 */
> @@ -176,21 +158,37 @@ SYSCALL_DEFINE3(s390_pci_mmio_write, unsigned long, mmio_addr,
>  	} else
>  		buf = local_buf;
>
> -	ret = get_pfn(mmio_addr, VM_WRITE, &pfn);
> +	ret = -EFAULT;
> +	if (copy_from_user(buf, user_buffer, length))
> +		goto out_free;
> +
> +	mmap_read_lock(current->mm);
> +	ret = -EINVAL;
> +	vma = find_vma(current->mm, mmio_addr);
> +	if (!vma)
> +		goto out_unlock_mmap;
> +	ret = -EACCES;
> +	if (!(vma->vm_flags & VM_WRITE))
> +		goto out_unlock_mmap;
> +	if (!(vma->vm_flags & (VM_IO | VM_PFNMAP)))
> +		goto out_unlock_mmap;

That check for VM_IO | VM_PFNMAP was previously hidden inside follow_pfn(),
and that would have returned -EINVAL in this case. With your change, we
now return -EACCES. Not sure how important that is, but it feels wrong.
Maybe move the VM_IO | VM_PFNMAP check up, before the ret = -EACCES?

[...]
> @@ -306,22 +306,38 @@ SYSCALL_DEFINE3(s390_pci_mmio_read, unsigned long, mmio_addr,
>  		buf = local_buf;
>  	}
>
> -	ret = get_pfn(mmio_addr, VM_READ, &pfn);
> +	mmap_read_lock(current->mm);
> +	ret = -EINVAL;
> +	vma = find_vma(current->mm, mmio_addr);
> +	if (!vma)
> +		goto out_unlock_mmap;
> +	ret = -EACCES;
> +	if (!(vma->vm_flags & VM_WRITE))
> +		goto out_unlock_mmap;
> +	if (!(vma->vm_flags & (VM_IO | VM_PFNMAP)))
> +		goto out_unlock_mmap;

Same here with VM_IO | VM_PFNMAP and -EINVAL.
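
To illustrate the ordering suggested above, here is a small userspace
model in plain C. It is only a sketch, not kernel code and not a proposed
patch: struct model_vma, the MODEL_VM_* flag values and model_check_vma()
are made-up stand-ins for vm_area_struct, the VM_* flags and the
open-coded checks in the two syscalls. The point is just that a mapping
which is not VM_IO | VM_PFNMAP keeps returning -EINVAL, and -EACCES is
left for a missing access permission.

```c
#include <errno.h>
#include <stddef.h>

/* Stand-in flag bits for this model only, not the kernel's VM_* definitions. */
#define MODEL_VM_READ   0x0001UL
#define MODEL_VM_WRITE  0x0002UL
#define MODEL_VM_PFNMAP 0x0400UL
#define MODEL_VM_IO     0x4000UL

/* Hypothetical, minimal replacement for struct vm_area_struct. */
struct model_vma {
	unsigned long vm_flags;
};

/*
 * Hypothetical helper mirroring the suggested order of checks:
 *   no vma, or not an IO/PFN mapping  -> -EINVAL
 *   requested access bit not set      -> -EACCES
 */
static long model_check_vma(const struct model_vma *vma, unsigned long access)
{
	if (!vma)
		return -EINVAL;
	/* Mapping-type check first, so it keeps the -EINVAL return code. */
	if (!(vma->vm_flags & (MODEL_VM_IO | MODEL_VM_PFNMAP)))
		return -EINVAL;
	/* Only a permission mismatch is reported as -EACCES. */
	if (!(vma->vm_flags & access))
		return -EACCES;
	return 0;
}
```

With this order, a writable mapping that is neither VM_IO nor VM_PFNMAP
gets -EINVAL, as it did when follow_pfn() did the check, while a
read-only IO mapping handed to the write path gets -EACCES.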