Hi Alex:

> -----Original Message-----
> From: Alex Williamson [mailto:alex.williamson@xxxxxxxxxx]
> Sent: March 10, 2021 14:09
> To: Jason Gunthorpe <jgg@xxxxxxxxxx>
> Cc: Peter Xu <peterx@xxxxxxxxxx>; Zengtao (B) <prime.zeng@xxxxxxxxxxxxx>;
> Cornelia Huck <cohuck@xxxxxxxxxx>; Kevin Tian <kevin.tian@xxxxxxxxx>;
> Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>; Giovanni Cabiddu
> <giovanni.cabiddu@xxxxxxxxx>; Michel Lespinasse <walken@xxxxxxxxxx>; Jann
> Horn <jannh@xxxxxxxxxx>; Max Gurtovoy <mgurtovoy@xxxxxxxxxx>;
> kvm@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; Linuxarm
> <linuxarm@xxxxxxxxxx>
> Subject: Re: [PATCH] vfio/pci: make the vfio_pci_mmap_fault reentrant
>
> On Tue, 9 Mar 2021 19:41:27 -0400
> Jason Gunthorpe <jgg@xxxxxxxxxx> wrote:
>
> > On Tue, Mar 09, 2021 at 12:26:07PM -0700, Alex Williamson wrote:
> >
> > > In the new series, I think the fault handler becomes (untested):
> > >
> > > static vm_fault_t vfio_pci_mmap_fault(struct vm_fault *vmf)
> > > {
> > > 	struct vm_area_struct *vma = vmf->vma;
> > > 	struct vfio_pci_device *vdev = vma->vm_private_data;
> > > 	unsigned long base_pfn, pgoff;
> > > 	vm_fault_t ret = VM_FAULT_SIGBUS;
> > >
> > > 	if (vfio_pci_bar_vma_to_pfn(vma, &base_pfn))
> > > 		return ret;
> > >
> > > 	pgoff = (vmf->address - vma->vm_start) >> PAGE_SHIFT;
> >
> > I don't think this math is completely safe, it needs to parse the
> > vm_pgoff..
> >
> > I'm worried userspace could split/punch/mangle a VMA using
> > munmap/mremap/etc/etc in a way that does update the pg_off but is
> > incompatible with the above.
>
> Parsing vm_pgoff is done in:
>
> static int vfio_pci_bar_vma_to_pfn(struct vm_area_struct *vma,
> 				   unsigned long *pfn)
> {
> 	struct vfio_pci_device *vdev = vma->vm_private_data;
> 	struct pci_dev *pdev = vdev->pdev;
> 	int index;
> 	u64 pgoff;
>
> 	index = vma->vm_pgoff >> (VFIO_PCI_OFFSET_SHIFT - PAGE_SHIFT);
>
> 	if (index >= VFIO_PCI_ROM_REGION_INDEX ||
> 	    !vdev->bar_mmap_supported[index] || !vdev->barmap[index])
> 		return -EINVAL;
>
> 	pgoff = vma->vm_pgoff &
> 		((1U << (VFIO_PCI_OFFSET_SHIFT - PAGE_SHIFT)) - 1);
>
> 	*pfn = (pci_resource_start(pdev, index) >> PAGE_SHIFT) + pgoff;
>
> 	return 0;
> }
>
> But given Peter's concern about faulting individual pages, I think the
> fault handler becomes:
>
> static vm_fault_t vfio_pci_mmap_fault(struct vm_fault *vmf)
> {
> 	struct vm_area_struct *vma = vmf->vma;
> 	struct vfio_pci_device *vdev = vma->vm_private_data;
> 	unsigned long vaddr, pfn;
> 	vm_fault_t ret = VM_FAULT_SIGBUS;
>
> 	if (vfio_pci_bar_vma_to_pfn(vma, &pfn))
> 		return ret;
>
> 	down_read(&vdev->memory_lock);
>
> 	if (__vfio_pci_memory_enabled(vdev)) {
> 		for (vaddr = vma->vm_start;
> 		     vaddr < vma->vm_end; vaddr += PAGE_SIZE, pfn++) {

One concern here is performance: since you are doing the mapping for the
whole vma, what about using block mapping where applicable? (A rough,
untested sketch of what I mean is at the end of this mail.)

> 			ret = vmf_insert_pfn_prot(vma, vaddr, pfn,
> 					pgprot_decrypted(vma->vm_page_prot));
> 			if (ret != VM_FAULT_NOPAGE) {
> 				zap_vma_ptes(vma, vma->vm_start,
> 					     vaddr - vma->vm_start);
> 				break;
> 			}
> 		}
> 	}
>
> 	up_read(&vdev->memory_lock);
>
> 	return ret;
> }
>
> Thanks,
> Alex
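
To make the block mapping question concrete, below is a rough, untested
sketch of the kind of thing I had in mind: inserting a PMD-sized mapping
via a ->huge_fault handler when the faulting address and the BAR pfn
happen to be suitably aligned, and falling back to the per-page path
otherwise. The function name vfio_pci_mmap_huge_fault and the exact checks
are only illustrative assumptions layered on the code quoted above, and
whether the core mm would even invoke ->huge_fault for a VM_IO | VM_PFNMAP
vma like this one is a separate question:

static vm_fault_t vfio_pci_mmap_huge_fault(struct vm_fault *vmf,
					   enum page_entry_size pe_size)
{
	struct vm_area_struct *vma = vmf->vma;
	struct vfio_pci_device *vdev = vma->vm_private_data;
	unsigned long addr = vmf->address & PMD_MASK;
	unsigned long pfn;
	vm_fault_t ret = VM_FAULT_FALLBACK;

	if (pe_size != PE_SIZE_PMD)
		return VM_FAULT_FALLBACK;

	/* The PMD-sized block must lie entirely inside the vma ... */
	if (addr < vma->vm_start || addr + PMD_SIZE > vma->vm_end)
		return VM_FAULT_FALLBACK;

	if (vfio_pci_bar_vma_to_pfn(vma, &pfn))
		return VM_FAULT_SIGBUS;

	pfn += (addr - vma->vm_start) >> PAGE_SHIFT;

	/* ... and the BAR pfn must itself be PMD aligned. */
	if (pfn & ((PMD_SIZE >> PAGE_SHIFT) - 1))
		return VM_FAULT_FALLBACK;

	down_read(&vdev->memory_lock);
	if (__vfio_pci_memory_enabled(vdev))
		ret = vmf_insert_pfn_pmd(vmf, __pfn_to_pfn_t(pfn, PFN_DEV),
					 vmf->flags & FAULT_FLAG_WRITE);
	up_read(&vdev->memory_lock);

	return ret;
}

Returning VM_FAULT_FALLBACK on any misalignment would keep this purely an
optimization on top of the per-page handler. Not sure it is worth the
extra complexity, just an idea.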