On Fri, Oct 18, 2019 at 08:01:44PM +0800, zhangfei.gao@xxxxxxxxxxx wrote:
> > More generally, it would be nice to use the DMA API when SVA isn't
> > supported, instead of manually allocating and mapping memory with
> > iommu_map(). Do we only handcraft these functions in order to have VA ==
> > IOVA? On its own it doesn't seem like a strong enough reason to avoid the
> > DMA API.
> Here we use an unmanaged domain to prevent the VA conflicting with an IOVA.
> The target is still to build shared virtual addressing, though SVA is not
> supported.

If SVA isn't supported, having VA == IOVA looks nice but isn't
particularly useful. We could instead require that, if SVA isn't
supported, userspace handles VA and IOVA separately for any DMA region.
Enforcing VA == IOVA adds some unnecessary complexity to this module. In
addition to the special case for software MSIs that is already there
(uacce_iommu_has_sw_msi), it's also not guaranteed that the whole VA
space is representable with IOVAs; you might need to poke holes in the
IOVA space for reserved regions (see iommu.*resv). For example VFIO
checks that the IOVA requested by userspace doesn't fall into a reserved
range (see iova_list in vfio_iommu_type1.c). It also exports to
userspace a list of possible IOVAs through VFIO_IOMMU_GET_INFO. Letting
the DMA API allocate addresses would be simpler, since it already deals
with resv regions and software MSI.

> The IOVA from the DMA API can be the same as a VA, and the device cannot
> distinguish them.
> So here we borrow the VA from user space and iommu_map() it to the device,
> and the VA becomes the IOVA.
> Since this IOVA comes from user space, there is no conflict.
> So the DMA API cannot be used in this case.
>
> drivers/vfio/vfio_iommu_type1.c also uses iommu_domain_alloc().

VFIO needs to let userspace pick its IOVA, because the IOVA space is
generally managed by a guest OS. In my opinion this is baggage that
uacce doesn't need. If we only supported the DMA API and not unmanaged
IOMMU domains, userspace would need to do a little more work by
differentiating between VA and DMA addresses, but that could be
abstracted into the uacce library and it would make the kernel module a
lot simpler.

[...]
> > I wish the SVA and !SVA paths were less interleaved. Both models are
> > fundamentally different:
> >
> > * Without SVA you cannot share the device between multiple processes. All
> >   DMA mappings are in the "main", non-PASID address space of the device.
> >
> >   Note that process isolation without SVA could be achieved with the
> >   auxiliary domains IOMMU API (introduced primarily for vfio-mdev) but
> >   this is not the model chosen here.
> Does PASID have to be supported for this case?

Yes, you do need PASID support for auxiliary domains, but not PRI/Stall.

[...]
> > > +	/* allocate memory */
> > > +	if (flags & UACCE_QFRF_DMA) {
> >
> > At the moment UACCE_QFRF_DMA is never set, so there is a lot of unused and
> > possibly untested code in this file. I think it would be simpler to choose
> > between either DMA API or unmanaged IOMMU domains and stick with it. As
> > said before, I'd prefer DMA API.
> UACCE_QFRF_DMA uses the DMA API; we used it as a quick method, though it
> cannot prevent VA conflicts.
> We use an ioctl to get the IOVA of the DMA buffer.
> Since the interface is not standard, we kept it and verified it
> internally.

As above, it's probably worth exploring this method further for !SVA.
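To make that concrete, here is a rough, untested sketch of what the !SVA
path could look like with the DMA API. uacce_qfr_alloc_dma() and
UACCE_CMD_GET_DMA_ADDR are names I made up for the example; the point is
only that the region keeps a dma_addr_t which userspace fetches
explicitly, instead of assuming VA == IOVA:

	/* Sketch: let the DMA API pick the IOVA. It already avoids
	 * reserved regions and the software MSI window. */
	static int uacce_qfr_alloc_dma(struct uacce_device *uacce,
				       struct uacce_qfile_region *qfr)
	{
		qfr->kaddr = dma_alloc_coherent(uacce->pdev,
						qfr->nr_pages << PAGE_SHIFT,
						&qfr->dma, GFP_KERNEL);
		return qfr->kaddr ? 0 : -ENOMEM;
	}

	/* In the ioctl handler: return the DMA address, which userspace
	 * then passes to the device instead of the VA. */
	case UACCE_CMD_GET_DMA_ADDR:	/* made-up name */
		return put_user((u64)qfr->dma, (u64 __user *)arg);

The uacce library could wrap the mmap + ioctl pair so that applications
never see the difference.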
> > > +		qfr->kaddr = dma_alloc_coherent(uacce->pdev,
> > > +						qfr->nr_pages << PAGE_SHIFT,
> > > +						&qfr->dma, GFP_KERNEL);
> > > +		if (!qfr->kaddr) {
> > > +			ret = -ENOMEM;
> > > +			goto err_with_qfr;
> > > +		}
> > > +	} else {
> > > +		ret = uacce_qfr_alloc_pages(qfr);
> > > +		if (ret)
> > > +			goto err_with_qfr;
> > > +	}
> > > +
> > > +	/* map to device */
> > > +	ret = uacce_queue_map_qfr(q, qfr);
> >
> > Worth moving into the else above.
> The idea here is: a, map to device; b, map to user space.

Yes, but dma_alloc_coherent() creates the IOMMU mapping, and
uacce_queue_map_qfr()'s only task is to create the IOMMU mapping when
the DMA API isn't in use, so you could move this call up, right after
uacce_qfr_alloc_pages().

[...]
> > > +	q->state = UACCE_Q_ZOMBIE;
> >
> > Since the PUT_Q ioctl makes the queue unrecoverable, why should userspace
> > invoke it instead of immediately calling close()?
> We found that close() does not release resources immediately, which may
> cause an issue when re-opening while all queues are in use.

I think the only way to fix that problem is to avoid reallocating the
resources until they are released, because we can't count on userspace
to always call the PUT_Q ioctl. Sometimes the program will crash before
that.

> > > +static int uacce_fops_mmap(struct file *filep, struct vm_area_struct *vma)
> > > +{
> > > +	struct uacce_queue *q = filep->private_data;
> > > +	struct uacce_device *uacce = q->uacce;
> > > +	struct uacce_qfile_region *qfr;
> > > +	enum uacce_qfrt type = 0;
> > > +	unsigned int flags = 0;
> > > +	int ret;
> > > +
> > > +	if (vma->vm_pgoff < UACCE_QFRT_MAX)
> > > +		type = vma->vm_pgoff;
> > > +
> > > +	vma->vm_flags |= VM_DONTCOPY | VM_DONTEXPAND;
> > > +
> > > +	mutex_lock(&uacce_mutex);

By the way, lockdep detects a possible unsafe locking scenario here,
because we're taking the uacce_mutex even though mmap called us with the
mmap_sem held for writing. Conversely uacce_fops_release() takes the
mmap_sem for writing while holding the uacce_mutex. I think it can be
fixed easily if we simply remove the use of mmap_sem in
uacce_fops_release(), since it's only taken to do some accounting which
doesn't look right.

However, a similar but more complex locking issue comes from the current
use of iommu_sva_bind/unbind_device():

  uacce_fops_open:
	iommu_sva_bind_device()
	  iommu_sva_bind_group()	[iommu_group->mutex]
	    mmu_notifier_get()		[mmap_sem]

  uacce_fops_mmap:
	[mmap_sem]
	  [uacce_mutex]

  uacce_fops_release:
	[uacce_mutex]
	  iommu_sva_unbind_device()	[iommu_group->mutex]

This circular dependency can be broken by calling
iommu_sva_unbind_device() outside of uacce_mutex, but I think it's worth
reworking the queue locking scheme a little and using fine-grained
locking for the queue state.

Something else I noticed is that uacce_idr isn't currently protected.
The IDR API expects the caller to use its own locking scheme. You could
replace it with an xarray, which I think is preferred to IDR now and
provides its own xa_lock.
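For example (rough sketch, assuming the ID that the IDR currently
allocates is stored in something like uacce->dev_id):

	static DEFINE_XARRAY_ALLOC(uacce_xa);

	/* in uacce_register(): xa_alloc() takes the xa_lock internally */
	ret = xa_alloc(&uacce_xa, &uacce->dev_id, uacce, xa_limit_32b,
		       GFP_KERNEL);
	if (ret)
		return ret;

	/* in uacce_unregister() */
	xa_erase(&uacce_xa, uacce->dev_id);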
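And coming back to the circular dependency above, the minimal fix could
look something like this in uacce_fops_release() (sketch only, with
made-up helper and field names; fine-grained queue locking would still
be the better rework):

	static int uacce_fops_release(struct inode *inode, struct file *filep)
	{
		struct uacce_queue *q = filep->private_data;

		mutex_lock(&uacce_mutex);
		/* tear down queue state and unmap regions under uacce_mutex */
		uacce_put_queue(q);		/* made-up helper */
		mutex_unlock(&uacce_mutex);

		/*
		 * Only take iommu_group->mutex after uacce_mutex has been
		 * released, breaking the cycle with uacce_fops_open().
		 */
		iommu_sva_unbind_device(q->handle);
		kfree(q);
		return 0;
	}

Thanks,
Jean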