On Thu, Aug 18, 2022 at 01:07:16PM +0200, Christian König wrote: > Am 17.08.22 um 18:11 schrieb Jason Gunthorpe: > > dma-buf has become a way to safely acquire a handle to non-struct page > > memory that can still have lifetime controlled by the exporter. Notably > > RDMA can now import dma-buf FDs and build them into MRs which allows for > > PCI P2P operations. Extend this to allow vfio-pci to export MMIO memory > > from PCI device BARs. > > > > This series supports a use case for SPDK where a NVMe device will be owned > > by SPDK through VFIO but interacting with a RDMA device. The RDMA device > > may directly access the NVMe CMB or directly manipulate the NVMe device's > > doorbell using PCI P2P. > > > > However, as a general mechanism, it can support many other scenarios with > > VFIO. I imagine this dmabuf approach to be usable by iommufd as well for > > generic and safe P2P mappings. > > In general looks good to me, but we really need to get away from using > sg_tables for this here. > > The only thing I'm not 100% convinced of is dma_buf_try_get(), I've seen > this incorrectly used so many times that I can't count them any more. > > Would that be somehow avoidable? Or could you at least explain the use case > a bit better. I didn't see a way, maybe you know of one VFIO needs to maintain a list of dmabuf FDs that have been created by the user attached to each vfio_device: int vfio_pci_core_feature_dma_buf(struct vfio_pci_core_device *vdev, u32 flags, struct vfio_device_feature_dma_buf __user *arg, size_t argsz) { down_write(&vdev->memory_lock); list_add_tail(&priv->dmabufs_elm, &vdev->dmabufs); up_write(&vdev->memory_lock); And dmabuf FD's are removed from the list when the user closes the FD: static void vfio_pci_dma_buf_release(struct dma_buf *dmabuf) { down_write(&priv->vdev->memory_lock); list_del_init(&priv->dmabufs_elm); up_write(&priv->vdev->memory_lock); Which then poses the problem: How do you iterate over only dma_buf's that are still alive to execute move? This seems necessary as parts of the dma_buf have already been destroyed by the time the user's release function is called. Which I solved like this: down_write(&vdev->memory_lock); list_for_each_entry_safe(priv, tmp, &vdev->dmabufs, dmabufs_elm) { if (!dma_buf_try_get(priv->dmabuf)) continue; So the scenarios resolve as: - Concurrent release is not in progress: dma_buf_try_get() succeeds and prevents concurrent release from starting - Release has started but not reached its memory_lock: dma_buf_try_get() fails - Release has started but passed its memory_lock: dmabuf is not on the list so dma_buf_try_get() is not called. Jason