On 18.08.22 14:03, Jason Gunthorpe wrote:
On Thu, Aug 18, 2022 at 01:07:16PM +0200, Christian König wrote:
On 17.08.22 18:11, Jason Gunthorpe wrote:
dma-buf has become a way to safely acquire a handle to non-struct page
memory that can still have lifetime controlled by the exporter. Notably
RDMA can now import dma-buf FDs and build them into MRs which allows for
PCI P2P operations. Extend this to allow vfio-pci to export MMIO memory
from PCI device BARs.
This series supports a use case for SPDK where an NVMe device is owned
by SPDK through VFIO while interacting with an RDMA device. The RDMA device
may directly access the NVMe CMB or directly manipulate the NVMe device's
doorbell using PCI P2P.
However, as a general mechanism, it can support many other scenarios with
VFIO. I imagine this dmabuf approach to be usable by iommufd as well for
generic and safe P2P mappings.
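On the userspace side the flow is roughly: obtain a dma-buf FD for a BAR
slice from vfio-pci, then hand that FD to the RDMA stack. A minimal sketch
of the RDMA half, assuming the dma-buf FD has already been obtained through
the feature ioctl this series adds (mr_from_vfio_dmabuf() is a hypothetical
helper name; ibv_reg_dmabuf_mr() is the existing rdma-core API):

#include <stddef.h>
#include <stdint.h>
#include <infiniband/verbs.h>

/* Register a vfio-pci exported dma-buf (e.g. an NVMe CMB slice) as an RDMA
 * MR so the RDMA device can reach it with PCI P2P. */
static struct ibv_mr *mr_from_vfio_dmabuf(struct ibv_pd *pd, int dmabuf_fd,
                                          size_t length, uint64_t iova)
{
        return ibv_reg_dmabuf_mr(pd, 0 /* offset */, length, iova, dmabuf_fd,
                                 IBV_ACCESS_LOCAL_WRITE |
                                 IBV_ACCESS_REMOTE_READ |
                                 IBV_ACCESS_REMOTE_WRITE);
}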
In general this looks good to me, but we really need to get away from using
sg_tables for this.
The only thing I'm not 100% convinced of is dma_buf_try_get(); I've seen
it used incorrectly so many times that I've lost count.
Could that somehow be avoided? Or could you at least explain the use case
a bit better?
I didn't see a way; maybe you know of one?
For GEM objects we usually don't use the reference count of the DMA-buf,
but rather that of the GEM object for this. But that's not an ideal
solution either.
VFIO needs to maintain, attached to each vfio_device, a list of the dmabuf
FDs that the user has created:
int vfio_pci_core_feature_dma_buf(struct vfio_pci_core_device *vdev, u32 flags,
                                  struct vfio_device_feature_dma_buf __user *arg,
                                  size_t argsz)
{
        [...]
        down_write(&vdev->memory_lock);
        list_add_tail(&priv->dmabufs_elm, &vdev->dmabufs);
        up_write(&vdev->memory_lock);
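The list element added here implies a small per-export private structure,
presumably along these lines (a sketch; only the members visible in the
snippets are taken from the series, the structure name and the remaining
fields are assumptions):

#include <linux/dma-buf.h>
#include <linux/list.h>
#include <linux/vfio_pci_core.h>

/* Sketch of the implied per-dmabuf private data ("priv" in the snippets). */
struct vfio_pci_dma_buf {
        struct dma_buf *dmabuf;            /* the exported dma-buf */
        struct vfio_pci_core_device *vdev; /* owning VFIO device */
        struct list_head dmabufs_elm;      /* entry on vdev->dmabufs */
        bool revoked;                      /* assumed: MMIO access currently revoked */
};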
And dmabuf FDs are removed from the list when the user closes the FD:
static void vfio_pci_dma_buf_release(struct dma_buf *dmabuf)
{
        [...]
        down_write(&priv->vdev->memory_lock);
        list_del_init(&priv->dmabufs_elm);
        up_write(&priv->vdev->memory_lock);
Which then poses the problem: how do you iterate over only those dma_bufs
that are still alive in order to execute the move_notify?
This seems necessary, as parts of the dma_buf have already been
destroyed by the time the driver's release function is called.
Which I solved like this:
        down_write(&vdev->memory_lock);
        list_for_each_entry_safe(priv, tmp, &vdev->dmabufs, dmabufs_elm) {
                if (!dma_buf_try_get(priv->dmabuf))
                        continue;
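Once a still-alive dma-buf has been pinned, the loop presumably continues
roughly like this (a sketch, not the exact code from the series; the
revoked flag and the surrounding control flow are assumptions, while
dma_resv_lock(), dma_buf_move_notify() and dma_buf_put() are the existing
dma-buf APIs):

                /* Only notify importers when the revoke state actually
                 * changes, then drop the reference taken by
                 * dma_buf_try_get(). */
                if (priv->revoked != revoked) {
                        dma_resv_lock(priv->dmabuf->resv, NULL);
                        priv->revoked = revoked;
                        dma_buf_move_notify(priv->dmabuf);
                        dma_resv_unlock(priv->dmabuf->resv);
                }
                dma_buf_put(priv->dmabuf);
        }
        up_write(&vdev->memory_lock);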
What would happen if you didn't skip destroyed dma-bufs here? In other
words, why do you maintain that list in the first place?
Regards,
Christian.
So the scenarios resolve as:
- Concurrent release is not in progress: dma_buf_try_get() succeeds
and prevents a concurrent release from starting.
- Release has started but has not yet reached its memory_lock critical
section: dma_buf_try_get() fails.
- Release has started and has already passed its memory_lock critical
section: the dmabuf is no longer on the list, so dma_buf_try_get() is not called.
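All three cases hinge on dma_buf_try_get() only succeeding while the
dma-buf still has references, i.e. before release has started. A minimal
sketch of such a helper, assuming it is built on the refcount of the
dma-buf's backing file (an assumption about the implementation, not
necessarily the helper from the series):

#include <linux/dma-buf.h>
#include <linux/fs.h>

/* Succeed only if the backing file's refcount has not already dropped to
 * zero, i.e. the release path cannot have been entered yet. */
static inline bool dma_buf_try_get(struct dma_buf *dmabuf)
{
        return atomic_long_inc_not_zero(&dmabuf->file->f_count);
}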
Jason