Re: dma_buf support with io_uring

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 6/23/22 07:17, Fang, Wilson wrote:
Hi Jens,

We are exploring a kernel native mechanism to support peer to peer data transfer between a NVMe SSD and another device supporting dma_buf, connected on the same PCIe root complex.
NVMe SSD DMA engine requires physical memory address and there is no easy way to pass non system memory address through VFS to the block device driver.
One of the ideas is to use the io_uring and dma_buf mechanism which is supported by the peer device of the SSD.

Interesting, that's quite aligns with what we're doing, that is a
more generic way for p2p with some non-p2p optimisations on the way.
Our approach we tried before is to let userspace to register dma-buf
fd inside io_uring as a register buffer, prepare everything in advance
like dmabuf attach, and then rw/send/etc. can use that.

The flow is as below:
1. Application passes the dma_buf fd to the kernel through liburing.
2. Io_uring adds two new options IORING_OP_READ_DMA and IORING_OP_WRITE_DMA to support read write operations that DMA to/from the peer device memory.
3. If the dma_buf fd is valid, io_uring attaches dma_buf and get sgl which contains physical memory addresses to be passed down to the block device driver.
4. NVMe SSD DMA engine DMA the data to/from the physical memory address.

The road blocker we are facing is that dma_buf_attach() and dma_buf_map_attachment() APIs expects the caller to provide the struct device *dev as input parameter pointing to the device which does the DMA (in this case the block/NVMe device that holds the source data).
But since io_uring operates at the VFS layer there is no straight forward way of finding the block/NVMe device object (struct device*) from the source file descriptor.

Do you have any recommendations? Much appreciated!

For finding a device pointer, we added an optional file operation
callback. I think that's much better than parsing it on the io_uring
side, especially since we need a guarantee that the device is the
only one which will be targeted and won't change (e.g. network may
choose a device dynamically based on target address).

I think we have space to cooperate here :)

--
Pavel Begunkov



[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux