RE: dma_buf support with io_uring


 



Thanks, Pavel, for the recommendation!
We are very interested in collaborating on this. We are working on a prototype of your recommendation, but progress is a little slow due to vacations and limited resources.

Thanks,
Wilson

-----Original Message-----
From: Pavel Begunkov <asml.silence@xxxxxxxxx> 
Sent: Thursday, June 23, 2022 3:35 AM
To: Fang, Wilson <wilson.fang@xxxxxxxxx>; io-uring@xxxxxxxxxxxxxxx
Cc: Jens Axboe <axboe@xxxxxxxxx>
Subject: Re: dma_buf support with io_uring

On 6/23/22 07:17, Fang, Wilson wrote:
> Hi Jens,
> 
> We are exploring a kernel-native mechanism to support peer-to-peer data transfer between an NVMe SSD and another device supporting dma_buf, both connected to the same PCIe root complex.
> The NVMe SSD's DMA engine requires physical memory addresses, and there is no easy way to pass non-system-memory addresses through the VFS to the block device driver.
> One idea is to use io_uring together with the dma_buf mechanism, which the SSD's peer device supports.

Interesting, that aligns quite well with what we're doing, which is a more generic approach to p2p with some non-p2p optimisations along the way.
The approach we tried before is to let userspace register a dma-buf fd with io_uring as a registered buffer, prepare everything in advance (such as the dma-buf attach), and then let rw/send/etc. use it.

> The flow is as below:
> 1. The application passes the dma_buf fd to the kernel through liburing.
> 2. io_uring adds two new opcodes, IORING_OP_READ_DMA and IORING_OP_WRITE_DMA, to support read and write operations that DMA to/from the peer device memory.
> 3. If the dma_buf fd is valid, io_uring attaches the dma_buf and gets an sgl (scatter-gather list) containing the physical memory addresses to be passed down to the block device driver.
> 4. The NVMe SSD's DMA engine transfers the data to/from those physical memory addresses.
> 
> The roadblock we are facing is that dma_buf_attach() (and therefore dma_buf_map_attachment(), which operates on the resulting attachment) expects the caller to provide a struct device *dev parameter pointing to the device that performs the DMA (in this case the block/NVMe device that holds the source data).
> But since io_uring operates at the VFS layer, there is no straightforward way to find the block/NVMe device object (struct device *) from the source file descriptor.
> 
> Do you have any recommendations? Much appreciated!

For finding a device pointer, we added an optional file operation callback. I think that's much better than parsing it on the io_uring side, especially since we need a guarantee that the device is the only one which will be targeted and won't change (e.g. network may choose a device dynamically based on target address).

I think we have space to cooperate here :)

--
Pavel Begunkov



