On Thu, Mar 28, 2024 at 12:31 AM Christoph Hellwig <hch@xxxxxxxxxxxxx> wrote:
>
> On Tue, Mar 26, 2024 at 01:19:20PM -0700, Mina Almasry wrote:
> >
> > Are you envisioning that dmabuf support would be added to the block
> > layer
>
> Yes.
>
> > (which I understand is part of the VFS and not driver specific),
>
> The block layer isn't really the VFS, it's just another core stack
> like the network stack.
>
> > or as part of the specific storage driver (like nvme for example)? If
> > we can add dmabuf support to the block layer itself that sounds
> > awesome. We may then be able to do devmem TCP on all/most storage
> > devices without having to modify each individual driver.
>
> I suspect we'll still need to touch the drivers to understand it,
> but hopefully all the main infrastructure can live in the block layer.
>
> > In your estimation, is adding dmabuf support to the block layer
> > something technically feasible & acceptable upstream? I notice you
> > suggested it so I'm guessing yes to both, but I thought I'd confirm.
>
> I think so, and I know there has been quite some interest to at least
> pre-register userspace memory so that the iommu overhead can be
> pre-loaded. It also is a much better interface for Peer to Peer
> transfers than what we currently have.
>

This is positively thrilling news for me. I was worried that adding
devmem TCP support to storage devices would involve using a non-dmabuf
standard of buffer sharing like pci_p2pdma_ (drivers/pci/p2pdma.c), and
that this would require messy changes to pci_p2pdma_ that would get
nacked. It would also require adding pci_p2pdma_ support to devmem TCP,
which is a can of worms.

If adding dma-buf support to storage devices is feasible and desirable,
that's a much better approach IMO: (a) it may work with devmem TCP
without any changes needed on the netdev side of things, and (b) dma-buf
support may be generically useful and a good contribution even outside
of devmem TCP.

I don't have a concrete user for devmem TCP with storage devices, but
the use case is very similar to the GPU one, and I imagine the perf
benefits can be significant in some setups.

Christoph, if you have any hints or a rough design in mind for how
dma-buf support can be added to the block layer, please do let us know
and we'll follow your hints to investigate. But I don't want to use up
too much of your time. Marc and I can definitely read enough code to
figure out how to do it ourselves :-)

Marc, please review and consider this thread and work; this could be a
good project for you and me. I imagine the work would be:

1. Investigate how to add dma-buf support to the block layer (maybe
   write prototype code, and maybe even test it with devmem TCP).

2. Share a code or no-code proposal with the netdev/fs/block layer
   mailing lists and try to work through concerns/nacks.

3. Finally, share an RFC and carry it through to merging.

--
Thanks,
Mina