On 2020-03-12 8:19 a.m., Jason Gunthorpe wrote:
> On Thu, Mar 12, 2020 at 03:47:29AM -0700, Christoph Hellwig wrote:
>> On Thu, Mar 12, 2020 at 11:31:35AM +0100, Christian König wrote:
>>> But how should we then deal with all the existing interfaces which already
>>> take a scatterlist/sg_table ?
>>>
>>> The whole DMA-buf design and a lot of drivers are build around
>>> scatterlist/sg_table and to me that actually makes quite a lot of sense.
>>>
>>
>> Replace them with a saner interface that doesn't take a scatterlist.
>> At very least for new functionality like peer to peer DMA, but
>> especially this code would also benefit from a general move away
>> from the scatterlist.
>
> If dma buf can do P2P I'd like to see support for consuming a dmabuf
> in RDMA. Looking at how.. there is an existing sgl based path starting
> from get_user_pages through dma map to the drivers. (ib_umem)
>
> I can replace the driver part with something else (dma_sg), but not
> until we get a way to DMA map pages directly into that something
> else..
>
> The non-page scatterlist is also a big concern for RDMA as we have
> drivers that want the page list, so even if we did as this series
> contemplates I'd have still have to split the drivers and create the
> notion of a dma-only SGL.
>
>>> I mean we could come up with a new structure for this, but to me that just
>>> looks like reinventing the wheel. Especially since drivers need to be able
>>> to handle both I/O to system memory and I/O to PCIe BARs.
>>
>> The structure for holding the struct page side of the scatterlist is
>> called struct bio_vec, so far mostly used by the block and networking
>> code.
>
> I haven't used bio_vecs before, do they support chaining like SGL so
> they can be very big? RDMA dma maps gigabytes of memory

bio_vecs themselves don't support chaining... In the block layer they
are used in a struct bio which handles chaining, splitting and other
features.
Each bio, though, has a limit of 256 segments to avoid higher order
allocations. Depending on your use case, you could reuse bios or write
your own container to chain bio_vecs.

>> The structure for holding dma addresses doesn't really exist
>> in a generic form, but would be an array of these structures:
>>
>> struct dma_sg {
>> 	dma_addr_t addr;
>> 	u32 len;
>> };
> Yes, we easily have ranges of >1GB. So I would certainly say u64 for the len here.

I'd probably avoid the u64 here and leave space for some flags or
something. If you have >1GB to map you can always just have multiple
segments. With 4GB per segment and 256 segments per page, a page of DMA
sgs can easily map 1TB of memory in a single call, and with chaining or
larger allocations you can extend that further, if needed.

Logan
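
For reference, here is a rough user-space sketch of the arithmetic above.
It is not kernel code. It assumes a 64-bit dma_addr_t and 4K pages, so
the struct pads out to 16 bytes on typical 64-bit ABIs. The SKETCH_*
constants and the dma_sg_per_page(), dma_sg_page_max_bytes() and
dma_sg_fill() helpers are hypothetical names made up for illustration;
only struct dma_sg comes from the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* User-space stand-ins for the kernel types (assumption: 64-bit dma_addr_t). */
typedef uint64_t dma_addr_t;
typedef uint32_t u32;

#define SKETCH_PAGE_SIZE 4096u       /* assumed 4K page */
#define SKETCH_SEG_MAX   0xFFFFF000u /* largest page-aligned length that fits in a u32 */

struct dma_sg {
	dma_addr_t addr;
	u32 len;
	/* typically 4 bytes of padding here: the spare room for flags */
};

/* Number of dma_sg entries that fit in one page. */
static inline size_t dma_sg_per_page(void)
{
	return SKETCH_PAGE_SIZE / sizeof(struct dma_sg);
}

/* Upper bound on the bytes one page of entries can describe. */
static inline uint64_t dma_sg_page_max_bytes(void)
{
	return (uint64_t)dma_sg_per_page() * UINT32_MAX;
}

/*
 * Describe [addr, addr + len) with as many segments as needed, so a
 * range larger than one u32 length simply becomes multiple entries.
 * Returns the number of entries used, or 0 if len is 0 or the range
 * does not fit in max_segs entries.
 */
static inline size_t dma_sg_fill(struct dma_sg *sgs, size_t max_segs,
				 dma_addr_t addr, uint64_t len)
{
	size_t n = 0;

	assert(sgs != NULL);
	while (len > 0) {
		u32 chunk = len > SKETCH_SEG_MAX ? SKETCH_SEG_MAX : (u32)len;

		if (n == max_segs)
			return 0;
		sgs[n].addr = addr;
		sgs[n].len = chunk;
		addr += chunk;
		len -= chunk;
		n++;
	}
	return n;
}
```

Under those assumptions dma_sg_per_page() comes out to 256 and
dma_sg_page_max_bytes() to just under 1TB, and a 6GB range passed to
dma_sg_fill() takes two entries. A 32-bit dma_addr_t or a different
page size would change the numbers.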