On Thu, Aug 31, 2023 at 02:35:32PM +0200, Christoph Hellwig wrote:
> On Wed, Aug 30, 2023 at 01:43:57PM -0300, Jason Gunthorpe wrote:
> > > > conversion function for the drivers.
> > >
> > > Jason said at LSF/MM that he had a prototype for a mapping API that
> > > takes a phys/len array as input and dma_addr/len as output, which really
> > > is the right thing to do, especially for dmabuf.
> >
> > Yes, still a prototype. Given the change in direction some of the
> > assumptions of the list design will need some adjusting.
> >
> > I felt there wasn't much justification to add a list without also
> > supporting the P2P and it was not looking very good to give the DMA
> > API proper p2p support without also connecting it to lists somehow.
> >
> > Anyhow, I had drafted a basic list datastructure and starting
> > implementation that is sort of structured in a way that is similar to
> > xarray (eg with fixed chunks, generic purpose, etc)
> >
> > https://github.com/jgunthorpe/linux/commit/58d7e0578a09d9cd2360be515208bcd74ade5958
>
> This seems fairly complicated, and the entry seems pretty large
> for a bio_vec replacement or a dma_addr_t+len tuple, which both should
> be (sizeof(phys_addr_t) + sizeof(u32) + the size of flags if needed, which
> for 64-bit would fit into the padding from 96 bytes to 128 bytes anyway.

The entry is variable sized, so it depends on what is stuffed in it.
For a lot of common use cases, especially RDMA page lists, it will be
able to use an 8 byte entry. This is pretty much the most space
efficient it could be.

There are RDMA use cases where we end up holding huge numbers of pages
for a long time just so we can eventually unpin them. It is a nice
outcome if that could use 8 bytes/folio.

The primary alternative I see is a fixed 16 bytes/entry with a 64 bit
address and ~60 bit length + ~4 bits of flags. This is closer to bio,
simpler and faster, but makes the RDMA cases 2x bigger.

I don't know yet which of these is the right trade-off.
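
To make the size comparison concrete, here is a minimal C sketch of the
two layouts. It is not the layout from the linked commit: the type
names, helper names, the 4 KiB page size, and the order/flags bit split
in the 8 byte entry are all hypothetical, chosen only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical fixed 16 byte entry: a 64 bit address plus one word
 * holding a 60 bit length (upper bits) and 4 flag bits (lower bits).
 */
struct fixed_entry {
	uint64_t addr;
	uint64_t len_flags;
};

#define FIXED_FLAG_BITS	4
#define FIXED_FLAG_MASK	((UINT64_C(1) << FIXED_FLAG_BITS) - 1)

static inline struct fixed_entry fixed_pack(uint64_t addr, uint64_t len,
					    unsigned int flags)
{
	struct fixed_entry e = {
		.addr = addr,
		.len_flags = (len << FIXED_FLAG_BITS) |
			     ((uint64_t)flags & FIXED_FLAG_MASK),
	};
	return e;
}

static inline uint64_t fixed_len(struct fixed_entry e)
{
	return e.len_flags >> FIXED_FLAG_BITS;
}

static inline unsigned int fixed_flags(struct fixed_entry e)
{
	return (unsigned int)(e.len_flags & FIXED_FLAG_MASK);
}

/*
 * Hypothetical compact 8 byte entry. Assumption: the address is aligned
 * to a 4 KiB page, so its low 12 bits are free. Bits 0..5 hold the
 * folio order (length = 4096 << order), bits 6..9 hold flags.
 */
typedef uint64_t compact_entry_t;

#define COMPACT_PAGE_SHIFT	12
#define COMPACT_LOW_MASK	((UINT64_C(1) << COMPACT_PAGE_SHIFT) - 1)
#define COMPACT_ORDER_MASK	UINT64_C(0x3f)
#define COMPACT_FLAG_SHIFT	6
#define COMPACT_FLAG_MASK	UINT64_C(0xf)

static inline compact_entry_t compact_pack(uint64_t addr, unsigned int order,
					   unsigned int flags)
{
	return (addr & ~COMPACT_LOW_MASK) |
	       (((uint64_t)flags & COMPACT_FLAG_MASK) << COMPACT_FLAG_SHIFT) |
	       ((uint64_t)order & COMPACT_ORDER_MASK);
}

static inline uint64_t compact_addr(compact_entry_t e)
{
	return e & ~COMPACT_LOW_MASK;
}

static inline uint64_t compact_len(compact_entry_t e)
{
	return (UINT64_C(1) << COMPACT_PAGE_SHIFT) << (e & COMPACT_ORDER_MASK);
}

static inline unsigned int compact_flags(compact_entry_t e)
{
	return (unsigned int)((e >> COMPACT_FLAG_SHIFT) & COMPACT_FLAG_MASK);
}

/* The 2x difference discussed above, checked at compile time. */
_Static_assert(sizeof(struct fixed_entry) == 16, "fixed entry is 16 bytes");
_Static_assert(sizeof(compact_entry_t) == 8, "compact entry is 8 bytes");
```

The compact form only works when the length can be derived from an
order (as with an aligned, power-of-two sized folio); arbitrary
byte lengths need the wider entry, which is where the variable sized
versus fixed sized trade-off comes from.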

I wanted to experiment with what this would look like for a bit.

With your direction I felt we could safely keep bio as it is and
cheaply make a fast DMA mapper for it. Provide something like this as
the 'kitchen sink' version for dmabuf/rdma/etc that are a little
different.

Jason