On Thu, Aug 10, 2023 at 3:29 AM Christian König <christian.koenig@xxxxxxx> wrote: > > Am 10.08.23 um 03:57 schrieb Mina Almasry: > > Changes in RFC v2: > > ------------------ > > > > The sticking point in RFC v1[1] was the dma-buf pages approach we used to > > deliver the device memory to the TCP stack. RFC v2 is a proof-of-concept > > that attempts to resolve this by implementing scatterlist support in the > > networking stack, such that we can import the dma-buf scatterlist > > directly. > > Impressive work, I didn't thought that this would be possible that "easily". > > Please note that we have considered replacing scatterlists with simple > arrays of DMA-addresses in the DMA-buf framework to avoid people trying > to access the struct page inside the scatterlist. > FWIW, I'm not doing anything with the struct pages inside the scatterlist. All I need from the scatterlist are the sg_dma_address(sg) and the sg_dma_len(sg), and I'm guessing the array you're describing will provide exactly those, but let me know if I misunderstood. > It might be a good idea to push for that first before this here is > finally implemented. > > GPU drivers already convert the scatterlist used to arrays of > DMA-addresses as soon as they get them. This leaves RDMA and V4L as the > other two main users which would need to be converted. > > > This is the approach proposed at a high level here[2]. > > > > Detailed changes: > > 1. Replaced dma-buf pages approach with importing scatterlist into the > > page pool. > > 2. Replace the dma-buf pages centric API with a netlink API. > > 3. Removed the TX path implementation - there is no issue with > > implementing the TX path with scatterlist approach, but leaving > > out the TX path makes it easier to review. > > 4. Functionality is tested with this proposal, but I have not conducted > > perf testing yet. I'm not sure there are regressions, but I removed > > perf claims from the cover letter until they can be re-confirmed. > > 5. Added Signed-off-by: contributors to the implementation. > > 6. Fixed some bugs with the RX path since RFC v1. > > > > Any feedback welcome, but specifically the biggest pending questions > > needing feedback IMO are: > > > > 1. Feedback on the scatterlist-based approach in general. > > As far as I can see this sounds like the right thing to do in general. > > Question is rather if we should stick with scatterlist, use array of > DMA-addresses or maybe even come up with a completely new structure. > As far as I can tell, it should be trivial to switch this device memory TCP implementation to anything that provides: 1. DMA-addresses (sg_dma_address() equivalent) 2. lengths (sg_dma_len() equivalent) if you go that route. Specifically, I think it will be pretty much a localized change to netdev_bind_dmabuf_to_queue() implemented in this patch: https://lore.kernel.org/netdev/ZNULIDzuVVyfyMq2@xxxxxxxx/T/#m2d344b08f54562cc9155c3f5b018cbfaed96036f > > 2. Netlink API (Patch 1 & 2). > > How does netlink manage the lifetime of objects? > Netlink itself doesn't handle the lifetime of the binding. However, the API I implemented unbinds the dma-buf when the netlink socket is destroyed. I do this so that even if the user process crashes or forgets to unbind, the dma-buf will still be unbound once the netlink socket is closed on the process exit. Details in this patch: https://lore.kernel.org/netdev/ZNULIDzuVVyfyMq2@xxxxxxxx/T/#m2d344b08f54562cc9155c3f5b018cbfaed96036f On Thu, Aug 10, 2023 at 9:07 AM Jason Gunthorpe <jgg@xxxxxxxx> wrote: > > On Thu, Aug 10, 2023 at 12:29:08PM +0200, Christian König wrote: > > Am 10.08.23 um 03:57 schrieb Mina Almasry: > > > Changes in RFC v2: > > > ------------------ > > > > > > The sticking point in RFC v1[1] was the dma-buf pages approach we used to > > > deliver the device memory to the TCP stack. RFC v2 is a proof-of-concept > > > that attempts to resolve this by implementing scatterlist support in the > > > networking stack, such that we can import the dma-buf scatterlist > > > directly. > > > > Impressive work, I didn't thought that this would be possible that "easily". > > > > Please note that we have considered replacing scatterlists with simple > > arrays of DMA-addresses in the DMA-buf framework to avoid people trying to > > access the struct page inside the scatterlist. > > > > It might be a good idea to push for that first before this here is finally > > implemented. > > > > GPU drivers already convert the scatterlist used to arrays of DMA-addresses > > as soon as they get them. This leaves RDMA and V4L as the other two main > > users which would need to be converted. > > Oh that would be a nightmare for RDMA. > > We need a standard based way to have scalable lists of DMA addresses :( > > > > 2. Netlink API (Patch 1 & 2). > > > > How does netlink manage the lifetime of objects? > > And access control.. > Someone will correct me if I'm wrong but I'm not sure netlink itself will do (sufficient) access control. However I meant for the netlink API to bind dma-bufs to be a CAP_NET_ADMIN API, and I forgot to add this check in this proof-of-concept, sorry. I'll add a CAP_NET_ADMIN check in netdev_bind_dmabuf_to_queue() in the next iteration.