Am 10.08.23 um 20:44 schrieb Mina Almasry:
On Thu, Aug 10, 2023 at 3:29 AM Christian König
<christian.koenig@xxxxxxx> wrote:
Am 10.08.23 um 03:57 schrieb Mina Almasry:
Changes in RFC v2:
------------------
The sticking point in RFC v1[1] was the dma-buf pages approach we used to
deliver the device memory to the TCP stack. RFC v2 is a proof-of-concept
that attempts to resolve this by implementing scatterlist support in the
networking stack, such that we can import the dma-buf scatterlist
directly.
Impressive work, I didn't thought that this would be possible that "easily".
Please note that we have considered replacing scatterlists with simple
arrays of DMA-addresses in the DMA-buf framework to avoid people trying
to access the struct page inside the scatterlist.
FWIW, I'm not doing anything with the struct pages inside the
scatterlist. All I need from the scatterlist are the
sg_dma_address(sg) and the sg_dma_len(sg), and I'm guessing the array
you're describing will provide exactly those, but let me know if I
misunderstood.
Your understanding is perfectly correct.
It might be a good idea to push for that first before this here is
finally implemented.
GPU drivers already convert the scatterlist used to arrays of
DMA-addresses as soon as they get them. This leaves RDMA and V4L as the
other two main users which would need to be converted.
This is the approach proposed at a high level here[2].
Detailed changes:
1. Replaced dma-buf pages approach with importing scatterlist into the
page pool.
2. Replace the dma-buf pages centric API with a netlink API.
3. Removed the TX path implementation - there is no issue with
implementing the TX path with scatterlist approach, but leaving
out the TX path makes it easier to review.
4. Functionality is tested with this proposal, but I have not conducted
perf testing yet. I'm not sure there are regressions, but I removed
perf claims from the cover letter until they can be re-confirmed.
5. Added Signed-off-by: contributors to the implementation.
6. Fixed some bugs with the RX path since RFC v1.
Any feedback welcome, but specifically the biggest pending questions
needing feedback IMO are:
1. Feedback on the scatterlist-based approach in general.
As far as I can see this sounds like the right thing to do in general.
Question is rather if we should stick with scatterlist, use array of
DMA-addresses or maybe even come up with a completely new structure.
As far as I can tell, it should be trivial to switch this device
memory TCP implementation to anything that provides:
1. DMA-addresses (sg_dma_address() equivalent)
2. lengths (sg_dma_len() equivalent)
if you go that route. Specifically, I think it will be pretty much a
localized change to netdev_bind_dmabuf_to_queue() implemented in this
patch:
https://lore.kernel.org/netdev/ZNULIDzuVVyfyMq2@xxxxxxxx/T/#m2d344b08f54562cc9155c3f5b018cbfaed96036f
Thanks, that's exactly what I wanted to hear.
2. Netlink API (Patch 1 & 2).
How does netlink manage the lifetime of objects?
Netlink itself doesn't handle the lifetime of the binding. However,
the API I implemented unbinds the dma-buf when the netlink socket is
destroyed. I do this so that even if the user process crashes or
forgets to unbind, the dma-buf will still be unbound once the netlink
socket is closed on the process exit. Details in this patch:
https://lore.kernel.org/netdev/ZNULIDzuVVyfyMq2@xxxxxxxx/T/#m2d344b08f54562cc9155c3f5b018cbfaed96036f
I need to double check the details, but at least of hand that sounds
sufficient for the lifetime requirements of DMA-buf.
Thanks,
Christian.
On Thu, Aug 10, 2023 at 9:07 AM Jason Gunthorpe <jgg@xxxxxxxx> wrote:
On Thu, Aug 10, 2023 at 12:29:08PM +0200, Christian König wrote:
Am 10.08.23 um 03:57 schrieb Mina Almasry:
Changes in RFC v2:
------------------
The sticking point in RFC v1[1] was the dma-buf pages approach we used to
deliver the device memory to the TCP stack. RFC v2 is a proof-of-concept
that attempts to resolve this by implementing scatterlist support in the
networking stack, such that we can import the dma-buf scatterlist
directly.
Impressive work, I didn't thought that this would be possible that "easily".
Please note that we have considered replacing scatterlists with simple
arrays of DMA-addresses in the DMA-buf framework to avoid people trying to
access the struct page inside the scatterlist.
It might be a good idea to push for that first before this here is finally
implemented.
GPU drivers already convert the scatterlist used to arrays of DMA-addresses
as soon as they get them. This leaves RDMA and V4L as the other two main
users which would need to be converted.
Oh that would be a nightmare for RDMA.
We need a standard based way to have scalable lists of DMA addresses :(
2. Netlink API (Patch 1 & 2).
How does netlink manage the lifetime of objects?
And access control..
Someone will correct me if I'm wrong but I'm not sure netlink itself
will do (sufficient) access control. However I meant for the netlink
API to bind dma-bufs to be a CAP_NET_ADMIN API, and I forgot to add
this check in this proof-of-concept, sorry. I'll add a CAP_NET_ADMIN
check in netdev_bind_dmabuf_to_queue() in the next iteration.