On Fri, Jul 14, 2023 at 8:55 AM Jason Gunthorpe <jgg@xxxxxxxx> wrote:
>
> On Fri, Jul 14, 2023 at 07:55:15AM -0700, Mina Almasry wrote:
>
> > Once the skb frags with struct new_abstraction are in the TCP stack,
> > they will need some special handling in code accessing the frags. But
> > my RFC already addressed that somewhat because the frags were
> > inaccessible in that case. In this case the frags will be both
> > inaccessible and will not be struct pages at all (things like
> > get_page() will not work), so more special handling will be required,
> > maybe.
>
> It seems sort of reasonable, though there will be interesting concerns
> about coherence and synchronization with general purpose DMABUFs that
> will need tackling.
>
> Still it is such a lot of churn and weirdness on the netdev side, I
> think you'd do well to present an actual full application as
> justification.
>
> Yes, you showed you can stick unordered TCP data frags into GPU memory
> sort of quickly, but have you gone further with this to actually show
> it is useful for a real world GPU centric application?
>
> BTW your cover letter said 96% utilization, the usual server
> configuration is one NIC per GPU, so you were able to hit 1500Gb/sec of
> TCP BW with this?
>

I notice that the number of NICs is missing from our public
documentation so far, so I will refrain from specifying how many NICs
are on those A3 VMs until the information is public. But I think I can
confirm that your general thinking is correct: the perf we're getting
is 96.6% of line rate for each NIC/GPU pair, and it scales linearly for
each additional NIC/GPU pair we've tested with so far.

Line rate of each NIC/GPU pair is 200 Gb/sec. So if we have 8 NIC/GPU
pairs we'd be hitting 96.6% * 200 * 8 = ~1545 Gb/sec. If we have, say,
2 NIC/GPU pairs, we'd be hitting 96.6% * 200 * 2 = ~386 Gb/sec, etc.

-- 
Thanks,
Mina
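
[Editor's note: for readers who want to sanity-check the aggregate
bandwidth arithmetic in the mail above, here is a minimal, purely
illustrative C sketch. It is not part of any patch set; the 96.6%
utilization and 200 Gb/sec per-pair line rate are the figures quoted in
the mail, and the loop bound of 8 pairs is only an assumption for
illustration.]

/*
 * Illustrative sketch only: reproduce the aggregate TCP bandwidth
 * arithmetic from the mail (utilization * per-pair line rate * pairs).
 */
#include <stdio.h>

int main(void)
{
	const double utilization = 0.966;     /* fraction of line rate achieved */
	const double line_rate_gbps = 200.0;  /* per NIC/GPU pair, Gb/sec */

	for (int pairs = 1; pairs <= 8; pairs++) {
		double aggregate_gbps = utilization * line_rate_gbps * pairs;

		printf("%d NIC/GPU pair(s): %.1f Gb/sec aggregate\n",
		       pairs, aggregate_gbps);
	}
	return 0;
}

Running it prints ~386.4 Gb/sec for 2 pairs and ~1545.6 Gb/sec for 8
pairs, matching the figures discussed above.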