On 10/7/24 4:15 PM, David Wei wrote:
> ===========
> Performance
> ===========
>
> Test setup:
> * AMD EPYC 9454
> * Broadcom BCM957508 200G
> * Kernel v6.11 base [2]
> * liburing fork [3]
> * kperf fork [4]
> * 4K MTU
> * Single TCP flow
>
> With application thread + net rx softirq pinned to _different_ cores:
>
> epoll
> 82.2 Gbps
>
> io_uring
> 116.2 Gbps (+41%)
>
> Pinned to _same_ core:
>
> epoll
> 62.6 Gbps
>
> io_uring
> 80.9 Gbps (+29%)

I'll review the io_uring bits in detail, but I did take a quick look and
overall it looks really nice.

I decided to give this a spin, as I noticed that Broadcom now has a
230.x firmware release out that supports this. Hence no dependencies on
that anymore, outside of some pain getting the fw updated.

Here are my test setup details:

Receiver:
AMD EPYC 9754
Broadcom P2100G
-git + this series + the bnxt series referenced

Sender:
Intel(R) Xeon(R) Platinum 8458P
Broadcom P2100G
-git

Test:
kperf with David's patches to support io_uring zc. Eg single flow TCP,
just testing bandwidth. A single cpu/thread being used on both the
receiver and sender side.

non-zc		60.9 Gbps
io_uring + zc	97.1 Gbps

or +59% faster. There's quite a bit of IRQ side work, I'm guessing I
might need to tune it a bit. But it Works For Me, and the results look
really nice.

I did run into an issue with the bnxt driver defaulting to shared tx/rx
queues, and it not working for me in that configuration. Once I disabled
that, it worked fine. This may or may not be an issue with the flow rule
to direct the traffic, the driver queue start, or something else. Don't
know for sure, will need to check with the driver folks. Once sorted, I
didn't see any issues with the code in the patchset.

--
Jens Axboe
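As a sanity check, the three speedup percentages quoted above can be
recomputed from the raw Gbps figures:

```shell
# Recompute the speedups from the bandwidth numbers quoted in the mail.
awk 'BEGIN {
    printf "different cores: +%.0f%%\n", (116.2 / 82.2 - 1) * 100   # io_uring vs epoll
    printf "same core:       +%.0f%%\n", (80.9  / 62.6 - 1) * 100   # io_uring vs epoll
    printf "zc vs non-zc:    +%.0f%%\n", (97.1  / 60.9 - 1) * 100   # P2100G run above
}'
# -> different cores: +41%
#    same core:       +29%
#    zc vs non-zc:    +59%
```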
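For anyone reproducing the pinned runs, the usual way to control where
the application thread and the net rx softirq work land is taskset plus
IRQ affinity. A rough sketch follows; the interface name, IRQ number,
CPU ids, and benchmark binary name are all placeholders, not values from
the setups above:

```shell
# Placeholders throughout: eth0, IRQ 123, CPUs 2 and 4, ./kperf-server.
# Find the IRQ line for the NIC rx queue carrying the flow:
grep eth0 /proc/interrupts
# Pin that IRQ (and hence the net rx softirq work it triggers) to CPU 2:
echo 2 | sudo tee /proc/irq/123/smp_affinity_list
# "Different cores" case: run the receiver on another CPU (4).
# For the "same core" case, use -c 2 instead.
taskset -c 4 ./kperf-server
```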
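On the shared tx/rx queue issue: the generic knobs involved are
ethtool's channel and ntuple interfaces. A rough sketch of switching to
separate queues and steering the flow to a fixed rx queue is below; the
interface name, queue counts, port, and queue index are placeholders,
and whether a given channel layout is accepted depends on the driver:

```shell
# Placeholders: eth0, the rx/tx counts, port 9999, and queue 6.
# Inspect the current channel (queue) configuration:
ethtool -l eth0
# Ask for separate rx/tx queues instead of combined (shared) channels;
# bnxt may or may not accept this exact layout:
sudo ethtool -L eth0 combined 0 rx 8 tx 8
# Steer the benchmark TCP flow to a specific rx queue with an ntuple rule:
sudo ethtool -N eth0 flow-type tcp4 dst-port 9999 action 6
```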