Re: [PATCH v1 00/15] io_uring zero copy rx

On 10/9/24 16:43, Jens Axboe wrote:
On 10/9/24 9:38 AM, David Ahern wrote:
On 10/9/24 9:27 AM, Jens Axboe wrote:
On 10/7/24 4:15 PM, David Wei wrote:
===========
Performance
===========

Test setup:
* AMD EPYC 9454
* Broadcom BCM957508 200G
* Kernel v6.11 base [2]
* liburing fork [3]
* kperf fork [4]
* 4K MTU
* Single TCP flow

With application thread + net rx softirq pinned to _different_ cores (see the pinning sketch below):

epoll
82.2 Gbps

io_uring
116.2 Gbps (+41%)

Pinned to _same_ core:

epoll
62.6 Gbps

io_uring
80.9 Gbps (+29%)
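
As an aside on the pinning itself: below is a minimal sketch of how such
a setup can be arranged, assuming the rx queue's IRQ number has already
been looked up in /proc/interrupts. This is illustrative, not kperf's
actual code. The app thread is pinned with sched_setaffinity(); net rx
softirq processing runs on whichever CPU services the queue's IRQ, which
is steered by writing a mask to /proc/irq/<N>/smp_affinity.

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

static int pin_thread_to_cpu(int cpu)
{
	cpu_set_t set;

	CPU_ZERO(&set);
	CPU_SET(cpu, &set);
	return sched_setaffinity(0, sizeof(set), &set); /* 0 == calling thread */
}

static int pin_irq_to_cpu(int irq, int cpu)
{
	char path[64];
	FILE *f;

	snprintf(path, sizeof(path), "/proc/irq/%d/smp_affinity", irq);
	f = fopen(path, "w");
	if (!f)
		return -1;
	fprintf(f, "%x\n", 1u << cpu); /* single-CPU mask; cpu < 32 for brevity */
	fclose(f);
	return 0;
}

The _different_ cores rows are then pin_thread_to_cpu(A) plus
pin_irq_to_cpu(irq, B) with A != B; the _same_ core rows use A == B.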

I'll review the io_uring bits in detail, but I did take a quick look and
overall it looks really nice.

I decided to give this a spin, as I noticed that Broadcom now has a
230.x firmware release out that supports this. Hence no dependencies on
that anymore, outside of some pain getting the fw updated. Here are my
test setup details:

Receiver:
AMD EPYC 9754 (receiver)
Broadcom P2100G
-git + this series + the bnxt series referenced

Sender:
Intel(R) Xeon(R) Platinum 8458P
Broadcom P2100G
-git

Test:
kperf with David's patches to support io_uring zc, i.e. a single TCP flow,
just testing bandwidth. A single CPU/thread is used on both the receiver
and sender side.

non-zc
60.9 Gbps

io_uring + zc
97.1 Gbps (+59%)

So line rate? Did you look at whether there is CPU to spare? Meaning,
would it report higher speeds with a 200G setup?

Yep, basically line rate, I get 97-98Gbps. I originally used a slower box
as the sender, but then you're capped by the non-zc sender being too
slow. The Intel box does better, but it's still basically maxing out the
sender at this point. So yeah, with a faster (or more efficient) sender,
I have no doubt this will go much higher per thread, if the link
bandwidth were there. When I looked at CPU usage on the receiver, the
thread itself is using ~30% CPU. And then there's some softirq/irq time
outside of that, but that should amortize with higher bps rates too, I'd
expect.

My NIC does have 2 100G ports, so it might warrant a bit more testing...
If you haven't done it already, I'd also pin softirq processing to the
same CPU as the app, so we measure the full stack. kperf has an option
for that, IIRC.

--
Pavel Begunkov



