On 13/04/2023 13.16, Toke Høiland-Jørgensen wrote:
> Qiongwen Xu <qx51@xxxxxxxxxxxxxx> writes:
>
>> Hi Jesper,
>>
>> Thanks for the detailed reply and sharing these helpful
>> materials/papers with us!
>
> (Please don't top post on the mailing list).
+1
>> After enabling rx_cqe_compress, the throughput in our experiment
>> increases from 70+ Mpps to 85 Mpps. We also tried to use the counter
>> "rx_discards_phy". The counter increases in both cpu-limited and
>> pcie-limited experiments, i.e., an experiment that is only
>> cpu-limited also increases the counter. We are looking for a counter
>> that can separate the cpu- and pcie-limited cases. Regarding the
>> [pcie-bench] tool, unfortunately we are not able to use it, as it
>> requires FPGA hardware.
>
> Well, are your CPUs being maxed out? IIRC it was pretty obvious that
> they weren't when we were running those tests, so just looking at
> something like 'mpstat' should give you a hint.
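A minimal sketch of how one might poll a NIC hardware counter such as rx_discards_phy between test runs. The interface name and the sample counter values below are invented for illustration; on real hardware you would replace the embedded sample with the output of `ethtool -S <ifname>`:

```shell
# Invented sample of `ethtool -S` style output (values are made up);
# swap in: sample=$(ethtool -S eth0) on a live machine.
sample='     rx_discards_phy: 12345
     rx_packets_phy: 99999999'

# Extract one named counter from "name: value" lines.
get_ctr() {
  printf '%s\n' "$sample" | awk -v k="$1" '$1 == k":" { print $2; exit }'
}

get_ctr rx_discards_phy   # prints 12345
```

Reading the counter twice with a sleep in between gives a drop rate, which is usually more telling than the absolute value.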
As you can see in [1], I find this mpstat command very useful:

 $ mpstat -P ALL -u -I SCPU -I SUM 2

The tool turbostat will also tell you how busy individual CPUs are.
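To spot a maxed-out core quickly, one can compute busy% as 100 minus the idle column per CPU. A small sketch, assuming a simplified mpstat-like layout (the sample lines and the 95% threshold are invented; real `mpstat -P ALL` output has more columns, so the idle field index would need adjusting):

```shell
# Invented, simplified per-CPU sample; on a live box pipe in
# `mpstat -P ALL 1 1` instead and pick the %idle column.
sample='CPU  %usr %sys %idle
0    10.0  5.0  85.0
3    60.0 39.5   0.5'

# Flag CPUs whose busy% (100 - %idle) exceeds 95%.
busy_report() {
  awk 'NR > 1 { busy = 100 - $4;
    printf "CPU %s busy %.1f%%%s\n", $1, busy,
           (busy > 95 ? "  <-- maxed out" : "") }'
}

printf '%s\n' "$sample" | busy_report
```

A core pinned near 100% busy points at a CPU-limited run; all cores with idle headroom while throughput stalls points elsewhere (e.g. PCIe).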
> For more detailed analysis you can use 'perf' to see exactly where the
> CPU is spending its time.
Again a practical hint. Perf record with cmdline:

 # perf record -g -a -- sleep 10

Look at results with a cmdline that also exposes the 'cpu' info:

 # perf report --sort cpu,dso,symbol --no-children

Look at a specific CPU, e.g. core 3 (counting from 0), with cmdline:

 # perf report --sort cpu,dso,symbol --no-children -C3

--Jesper

Links:
[1] https://github.com/xdp-project/xdp-paper/blob/master/benchmarks/bench02_xdp_drop.org#test-100g-bandwidth