On 30/05/2019 09:40, Jesper Dangaard Brouer wrote:
>> server$ sudo ethtool -X dpdk1 equal 2
>
> Interesting use of ethtool -X (Set Rx flow hash indirection table), I could
> use that myself in some of my tests. I usually change the number of RX-queues
> via ethtool -L (or --set-channels), which the i40e/XL710 have issues with...
Yes, and it doesn't kill the existing queues for multiple seconds, and it keeps the affinity of the IRQs.
> What SMP affinity script are you using? The mellanox drivers use another
> "layout"/name-scheme in /proc/irq/*/*name*/../smp_affinity_list.
>
> For normal Intel based NICs I use this:
>
> echo " --- Align IRQs ---"
> # I've named my NICs ixgbe1 + ixgbe2
> for F in /proc/irq/*/ixgbe*-TxRx-*/../smp_affinity_list; do
>   # Extract irqname e.g. "ixgbe2-TxRx-2"
>   irqname=$(basename $(dirname $(dirname $F))) ;
>   # Substring pattern removal
>   hwq_nr=${irqname#*-*-}
>   echo $hwq_nr > $F
>   #grep . -H $F
> done
> grep -H . /proc/irq/*/ixgbe*/../smp_affinity_list
>
> But for Mellanox I had to use this:
>
> echo " --- Align IRQs : mlx5 ---"
> for F in /proc/irq/*/mlx5_comp*/../smp_affinity; do
>   dir=$(dirname $F) ;
>   cat $dir/affinity_hint > $F
> done
> grep -H . /proc/irq/*/mlx5_comp*/../smp_affinity_list
Correct, I used the Mellanox script installed by the Mellanox OFED for everything; I think that one works for all of them. Anyway, this could explain why the netperf case went to the same core with the I40E, while with my iperf one the 50 flows were distributed all the way around.
I made a video (enable subtitles). I just re-compiled with a clean 5.1.5 (no RSS modification or anything), and it's the same thing you can see in the video. The iperf in the video is also the normal iperf2. Enabling the xdp_pass program creates a huge CPU increase with the CX5; with the XL710 I only get a 1 or 2% increase per CPU.
https://www.youtube.com/watch?v=o5hlJZbN4Tk&feature=youtu.be
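For reference, the xdp_pass program in the video is just the trivial "return XDP_PASS" case. A minimal sketch of such a program, assuming a libbpf-style build (not necessarily the exact file I loaded):

    /* Minimal "pass everything" XDP program, i.e. the cheapest possible XDP
     * hook. Any per-packet CPU increase observed with it attached comes from
     * the driver's XDP path, not from the program body itself. */
    #include <linux/bpf.h>
    #include <bpf/bpf_helpers.h>

    SEC("xdp")
    int xdp_pass_prog(struct xdp_md *ctx)
    {
            return XDP_PASS;    /* hand every packet to the normal stack */
    }

    char _license[] SEC("license") = "GPL";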
>> I do have one patch to copy the RSS hash in the xdp_buff, but the field is
>> read even if XDP is disabled.
>
> What is your use-case for this?
Load balancing. No need to re-compute a hash in SW if HW did it...
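To make the use case concrete, here is roughly what the consuming side would look like. This is only a sketch: it assumes the hash is exposed through the metadata area in front of the packet rather than as a field of the xdp_buff itself, and the names and the number of workers are made up.

    /* Sketch: XDP load balancer that reuses the HW RSS hash instead of
     * recomputing one in software. Assumes a (hypothetical) driver change
     * that places the 32-bit hash in the metadata area before the packet. */
    #include <linux/bpf.h>
    #include <bpf/bpf_helpers.h>

    #define NR_WORKERS 4            /* hypothetical number of LB targets */

    SEC("xdp")
    int lb_by_hw_hash(struct xdp_md *ctx)
    {
            void *data      = (void *)(long)ctx->data;
            void *data_meta = (void *)(long)ctx->data_meta;
            __u32 *rss_hash = data_meta;

            /* Bounds check the verifier requires before reading metadata. */
            if ((void *)(rss_hash + 1) > data)
                    return XDP_PASS;    /* no metadata: fall back to SW hash */

            __u32 target = *rss_hash % NR_WORKERS;
            bpf_printk("HW hash %u -> worker %u\n", *rss_hash, target);

            /* ... steer the packet to 'target', e.g. bpf_redirect_map() into
             * a CPUMAP or DEVMAP, instead of hashing the headers again ... */
            return XDP_PASS;
    }

    char _license[] SEC("license") = "GPL";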
>> I would be happy to help. But we would be making a sk_buff again if we start
>> throwing things in the buff that people may be using in one use case.
>> Already, the first time I looked at XDP it was "data + len", and now there
>> are like 10 fields extracted, with the queue info.
>
> Upstream will likely request that this is added as xdp_buff->metadata and
> using the BTF format... but it is a longer project, see [1], and is currently
> scheduled as a "medium-term" task... let us know if you want to work on
> this...
>
> [1] https://github.com/xdp-project/xdp-project/blob/master/xdp-project.org#metadata-available-to-programs
On the contrary, I found the BPF resolver (name? the thing that translates the BPF offset to the real struct offset) super neat. Wouldn't drivers be able to expose a specific per-driver resolver that knows how to fetch all this random information from the descriptors, and therefore do it only if needed?
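To illustrate what I mean: today a program already reads fields of struct xdp_md and the verifier rewrites those accesses into loads from the real in-kernel structures. A sketch of that existing behaviour, using rx_queue_index as the example field:

    /* Example of the existing "resolver" behaviour: the program reads an
     * xdp_md field, and the verifier's ctx-access rewriting turns that into a
     * load from the real kernel structures (here the rxq info). A per-driver
     * variant could resolve descriptor fields such as the RSS hash the same
     * way, and only when a program actually reads them. */
    #include <linux/bpf.h>
    #include <bpf/bpf_helpers.h>

    SEC("xdp")
    int show_rx_queue(struct xdp_md *ctx)
    {
            /* Rewritten by the kernel to the real struct offset at load time. */
            __u32 q = ctx->rx_queue_index;

            bpf_printk("packet received on RX queue %u\n", q);
            return XDP_PASS;
    }

    char _license[] SEC("license") = "GPL";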
Tom