On 10/01/2025 20:58, Martin Karsten wrote:
> On 2025-01-10 13:26, Stanislav Fomichev wrote:
>> On 01/10, Joe Damato wrote:
>>> On Mon, Dec 30, 2024 at 09:31:23AM -0500, Joe Damato wrote:
>>>> On Mon, Dec 23, 2024 at 08:17:08AM +0000, Alex Lazar wrote:
>>>>>
>>>
>>> [...]
>>>
>>>>>
>>>>> Hi Joe,
>>>>>
>>>>> Thanks for the quick response.
>>>>> Comments inline, If you need more details or further clarification,
>>>>> please let me know.
>>>>
>>>> As mentioned above and in my previous emails: please provide lot
>>>> more detail and make it as easy as possible for me to reproduce this
>>>> issue with the simplest reproducer possible and a much more detailed
>>>> explanation.
>>>>
>>>> Please note: I will be out of the office until Jan 9 so my responses
>>>> will be limited until then.
>>>
>>> Just to follow up on this for anyone who missed the other thread,
>>> Stanislav proposed a patch which _might_ fix the issue being hit
>>> here.
>>>
>>> Please see [1], try that patch, and report back if that patch fixes
>>> the issue.
>>>
>>> Thanks.
>>>
>>> [1]: https://lore.kernel.org/netdev/20250109003436.2829560-1-sdf@xxxxxxxxxxx/
>>
>> Note that it might help only if xsk is using busy-polling. Not sure
>> that's the case, it's relatively obscure feature :-)
>
> I believe I have reproduced Alex' issue using the methodology below and
> your patch fixes it for me.
>
> The experiment uses a server (tilly01) with mlx5 and a client (tilly02).
> In the problem case, the 'response' packet gets stuck, but the next
> 'request' packets triggers both the stuck and the regular responses. The
> pattern can also be seen in the tcpdump output at the client. Note that
> the response packet is not a valid packet (only MAC addresses swapped,
> not IP addresses), but tcpdump shows it regardless.
>
> Thanks,
> Martin
>
> # on server tilly01
> watch -n 0.5 "sudo ethtool -S ens2f1np1 | fgrep tx_xsk_xmit"
>
> # on client tilly02
> sudo tcpdump -qbi eno3d1 udp
>
> # on client tilly02
> while true; do
>   ssh tilly01 "sudo ifconfig ens2f1np1 down; sudo modprobe -r mlx5_ib;
>     sleep 1; sudo modprobe mlx5_ib; sudo ifconfig ens2f1np1 up"
>   ssh -f tilly01 "sudo ./bpf-examples/AF_XDP-example/xdpsock \
>     -i ens2f1np1 -N -q 4 --l2fwd -z -B >/dev/null 2>&1"
>   exp=1
>   for ((i=0;i<5;i++)); do
>     ssh tilly01 "sudo ethtool --config-ntuple ens2f1np1 flow-type udp4\
>       dst-port 19017 action 4 >/dev/null 2>&1"
>     for ((j=0;j<10;j++)); do
>       echo -n "$exp "
>       echo 'send(IP(dst="192.168.199.1",src="192.168.199.2")\
>         /UDP(dport=19017))' | sudo ./scapy/run_scapy >/dev/null 2>&1
>       cnt=$(ssh tilly01 ethtool -S ens2f1np1|grep -F tx_xsk_xmit\
>         |cut -f2 -d:)
>       [ $cnt -eq $exp ] || {
>         echo COUNTER WRONG
>         read x
>       }
>       ((exp+=1))
>     done
>     ssh tilly01 sudo ethtool --config-ntuple ens2f1np1 delete 1023
>   done
>   echo reset
>   ssh tilly01 sudo killall xdpsock
> done
>

Thanks to Joe, Martin, and Stanislav for introducing this fix and for your
efforts in solving this issue.
I reviewed it over the weekend and verified that it solves the problem.

Thanks,
Alex Lazar