On Thu, Jan 16, 2020 at 3:04 AM Ryan Goodfellow <rgoodfel@xxxxxxx> wrote:
>
> On Wed, Jan 15, 2020 at 09:20:30AM +0100, Magnus Karlsson wrote:
> > On Wed, Jan 15, 2020 at 8:40 AM Magnus Karlsson
> > <magnus.karlsson@xxxxxxxxx> wrote:
> > >
> > > On Wed, Jan 15, 2020 at 2:41 AM Ryan Goodfellow <rgoodfel@xxxxxxx> wrote:
> > > >
> > > > On Tue, Jan 14, 2020 at 03:52:50PM -0500, Ryan Goodfellow wrote:
> > > > > On Tue, Jan 14, 2020 at 10:59:19AM +0100, Magnus Karlsson wrote:
> > > > > >
> > > > > > Just sent out a patch on the mailing list. Would be great if you could
> > > > > > try it out.
> > > > >
> > > > > Thanks for the quick turnaround. I gave this patch a go, both in the bpf-next
> > > > > tree and manually applied to the 5.5.0-rc3 branch I've been working with up to
> > > > > this point. It does allow for allocating more memory; however, packet
> > > > > forwarding no longer works. I did not see any complaints from dmesg, but here
> > > > > is an example iperf3 session from a client that worked before.
> > > > >
> > > > > ry@xd2:~$ iperf3 -c 10.1.0.2
> > > > > Connecting to host 10.1.0.2, port 5201
> > > > > [  5] local 10.1.0.1 port 53304 connected to 10.1.0.2 port 5201
> > > > > [ ID] Interval           Transfer     Bitrate         Retr  Cwnd
> > > > > [  5]   0.00-1.00   sec  5.91 MBytes  49.5 Mbits/sec    2   1.41 KBytes
> > > > > [  5]   1.00-2.00   sec  0.00 Bytes  0.00 bits/sec    1   1.41 KBytes
> > > > > [  5]   2.00-3.00   sec  0.00 Bytes  0.00 bits/sec    0   1.41 KBytes
> > > > > [  5]   3.00-4.00   sec  0.00 Bytes  0.00 bits/sec    1   1.41 KBytes
> > > > > [  5]   4.00-5.00   sec  0.00 Bytes  0.00 bits/sec    0   1.41 KBytes
> > > > > [  5]   5.00-6.00   sec  0.00 Bytes  0.00 bits/sec    0   1.41 KBytes
> > > > > [  5]   6.00-7.00   sec  0.00 Bytes  0.00 bits/sec    1   1.41 KBytes
> > > > > [  5]   7.00-8.00   sec  0.00 Bytes  0.00 bits/sec    0   1.41 KBytes
> > > > > [  5]   8.00-9.00   sec  0.00 Bytes  0.00 bits/sec    0   1.41 KBytes
> > > > > ^C[  5]  10.00-139.77 sec  0.00 Bytes  0.00 bits/sec    4   1.41 KBytes
> > > > > - - - - - - - - - - - - - - - - - - - - - - - - -
> > > > > [ ID] Interval           Transfer     Bitrate         Retr
> > > > > [  5]   0.00-139.77 sec  5.91 MBytes   355 Kbits/sec    9             sender
> > > > > [  5]   0.00-139.77 sec  0.00 Bytes  0.00 bits/sec                  receiver
> > > > > iperf3: interrupt - the client has terminated
> > > > >
> > > > > I'll continue to investigate and report back with anything that I find.
> > > >
> > > > Interestingly, I found this behavior to exist in the bpf-next tree independent
> > > > of the patch being present.
> > >
> > > Ryan,
> > >
> > > Could you please do a bisect on it? In the 12 commits after the merge
> > > commit below there are a number of sensitive rewrites of the ring access
> > > functions. Maybe one of them breaks your code. When you say "packet
> > > forwarding no longer works", do you mean it works for a second or so,
> > > then no packets come through? What HW are you using?
> > >
> > > commit ce3cec27933c069d2015a81e59b93eb656fe7ee4
> > > Merge: 99cacdc 1d9cb1f
> > > Author: Alexei Starovoitov <ast@xxxxxxxxxx>
> > > Date:   Fri Dec 20 16:00:10 2019 -0800
> > >
> > >     Merge branch 'xsk-cleanup'
> > >
> > >     Magnus Karlsson says:
> > >
> > >     ====================
> > >     This patch set cleans up the ring access functions of AF_XDP in hope
> > >     that it will now be easier to understand and maintain. I used to get a
> > >     headache every time I looked at this code in order to really understand it,
> > >     but now I do think it is a lot less painful.
> > >     <snip>
> > >
> > > /Magnus
> >
> > I see that you have debug messages in your application. Could you
> > please run with those on and send me the output so I can see where it
> > stops. A bisect that pinpoints the commit that breaks your program
> > plus the debug output should hopefully send us on the right path for a
> > fix.
> >
> > Thanks: Magnus
>
> Hi Magnus,
>
> I did a bisect starting from the head of the bpf-next tree (990bca1) down to
> the first commit before the patch series you identified (df034c9). The bisect
> identified df0ae6f as the commit that causes the issue I am seeing.
>
> I've posted output from the program in debugging mode here
>
> - https://gitlab.com/mergetb/tech/network-emulation/kernel/snippets/1930375

Perfect. Thanks.

> Yes, you are correct in that forwarding works for a brief period and then stops.
> I've noticed that the number of packets that are forwarded is equal to the size
> of the producer/consumer descriptor rings. I've posted two ping traces from a
> client ping that show this.
>
> - https://gitlab.com/mergetb/tech/network-emulation/kernel/snippets/1930376
> - https://gitlab.com/mergetb/tech/network-emulation/kernel/snippets/1930377
>
> I've also noticed that when the forwarding stops, the CPU usage for the proc
> running the program is pegged, which is not the norm for this program as it uses
> a poll call with a timeout on the xsk fd.

I will replicate your setup and try to reproduce it. I only have one port
connected to my load generator now, but when I get into the office, I will
connect two ports. In what loop does the execution get stuck when it hangs
at 100% load?

/Magnus

> The hardware I am using is a Mellanox ConnectX4 2x100G card (MCX416A-CCAT)
> running the mlx5 driver. The program is running in zero copy mode. I also tested
> this code out in a virtual machine with virtio NICs in SKB mode, which uses
> xdpgeneric - there were no issues in that setting.
>
> --
> ~ ry