Re: zero-copy between interfaces

On Thu, Jan 16, 2020 at 3:04 AM Ryan Goodfellow <rgoodfel@xxxxxxx> wrote:
>
> On Wed, Jan 15, 2020 at 09:20:30AM +0100, Magnus Karlsson wrote:
> > On Wed, Jan 15, 2020 at 8:40 AM Magnus Karlsson
> > <magnus.karlsson@xxxxxxxxx> wrote:
> > >
> > > On Wed, Jan 15, 2020 at 2:41 AM Ryan Goodfellow <rgoodfel@xxxxxxx> wrote:
> > > >
> > > > On Tue, Jan 14, 2020 at 03:52:50PM -0500, Ryan Goodfellow wrote:
> > > > > On Tue, Jan 14, 2020 at 10:59:19AM +0100, Magnus Karlsson wrote:
> > > > > >
> > > > > > Just sent out a patch on the mailing list. Would be great if you could
> > > > > > try it out.
> > > > >
> > > > > Thanks for the quick turnaround. I gave this patch a go, both in the bpf-next
> > > > > tree and manually applied to the 5.5.0-rc3 branch I've been working with up to
> > > > > this point. It does allow for allocating more memory; however, packet
> > > > > forwarding no longer works. I did not see any complaints in dmesg, but here
> > > > > is an example iperf3 session from a client that worked before.
> > > > >
> > > > > ry@xd2:~$ iperf3 -c 10.1.0.2
> > > > > Connecting to host 10.1.0.2, port 5201
> > > > > [  5] local 10.1.0.1 port 53304 connected to 10.1.0.2 port 5201
> > > > > [ ID] Interval           Transfer     Bitrate         Retr  Cwnd
> > > > > [  5]   0.00-1.00   sec  5.91 MBytes  49.5 Mbits/sec    2   1.41 KBytes
> > > > > [  5]   1.00-2.00   sec  0.00 Bytes  0.00 bits/sec    1   1.41 KBytes
> > > > > [  5]   2.00-3.00   sec  0.00 Bytes  0.00 bits/sec    0   1.41 KBytes
> > > > > [  5]   3.00-4.00   sec  0.00 Bytes  0.00 bits/sec    1   1.41 KBytes
> > > > > [  5]   4.00-5.00   sec  0.00 Bytes  0.00 bits/sec    0   1.41 KBytes
> > > > > [  5]   5.00-6.00   sec  0.00 Bytes  0.00 bits/sec    0   1.41 KBytes
> > > > > [  5]   6.00-7.00   sec  0.00 Bytes  0.00 bits/sec    1   1.41 KBytes
> > > > > [  5]   7.00-8.00   sec  0.00 Bytes  0.00 bits/sec    0   1.41 KBytes
> > > > > [  5]   8.00-9.00   sec  0.00 Bytes  0.00 bits/sec    0   1.41 KBytes
> > > > > ^C[  5]  10.00-139.77 sec  0.00 Bytes  0.00 bits/sec    4   1.41 KBytes
> > > > > - - - - - - - - - - - - - - - - - - - - - - - - -
> > > > > [ ID] Interval           Transfer     Bitrate         Retr
> > > > > [  5]   0.00-139.77 sec  5.91 MBytes   355 Kbits/sec    9             sender
> > > > > [  5]   0.00-139.77 sec  0.00 Bytes  0.00 bits/sec                  receiver
> > > > > iperf3: interrupt - the client has terminated
> > > > >
> > > > > I'll continue to investigate and report back with anything that I find.
> > > >
> > > > Interestingly, I found this behavior in the bpf-next tree independent
> > > > of whether the patch is applied.
> > >
> > > Ryan,
> > >
> > > Could you please do a bisect on it? In the 12 commits after the merge
> > > commit below, there are a number of sensitive rewrites of the ring access
> > > functions. Maybe one of them breaks your code. When you say "packet
> > > forwarding no longer works", do you mean it works for a second or so,
> > > then no packets come through? What HW are you using?
> > >
> > > commit ce3cec27933c069d2015a81e59b93eb656fe7ee4
> > > Merge: 99cacdc 1d9cb1f
> > > Author: Alexei Starovoitov <ast@xxxxxxxxxx>
> > > Date:   Fri Dec 20 16:00:10 2019 -0800
> > >
> > >     Merge branch 'xsk-cleanup'
> > >
> > >     Magnus Karlsson says:
> > >
> > >     ====================
> > >     This patch set cleans up the ring access functions of AF_XDP in hope
> > >     that it will now be easier to understand and maintain. I used to get a
> > >     headache every time I looked at this code in order to really understand it,
> > >     but now I do think it is a lot less painful.
> > >     <snip>
> > >
> > > /Magnus
> >
> > I see that you have debug messages in your application. Could you
> > please run with those on and send me the output so I can see where it
> > stops. A bisect that pinpoints which commit breaks your program
> > plus the debug output should hopefully put us on the right path to a
> > fix.
> >
> > Thanks: Magnus
> >
>
> Hi Magnus,
>
> I did a bisect starting from the head of the bpf-next tree (990bca1) down to
> the first commit before the patch series you identified (df034c9). The bisect
> identified df0ae6f as the commit that causes the issue I am seeing.
>
> I've posted output from the program in debugging mode here:
>
> - https://gitlab.com/mergetb/tech/network-emulation/kernel/snippets/1930375

Perfect. Thanks.

> Yes, you are correct in that forwarding works for a brief period and then stops.
> I've noticed that the number of packets that are forwarded is equal to the size
> of the producer/consumer descriptor rings. I've posted two ping traces from a
> client that show this.
>
> - https://gitlab.com/mergetb/tech/network-emulation/kernel/snippets/1930376
> - https://gitlab.com/mergetb/tech/network-emulation/kernel/snippets/1930377
>
> I've also noticed that when the forwarding stops, the CPU usage for the process
> running the program is pegged, which is not the norm for this program as it uses
> a poll call with a timeout on the xsk fd.
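
A forwarded-packet count equal to the ring size is what a producer/consumer ring does when one side stops returning slots: the producer fills every free slot once and then has nowhere to write. The sketch below is a hypothetical, simplified model of a single-producer/single-consumer descriptor ring (free-running 32-bit indices, power-of-two size, memory barriers omitted). `struct ring`, `forward()` and the 8-entry size are illustrative assumptions, not the AF_XDP kernel or libbpf code, and the model does not diagnose what df0ae6f changed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical SPSC descriptor ring, loosely modelled on AF_XDP-style rings.
 * Not the kernel or libbpf implementation; barriers are omitted. */
#define RING_SIZE 8u /* assumption: tiny power-of-two ring for illustration */

struct ring {
    uint32_t producer;          /* free-running index of next slot to write */
    uint32_t consumer;          /* free-running index of next slot to read */
    uint64_t desc[RING_SIZE];   /* descriptor payload (e.g. a umem address) */
};

/* Free slots seen by the producer. Unsigned subtraction stays correct
 * even after either index wraps past UINT32_MAX. */
static uint32_t ring_free(const struct ring *r)
{
    uint32_t used = r->producer - r->consumer;
    assert(used <= RING_SIZE);
    return RING_SIZE - used;
}

static bool ring_produce(struct ring *r, uint64_t addr)
{
    if (ring_free(r) == 0)
        return false;                              /* full: producer stalls */
    r->desc[r->producer & (RING_SIZE - 1)] = addr;
    r->producer++;                                 /* publish the entry */
    return true;
}

static bool ring_consume(struct ring *r, uint64_t *addr)
{
    if (r->producer == r->consumer)
        return false;                              /* empty */
    *addr = r->desc[r->consumer & (RING_SIZE - 1)];
    r->consumer++;                                 /* hand the slot back */
    return true;
}

/* Offer n packets to the ring. When release is false the consumer never
 * takes entries off the ring, so slots are never returned to the producer.
 * Returns how many packets the producer managed to enqueue. */
static unsigned forward(struct ring *r, unsigned n, bool release)
{
    unsigned accepted = 0;
    for (unsigned i = 0; i < n; i++) {
        if (!ring_produce(r, i))
            continue;                              /* dropped: ring full */
        accepted++;
        if (release) {
            uint64_t addr;
            ring_consume(r, &addr);
        }
    }
    return accepted;
}
```

With slots released, all offered packets get through; without it, exactly RING_SIZE are accepted and everything after that is dropped, the same cutoff pattern as in the ping traces. In libbpf terms the unreleased case is analogous to calling `xsk_ring_cons__peek()` without a matching `xsk_ring_cons__release()`, though that is only an analogy, not a claim about this bug.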

I will replicate your setup and try to reproduce it. I only have one
port connected to my load generator right now, but when I get into the
office, I will connect two ports.

In what loop does the execution get stuck when it hangs at 100% load?
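
On the 100% CPU symptom: poll() only sleeps when none of the requested events is ready, so a timeout does not bound CPU use if the loop keeps waking on a condition it never clears. The sketch below shows this with a pipe standing in for the xsk fd (an assumption for illustration). `count_ready_wakeups()` is a hypothetical helper, and which condition the real program fails to service is not established here.

```c
#include <assert.h>
#include <poll.h>
#include <stdbool.h>
#include <unistd.h>

/* Call poll() iters times on fd with the given timeout and count how many
 * calls returned because POLLIN was ready. If service is false, the data is
 * left unread, so POLLIN stays set and every call returns immediately. */
static int count_ready_wakeups(int fd, int iters, int timeout_ms, bool service)
{
    int ready = 0;
    for (int i = 0; i < iters; i++) {
        struct pollfd pfd = { .fd = fd, .events = POLLIN };
        if (poll(&pfd, 1, timeout_ms) > 0 && (pfd.revents & POLLIN)) {
            ready++;
            if (service) {
                char buf[64];
                ssize_t n = read(fd, buf, sizeof buf); /* clear the condition */
                assert(n > 0);
                (void)n;
            }
        }
    }
    return ready;
}
```

With one unread byte in the pipe, 1000 calls with a 1000 ms timeout all return at once, which is effectively a busy loop; once the handler drains the pipe, subsequent calls sleep until the timeout instead.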

/Magnus

> The hardware I am using is a Mellanox ConnectX-4 2x100G card (MCX416A-CCAT)
> running the mlx5 driver. The program is running in zero-copy mode. I also tested
> this code in a virtual machine with virtio NICs in SKB mode, which uses
> xdpgeneric; there were no issues in that setting.
>
> --
> ~ ry


