Re: zero-copy between interfaces

On Wed, Jan 15, 2020 at 09:20:30AM +0100, Magnus Karlsson wrote:
> On Wed, Jan 15, 2020 at 8:40 AM Magnus Karlsson
> <magnus.karlsson@xxxxxxxxx> wrote:
> >
> > On Wed, Jan 15, 2020 at 2:41 AM Ryan Goodfellow <rgoodfel@xxxxxxx> wrote:
> > >
> > > On Tue, Jan 14, 2020 at 03:52:50PM -0500, Ryan Goodfellow wrote:
> > > > On Tue, Jan 14, 2020 at 10:59:19AM +0100, Magnus Karlsson wrote:
> > > > >
> > > > > Just sent out a patch on the mailing list. Would be great if you could
> > > > > try it out.
> > > >
> > > > Thanks for the quick turnaround. I gave this patch a go, both in the bpf-next
> > > > tree and manually applied to the 5.5.0-rc3 branch I've been working with up to
> > > > this point. It does allow for allocating more memory, however packet
> > > > forwarding no longer works. I did not see any complaints from dmesg, but here
> > > > is an example iperf3 session from a client that worked before.
> > > >
> > > > ry@xd2:~$ iperf3 -c 10.1.0.2
> > > > Connecting to host 10.1.0.2, port 5201
> > > > [  5] local 10.1.0.1 port 53304 connected to 10.1.0.2 port 5201
> > > > [ ID] Interval           Transfer     Bitrate         Retr  Cwnd
> > > > [  5]   0.00-1.00   sec  5.91 MBytes  49.5 Mbits/sec    2   1.41 KBytes
> > > > [  5]   1.00-2.00   sec  0.00 Bytes  0.00 bits/sec    1   1.41 KBytes
> > > > [  5]   2.00-3.00   sec  0.00 Bytes  0.00 bits/sec    0   1.41 KBytes
> > > > [  5]   3.00-4.00   sec  0.00 Bytes  0.00 bits/sec    1   1.41 KBytes
> > > > [  5]   4.00-5.00   sec  0.00 Bytes  0.00 bits/sec    0   1.41 KBytes
> > > > [  5]   5.00-6.00   sec  0.00 Bytes  0.00 bits/sec    0   1.41 KBytes
> > > > [  5]   6.00-7.00   sec  0.00 Bytes  0.00 bits/sec    1   1.41 KBytes
> > > > [  5]   7.00-8.00   sec  0.00 Bytes  0.00 bits/sec    0   1.41 KBytes
> > > > [  5]   8.00-9.00   sec  0.00 Bytes  0.00 bits/sec    0   1.41 KBytes
> > > > ^C[  5]  10.00-139.77 sec  0.00 Bytes  0.00 bits/sec    4   1.41 KBytes
> > > > - - - - - - - - - - - - - - - - - - - - - - - - -
> > > > [ ID] Interval           Transfer     Bitrate         Retr
> > > > [  5]   0.00-139.77 sec  5.91 MBytes   355 Kbits/sec    9             sender
> > > > [  5]   0.00-139.77 sec  0.00 Bytes  0.00 bits/sec                  receiver
> > > > iperf3: interrupt - the client has terminated
> > > >
> > > > I'll continue to investigate and report back with anything that I find.
> > >
> > > Interestingly I found this behavior to exist in the bpf-next tree independent
> > > of the patch being present.
> >
> > Ryan,
> >
> > Could you please do a bisect on it? In the 12 commits after the merge
> > commit below there are a number of sensitive rewrites of the ring access
> > functions. Maybe one of them breaks your code. When you say "packet
> > forwarding no longer works", do you mean it works for a second or so,
> > then no packets come through? What HW are you using?
> >
> > commit ce3cec27933c069d2015a81e59b93eb656fe7ee4
> > Merge: 99cacdc 1d9cb1f
> > Author: Alexei Starovoitov <ast@xxxxxxxxxx>
> > Date:   Fri Dec 20 16:00:10 2019 -0800
> >
> >     Merge branch 'xsk-cleanup'
> >
> >     Magnus Karlsson says:
> >
> >     ====================
> >     This patch set cleans up the ring access functions of AF_XDP in hope
> >     that it will now be easier to understand and maintain. I used to get a
> >     headache every time I looked at this code in order to really understand it,
> >     but now I do think it is a lot less painful.
> >     <snip>
> >
> > /Magnus
> 
> I see that you have debug messages in your application. Could you
> please run with those on and send me the output so I can see where it
> stops? A bisect that pinpoints which commit breaks your program,
> plus the debug output, should hopefully send us on the right path for a
> fix.
> 
> Thanks: Magnus
> 

Hi Magnus,

I did a bisect starting from the head of the bpf-next tree (990bca1) down to
the first commit before the patch series you identified (df034c9). The bisect
identified df0ae6f as the commit that causes the issue I am seeing.

I've posted output from the program in debugging mode here:

- https://gitlab.com/mergetb/tech/network-emulation/kernel/snippets/1930375

Yes, you are correct that forwarding works for a brief period and then stops.
I've noticed that the number of packets forwarded is equal to the size of the
producer/consumer descriptor rings. I've posted two ping traces from a client
that show this:

- https://gitlab.com/mergetb/tech/network-emulation/kernel/snippets/1930376
- https://gitlab.com/mergetb/tech/network-emulation/kernel/snippets/1930377
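
For illustration only, here is a toy Python model of a single-producer,
single-consumer ring. It is not the kernel's AF_XDP code; the Ring class, the
forward() helper and the ring size of 8 are all assumptions made up for this
sketch. It shows one way a stall after exactly ring-size packets can arise: if
entries are never released back to the producer, the ring fills once and then
nothing more can be posted. Whether that is what happens after df0ae6f is not
established here.

```python
class Ring:
    """Toy single-producer/single-consumer ring with free-running indices."""

    def __init__(self, size):
        # Require a power-of-two size so that masking implements wrap-around.
        assert size > 0 and size & (size - 1) == 0
        self.size = size
        self.mask = size - 1
        self.slots = [None] * size
        self.producer = 0  # total entries ever produced
        self.consumer = 0  # total entries ever released

    def free_entries(self):
        return self.size - (self.producer - self.consumer)

    def produce(self, item):
        if self.free_entries() == 0:
            return False  # ring is full: nothing has been released
        self.slots[self.producer & self.mask] = item
        self.producer += 1
        return True

    def release(self, n=1):
        # Release at most as many entries as are currently outstanding.
        n = min(n, self.producer - self.consumer)
        self.consumer += n
        return n


def forward(ring, packets, recycle):
    """Return how many packets were forwarded before the ring stalled."""
    forwarded = 0
    for pkt in range(packets):
        if not ring.produce(pkt):
            break
        forwarded += 1
        if recycle:
            ring.release()  # hand the entry back so it can be reused
    return forwarded
```

With a ring of 8 entries, forward(Ring(8), 100, recycle=False) stops after 8
packets, while forward(Ring(8), 100, recycle=True) forwards all 100.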

I've also noticed that when the forwarding stops, the CPU usage for the process
running the program is pegged, which is not the norm for this program as it uses
a poll call with a timeout on the xsk fd.
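
As a sketch of how a poll-with-timeout loop can end up spinning, the Python
snippet below polls an ordinary socket pair rather than an xsk fd. The helper
names (count_ready_polls, demo) and the 10 ms timeout are assumptions for
illustration only. When nothing is pending, each poll sleeps for the timeout;
when the fd stays ready and is never drained, every poll returns immediately,
so the loop never sleeps. This only illustrates the general mechanism and does
not claim to be what the xsk fd is doing in this case.

```python
import select
import socket


def count_ready_polls(sock, timeout_ms, iterations):
    """Poll `sock` repeatedly without reading; count polls that report events."""
    poller = select.poll()
    poller.register(sock, select.POLLIN)
    ready = 0
    for _ in range(iterations):
        # poll() blocks for up to timeout_ms unless an event is already pending.
        if poller.poll(timeout_ms):
            ready += 1
    return ready


def demo():
    a, b = socket.socketpair()
    try:
        idle = count_ready_polls(a, 10, 5)  # nothing to read: each poll times out
        b.send(b"x")                        # make `a` readable and never drain it
        busy = count_ready_polls(a, 10, 5)  # every poll returns at once
        return idle, busy
    finally:
        a.close()
        b.close()
```

Here demo() returns the pair (idle, busy): the number of polls that reported
events before and after the unread byte was queued.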

The hardware I am using is a Mellanox ConnectX4 2x100G card (MCX416A-CCAT)
running the mlx5 driver. The program is running in zero copy mode. I also tested
this code out in a virtual machine with virtio NICs in SKB mode which uses
xdpgeneric - there were no issues in that setting.

-- 
~ ry


