Re: zero-copy between interfaces

On 2020-01-21 09:34, Magnus Karlsson wrote:
> On Thu, Jan 16, 2020 at 3:04 AM Ryan Goodfellow <rgoodfel@xxxxxxx> wrote:
>>
>> On Wed, Jan 15, 2020 at 09:20:30AM +0100, Magnus Karlsson wrote:
>>> On Wed, Jan 15, 2020 at 8:40 AM Magnus Karlsson
>>> <magnus.karlsson@xxxxxxxxx> wrote:
>>>>
>>>> On Wed, Jan 15, 2020 at 2:41 AM Ryan Goodfellow <rgoodfel@xxxxxxx> wrote:
>>>>>
>>>>> On Tue, Jan 14, 2020 at 03:52:50PM -0500, Ryan Goodfellow wrote:
>>>>>> On Tue, Jan 14, 2020 at 10:59:19AM +0100, Magnus Karlsson wrote:
>>>>>>>
>>>>>>> Just sent out a patch on the mailing list. Would be great if you could
>>>>>>> try it out.
>>>>>>
>>>>>> Thanks for the quick turnaround. I gave this patch a go, both in the bpf-next
>>>>>> tree and manually applied to the 5.5.0-rc3 branch I've been working with up to
>>>>>> this point. It does allow for allocating more memory, however packet
>>>>>> forwarding no longer works. I did not see any complaints from dmesg, but here
>>>>>> is an example iperf3 session from a client that worked before.
>>>>>>
>>>>>> ry@xd2:~$ iperf3 -c 10.1.0.2
>>>>>> Connecting to host 10.1.0.2, port 5201
>>>>>> [  5] local 10.1.0.1 port 53304 connected to 10.1.0.2 port 5201
>>>>>> [ ID] Interval           Transfer     Bitrate         Retr  Cwnd
>>>>>> [  5]   0.00-1.00   sec  5.91 MBytes  49.5 Mbits/sec    2   1.41 KBytes
>>>>>> [  5]   1.00-2.00   sec  0.00 Bytes  0.00 bits/sec    1   1.41 KBytes
>>>>>> [  5]   2.00-3.00   sec  0.00 Bytes  0.00 bits/sec    0   1.41 KBytes
>>>>>> [  5]   3.00-4.00   sec  0.00 Bytes  0.00 bits/sec    1   1.41 KBytes
>>>>>> [  5]   4.00-5.00   sec  0.00 Bytes  0.00 bits/sec    0   1.41 KBytes
>>>>>> [  5]   5.00-6.00   sec  0.00 Bytes  0.00 bits/sec    0   1.41 KBytes
>>>>>> [  5]   6.00-7.00   sec  0.00 Bytes  0.00 bits/sec    1   1.41 KBytes
>>>>>> [  5]   7.00-8.00   sec  0.00 Bytes  0.00 bits/sec    0   1.41 KBytes
>>>>>> [  5]   8.00-9.00   sec  0.00 Bytes  0.00 bits/sec    0   1.41 KBytes
>>>>>> ^C[  5]  10.00-139.77 sec  0.00 Bytes  0.00 bits/sec    4   1.41 KBytes
>>>>>> - - - - - - - - - - - - - - - - - - - - - - - - -
>>>>>> [ ID] Interval           Transfer     Bitrate         Retr
>>>>>> [  5]   0.00-139.77 sec  5.91 MBytes   355 Kbits/sec    9             sender
>>>>>> [  5]   0.00-139.77 sec  0.00 Bytes  0.00 bits/sec                  receiver
>>>>>> iperf3: interrupt - the client has terminated
>>>>>>
>>>>>> I'll continue to investigate and report back with anything that I find.
>>>>>
>>>>> Interestingly I found this behavior to exist in the bpf-next tree independent
>>>>> of the patch being present.
>>>>
>>>> Ryan,
>>>>
>>>> Could you please do a bisect on it? In the 12 commits after the merge
>>>> commit below there are a number of sensitive rewrites of the ring access
>>>> functions. Maybe one of them breaks your code. When you say "packet
>>>> forwarding no longer works", do you mean it works for a second or so,
>>>> then no packets come through? What HW are you using?
>>>>
>>>> commit ce3cec27933c069d2015a81e59b93eb656fe7ee4
>>>> Merge: 99cacdc 1d9cb1f
>>>> Author: Alexei Starovoitov <ast@xxxxxxxxxx>
>>>> Date:   Fri Dec 20 16:00:10 2019 -0800
>>>>
>>>>      Merge branch 'xsk-cleanup'
>>>>
>>>>      Magnus Karlsson says:
>>>>
>>>>      ====================
>>>>      This patch set cleans up the ring access functions of AF_XDP in hope
>>>>      that it will now be easier to understand and maintain. I used to get a
>>>>      headache every time I looked at this code in order to really understand it,
>>>>      but now I do think it is a lot less painful.
>>>>      <snip>
>>>>
>>>> /Magnus
>>>
>>> I see that you have debug messages in your application. Could you
>>> please run with those on and send me the output so I can see where it
>>> stops. A bisect that pin-points what commit that breaks your program
>>> plus the debug output should hopefully send us on the right path for a
>>> fix.
>>>
>>> Thanks: Magnus
>>>

Hi Ryan,

>> Hi Magnus,
>>
>> I did a bisect starting from the head of the bpf-next tree (990bca1) down to
>> the first commit before the patch series you identified (df034c9). The result
>> was identifying df0ae6f as the commit that causes the issue I am seeing.

This commit and the commit before it remove batching in xskq_nb_avail. 
Before these two commits, xskq_nb_avail returned at most 16 entries; now 
its return value is unbounded. That seems like too minor a change to 
cause this issue...

>> I've posted output from the program in debugging mode here
>>
>> - https://gitlab.com/mergetb/tech/network-emulation/kernel/snippets/1930375
>>
>> Yes, you are correct in that forwarding works for a brief period and then stops.
>> I've noticed that the number of packets that are forwarded is equal to the size
>> of the producer/consumer descriptor rings. I've posted two ping traces from a
>> client ping that shows this.
>>
>> - https://gitlab.com/mergetb/tech/network-emulation/kernel/snippets/1930376
>> - https://gitlab.com/mergetb/tech/network-emulation/kernel/snippets/1930377

These snippets are not available.

>>
>> I've also noticed that when the forwarding stops, the CPU usage for the proc
>> running the program is pegged, which is not the norm for this program as it uses
>> a poll call with a timeout on the xsk fd.

This information led me to a guess about what may be happening. On the 
RX side, mlx5e allocates pages in bulks for performance reasons and to 
leverage hardware features targeted at performance. In AF_XDP mode, 
bulking of frames is also used (on x86, the bulk size is 64 with 
striding RQ enabled, and 8 otherwise; however, these are implementation 
details that might change later). If you don't put enough frames onto 
the XSK Fill Ring, the driver will keep demanding more frames and return 
from poll() immediately. Basically, in the application, you should put 
as many frames onto the Fill Ring as you can. Please check whether that 
could be the root cause of your issue.

I tracked this issue in our internal bug tracker in case we need to 
perform actual debugging of mlx5e. I'm looking forward to your feedback 
on my assumption above.

>> The hardware I am using is a Mellanox ConnectX4 2x100G card (MCX416A-CCAT)
>> running the mlx5 driver.

This card should run without striding RQ; please verify it with ethtool 
--show-priv-flags (the flag name is rx_striding_rq).

>> The program is running in zero-copy mode. I also tested
>> this code out in a virtual machine with virtio NICs in SKB mode which uses
>> xdpgeneric - there were no issues in that setting.
>>
>> --
>> ~ ry
> 
> Maxim,
> 
> Do you think you could help me debug this issue that Ryan is having? I
> can unfortunately not reproduce the stalling issue with my Intel i40e
> cards.
> 
> Ryan, Maxim is Mellanox's responsible for AF_XDP support and he has
> also contributed to the core AF_XDP code. So you are in good hands
> :-).
> 
> Thanks: Magnus
> 




