Re: Is It Possible to RX/Process/TX packets concurrently with AF_XDP?

On Thu, Dec 29, 2022 at 2:07 AM Zhaoxi Zhu <zzhu@xxxxxxxxxxxxx> wrote:
>
>
>
> >> On 12/22/22, 1:38 AM, "Magnus Karlsson" <magnus.karlsson@xxxxxxxxx> wrote:
> >>
> >>   On Thu, Dec 22, 2022 at 12:11 AM Zhaoxi Zhu <zzhu@xxxxxxxxxxxxx> wrote:
> >>   >
> >>    >
> >>    > >    On Tue, Dec 13, 2022 at 8:11 PM Zhaoxi Zhu <zzhu@xxxxxxxxxxxxx> wrote:
> >>    > >    >
> >>    > >    > It looks like I didn’t include the mailing list in my previous replies. I hope this one reaches it.
> >>    > >    >
> >>    > >    > Also, can the AF_XDP-forwarding example handle multiple AF_XDP sockets on the same NIC?
> >>    > >
> >>    > >    Yes.
> >>    > >
> >>    > >    > Such as:
> >>    > >    >
> >>    > >    > ```
> >>    > >    > ./xdp_fwd -i IFA -q Q1 -i IFA -q Q2 -i IFA -q Q3 -i IFA -q Q4 -c CX -c CY
> >>    > >    > ```
> >>    > >    >
> >>    > >    > If the above is doable, could I configure multiple queues on the same NIC instead of one, create one AF_XDP socket per queue, and then use this xdp_fwd example to achieve multi-threading?
> >>    > >
> >>    > >    That is the best way to do multithreading without having to resort to
> >>    > >    expensive locking. One queue and socket per thread is the way to go.
> >>    > >
> >>    >
> >>    > Thank you so much for your reply and suggestions. I made changes following your suggestions and it partly worked!
> >>    >
> >>    > At first, I copied the round-robin logic from xdpsock_kern.c into my XDP code. In user space, I ran one thread per receive queue, with the number of queues equal to the number of cores on my machine (24), and each thread had an AF_XDP socket for its queue. None of them received packets. I added some logging to rx_burst and found that they were all busy-polling with n_packets always 0. I later reduced the number of queues and threads to 4, with the same result. What could be the reason that the round robin doesn't work?
> >>    >
> >>    > Then I set the number of receive queues to 8, removed the round-robin logic from the XDP code so that it simply redirects each packet to the xsks_map entry at index ctx->rx_queue_index, and used 8 threads in user space for the 8 queues. With that, the sockets receive packets. However, when I increased the number of queues and threads to 16, none of the AF_XDP sockets received packets again. Do you know what might be causing this? Are there too many queues and threads?
> >>
> >>    There is no upper limit on the number of queues and threads supported
> >>    in the AF_XDP code. Your NIC will likely have a limit on the number of
> >>    queues though.
> >>
>
> Thank you very much for your tips. After some debugging, I found out that the cause was some bugs in initializing the bcache, which resulted in ports' pcache trading new slabs from the bpool.

Please submit this fix to the bpf-examples repo so other people can
benefit from it too.

> I fixed that and continued testing. My test runs several iperf instances to generate traffic that goes through these AF_XDP sockets.
>
> There is one interesting behavior: the iperf traffic runs fine at the beginning. However, after some time, maybe 30 seconds or 1 minute, the traffic stops and I no longer see any socket rx/tx packets (I added some logs there for debugging).
>
> The program doesn't crash; it is as if no new traffic is reaching the sockets. However, if I stop the iperfs, give the sockets a break of 1 or 2 minutes, and restart the iperfs, the sockets are usually able to rx/tx traffic again, until they stop once more.
>
> Have you seen anything like this before? It feels like I'm very close to having a fully running program, but I'm stuck at this step and I can't figure out why.

Have not seen anything like this. TCP is sensitive to packet loss. Are
packets dropped in your test run?
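
One rough way to check for drops during a test run is to sample the kernel's per-interface counters before and after an interval. The sketch below assumes the standard sysfs files /sys/class/net/<iface>/statistics/rx_dropped and tx_dropped. The SYSFS_NET variable and the function names are hypothetical, introduced only so the logic can be tried against a scratch directory. These counters may not cover every drop on the AF_XDP path; each socket also exposes its own counters through the XDP_STATISTICS socket option.

```shell
#!/bin/sh
# Sketch only: report how much the interface drop counters grew over an
# interval. SYSFS_NET is a hypothetical override (not a kernel or ethtool
# setting); by default it points at the standard sysfs location.
SYSFS_NET="${SYSFS_NET:-/sys/class/net}"

# read_counter IFACE NAME: print the counter value, or 0 if it is unreadable.
read_counter() {
    cat "$SYSFS_NET/$1/statistics/$2" 2>/dev/null || echo 0
}

# drop_delta IFACE SECONDS: print the growth of rx_dropped and tx_dropped.
drop_delta() {
    iface=$1
    secs=$2
    rx0=$(read_counter "$iface" rx_dropped)
    tx0=$(read_counter "$iface" tx_dropped)
    sleep "$secs"
    rx1=$(read_counter "$iface" rx_dropped)
    tx1=$(read_counter "$iface" tx_dropped)
    echo "rx_dropped=+$((rx1 - rx0)) tx_dropped=+$((tx1 - tx0))"
}
```

Running `drop_delta <interface> 10` while the iperf traffic is flowing, and again once it has stalled, would show whether the counters move in either phase.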

> Thanks again for your help.
>
> >>    > Another question: since I'm not using round robin in my XDP code, the traffic isn't distributed evenly among my queues. It seems that 2 queues always get most of the traffic and the others get very little:
> >>    >
> >>    > +------+--------------+---------------+--------------+---------------+
> >>    > | Port |   RX packets | RX rate (pps) |   TX packets | TX rate (pps) |
> >>    > +------+--------------+---------------+--------------+---------------+
> >>    > |    0 |         2113 |             1 |         2113 |             1 |
> >>    > |    1 |            0 |             0 |            0 |             0 |
> >>    > |    2 |            0 |             0 |            0 |             0 |
> >>    > |    3 |          568 |             0 |          568 |             0 |
> >>    > |    4 |         2590 |             1 |         2590 |             1 |
> >>    > |    5 |            0 |             0 |            0 |             0 |
> >>    > |    6 |            0 |             0 |            0 |             0 |
> >>    > |    7 |           85 |             0 |           85 |             0 |
> >>    > +------+--------------+---------------+--------------+---------------+
> >>    >
> >>    > I understand this is mainly because I don't have round robin in my XDP code, but what decides which queue gets the traffic? Also, if round robin did work, would a packet that arrives at the XDP program on queue x and is then forwarded to an AF_XDP socket on queue y have to be copied, so that zero-copy would not work in this case?
> >>
> >>    Your packet distribution among queues is decided by your NIC and the
> >>    traffic it receives. It probably has RSS enabled by default. You can
> >>    program the NIC flow steering rules using ethtool. If you want
> >>    something perfectly spread among the cores, you probably want to have
> >>    a synthetic workload and enable explicit flow steering rules to
> >>    achieve perfect control. Google some examples and experiment without
> >>    using XDP, is my tip.
> >>
> >>    You cannot direct packets coming in on queue X to a socket bound to
> >>    queue Y, regardless of whether it is in zero-copy mode or not. You are
> >>    correct that this could be supported in copy-mode, but it is not.
> >>
> >>    > Again, thank you very much for reading this and your help.
> >>    >
> >>    > Rio
> >>    >
> >>    > >    > Thank you very much for your help and time.
> >>    > >    > Rio
> >>    > >    >
> >>    > >    >
> >>    > >    > From: Zhaoxi Zhu <zzhu@xxxxxxxxxxxxx>
> >>    > >    > Date: Monday, December 12, 2022 at 11:06 AM
> >>    > >    > To: Magnus Karlsson <magnus.karlsson@xxxxxxxxx>
> >>    > >    > Subject: Re: Is It Possible to RX/Process/TX packets concurrently with AF_XDP?
> >>    > >    >
> >>    > >    > Got it, thank you very much for your clarification.
> >>    > >    >
> >>    > >    > I have one more question, if I may: if one AF_XDP socket should be handled by one thread, in order to avoid mutexes and achieve better performance, can I have more than one AF_XDP socket on the same physical NIC and use one thread per AF_XDP socket, so that packets coming into this NIC are processed concurrently?
> >>    > >    >
> >>    > >    > Currently, we are testing AF_XDP with only 1 queue:
> >>    > >    >
> >>    > >    > ```
> >>    > >    > sudo ethtool -L <interface> combined 1
> >>    > >    > ```
> >>    > >    >
> >>    > >    > Can I change the number of queues to something like 4 and, in the user space program, have one AF_XDP socket per queue and one thread per AF_XDP socket, in order to have four threads processing traffic coming into the same NIC?
> >>    > >    >
> >>    > >    > Thank you very much for your help and time.
> >>    > >    > Rio
>
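
For the flow steering advice quoted above (programming the NIC with ethtool and using a synthetic workload), the following is a minimal sketch of what such rules might look like. It only prints the ethtool commands so they can be reviewed first; it does not run them. The function name gen_steering and the choice of one UDP destination port per queue are assumptions made for illustration. Whether `ethtool -N` rules are accepted at all depends on the NIC and driver.

```shell
#!/bin/sh
# Sketch only: print (do not run) ethtool commands that pin one UDP
# destination port to each RX queue. gen_steering is a hypothetical helper.

# gen_steering IFACE NQUEUES BASE_PORT
gen_steering() {
    iface=$1
    nq=$2
    base=$3
    # Set the number of combined channels (queues) on the NIC.
    echo "ethtool -L $iface combined $nq"
    # Enable ntuple filtering so that flow steering rules can be installed.
    echo "ethtool -K $iface ntuple on"
    q=0
    while [ "$q" -lt "$nq" ]; do
        # Steer UDP/IPv4 traffic for port BASE_PORT + q to queue q.
        echo "ethtool -N $iface flow-type udp4 dst-port $((base + q)) action $q"
        q=$((q + 1))
    done
}
```

For example, `gen_steering <interface> 4 5000` prints six commands: the channel count, the ntuple switch, and one rule each for ports 5000 to 5003. A traffic generator sending equal load to each of those ports should then spread packets evenly over the four queues, with one AF_XDP socket bound per queue.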



