On Thu, Dec 29, 2022 at 2:07 AM Zhaoxi Zhu <zzhu@xxxxxxxxxxxxx> wrote:
>
> >> On 12/22/22, 1:38 AM, "Magnus Karlsson" <magnus.karlsson@xxxxxxxxx> wrote:
> >>
> >> On Thu, Dec 22, 2022 at 12:11 AM Zhaoxi Zhu <zzhu@xxxxxxxxxxxxx> wrote:
> >> >
> >> > > On Tue, Dec 13, 2022 at 8:11 PM Zhaoxi Zhu <zzhu@xxxxxxxxxxxxx> wrote:
> >> > > >
> >> > > > It looks like I didn’t include the mailing list in my previous replies. I hope this one does.
> >> > > >
> >> > > > Also, for the AF_XDP-forwarding example, is it able to handle multiple AF_XDP sockets on the same NIC?
> >> > >
> >> > > Yes.
> >> > >
> >> > > > Such as:
> >> > > >
> >> > > > ```
> >> > > > ./xdp_fwd -i IFA -q Q1 -i IFA -q Q2 -i IFA -q Q3 -i IFA -q Q4 -c CX -c CY
> >> > > > ```
> >> > > >
> >> > > > If the above is doable, maybe I can have multiple queues, rather than one, on the same NIC, create one AF_XDP socket per queue, and then use this xdp_fwd example to achieve multi-threading?
> >> > >
> >> > > That is the best way to do multithreading without having to resort to
> >> > > expensive locking. One queue and socket per thread is the way to go.
> >> > >
> >> >
> >> > Thank you so much for your reply and suggestions. I made changes following your suggestions and it partly worked!
> >> >
> >> > At the beginning, I copied the round-robin logic from xdpsock_kern.c into my XDP code. In user space, I have number_of_receving_queues threads, which equals the number of cores on my machine (24), each with an AF_XDP socket for its queue, but none of them were receiving packets. I added some logs to rx_burst and found that they were all busy polling and n_packets was always 0. I later changed the number of queues and threads to 4, and the results were the same. What could be the reason that the round robin doesn't work?
> >> >
> >> > Then, I set the number of receiving queues to 8, removed the round-robin logic from the XDP code, and simply forwarded each packet to the ctx->rx_queue_index entry of the xsks_map. I used 8 threads in user space for the 8 queues, and it is now able to receive packets. However, when I tried to increase the number of queues and threads to 16, none of the AF_XDP sockets could receive packets again. Do you know what might be causing this? Is it because there are too many queues and threads?
> >>
> >> There is no upper limit to the number of queues and threads supported
> >> in the AF_XDP code. Your NIC will likely have a limit on the number of
> >> queues though.
> >>
>
> Thank you very much for your tips. After some debugging, I found out that it is because of some bugs in initializing the bcache, which result in the ports' pcache trading new slabs from the bpool.

Please submit this fix to the bpf-examples repo so other people can benefit from it too.

> I fixed that and continued testing. My test is to run some iperfs to generate traffic that goes through these AF_XDP sockets.
>
> There is one interesting behavior: the iperf traffic runs fine at the beginning. However, after some time, maybe 30 seconds or 1 minute, the traffic stops and I don't see any sockets rx/tx packets (I added some logs there for debugging).
>
> The program doesn't crash; it is as if there is no new traffic coming to the sockets. However, if I stop the iperfs and give the sockets a break, say 1 or 2 minutes, and restart the iperfs, the sockets are usually able to rx/tx traffic again, until they stop again.
>
> I wonder, have you seen anything like this before? It feels like I'm very close to having a fully running program, but I'm stuck at this step and I can't figure out why.

Have not seen anything like this. TCP is sensitive to packet loss. Are packets dropped in your test run?

> Thanks again for your help.
>
> >> > Another question is: since I'm not using round robin in my XDP code, the traffic isn't distributed evenly among my queues. It seems to me that 2 queues are always getting most of the traffic and the others are getting very little:
> >> >
> >> > +------+------------+---------------+------------+---------------+
> >> > | Port | RX packets | RX rate (pps) | TX packets | TX rate (pps) |
> >> > +------+------------+---------------+------------+---------------+
> >> > | 0    |       2113 |             1 |       2113 |             1 |
> >> > | 1    |          0 |             0 |          0 |             0 |
> >> > | 2    |          0 |             0 |          0 |             0 |
> >> > | 3    |        568 |             0 |        568 |             0 |
> >> > | 4    |       2590 |             1 |       2590 |             1 |
> >> > | 5    |          0 |             0 |          0 |             0 |
> >> > | 6    |          0 |             0 |          0 |             0 |
> >> > | 7    |         85 |             0 |         85 |             0 |
> >> > +------+------------+---------------+------------+---------------+
> >> >
> >> > I understand this is mainly because I don't have round robin in my XDP code, but I wonder what decides which queue gets the traffic? Also, if round robin works, does it mean that when a packet arrives in XDP on queue x and is then forwarded to an AF_XDP socket on queue y, the packet will be copied, and zero-copy won't work in this case?
> >>
> >> Your packet distribution among queues is decided by your NIC and the
> >> traffic it receives. It probably has RSS enabled by default. You can
> >> program the NIC flow steering rules using ethtool. If you want
> >> something perfectly spread among the cores, you probably want to have
> >> a synthetic workload and enable explicit flow steering rules to
> >> achieve perfect control. Google some examples and experiment without
> >> using XDP, is my tip.
> >>
> >> You cannot direct packets coming in on queue X to a socket bound to
> >> queue Y, regardless of whether it is zero-copy mode or not. You are
> >> correct that this could be supported in copy-mode, but it is not.
> >>
> >> > Again, thank you very much for reading this and your help.
> >> >
> >> > Rio
> >> >
> >> > > > Thank you very much for your help and time.
> >> > > > Rio
> >> > > >
> >> > > >
> >> > > > From: Zhaoxi Zhu <zzhu@xxxxxxxxxxxxx>
> >> > > > Date: Monday, December 12, 2022 at 11:06 AM
> >> > > > To: Magnus Karlsson <magnus.karlsson@xxxxxxxxx>
> >> > > > Subject: Re: Is It Possible to RX/Process/TX packets concurrently with AF_XDP?
> >> > > >
> >> > > > Got it, thank you very much for your clarification.
> >> > > >
> >> > > > I have one more question, if I may: if one AF_XDP socket should be handled by one thread, in order to avoid mutexes and to achieve better performance, can I then have more than one AF_XDP socket on the same physical NIC, and use one thread per AF_XDP socket, in order to process packets coming into this NIC concurrently?
> >> > > >
> >> > > > Currently, we are testing AF_XDP with only 1 queue:
> >> > > >
> >> > > > ```
> >> > > > sudo ethtool -L <interface> combined 1
> >> > > > ```
> >> > > >
> >> > > > Can I change the number of queues to something like 4 and, in the user space program, have one AF_XDP socket per queue and one thread per AF_XDP socket, in order to have four threads processing traffic coming into the same NIC?
> >> > > >
> >> > > > Thank you very much for your help and time.
> >> > > > Rio
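
For readers wondering why only a few of the 8 queues saw traffic in the table above: with RSS, the NIC hashes each packet's flow fields and uses the result to pick a queue, so all packets of one flow land on the same queue, and a handful of iperf flows can occupy only a handful of queues. The following is a rough sketch of that idea, not the NIC's actual algorithm. The struct, the function names, and the FNV-1a stand-in hash are all assumptions made for illustration (real NICs typically use a Toeplitz hash with a device-specific key and an indirection table).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flow key; a NIC doing RSS hashes similar header fields. */
struct flow {
    uint32_t src_ip, dst_ip;
    uint16_t src_port, dst_port;
};

/* Stand-in hash step (FNV-1a over the low nbytes of v).
 * This is NOT the Toeplitz hash that real hardware typically uses. */
static uint32_t fnv1a_step(uint32_t h, uint32_t v, int nbytes)
{
    for (int i = 0; i < nbytes; i++) {
        h ^= (v >> (8 * i)) & 0xffu;
        h *= 16777619u;
    }
    return h;
}

static uint32_t flow_hash(const struct flow *f)
{
    uint32_t h = 2166136261u;
    h = fnv1a_step(h, f->src_ip, 4);
    h = fnv1a_step(h, f->dst_ip, 4);
    h = fnv1a_step(h, f->src_port, 2);
    h = fnv1a_step(h, f->dst_port, 2);
    return h;
}

/* The queue depends only on the flow key, so a flow never moves queues. */
static unsigned pick_queue(const struct flow *f, unsigned n_queues)
{
    assert(n_queues > 0);
    return flow_hash(f) % n_queues;
}

/* Spread n_flows made-up flows (differing only in source port) over the
 * queues, crediting pkts_per_flow packets to whichever queue each hits. */
static void simulate(unsigned long *per_queue, unsigned n_queues,
                     unsigned n_flows, unsigned long pkts_per_flow)
{
    for (unsigned i = 0; i < n_flows; i++) {
        struct flow f = { 0x0a000001u, 0x0a000002u,
                          (uint16_t)(40000u + i), 5201 };
        per_queue[pick_queue(&f, n_queues)] += pkts_per_flow;
    }
}
```

Calling simulate() on a zeroed array of 8 counters with 4 flows leaves at most 4 counters non-zero, whatever the hash. That matches the shape of the table: the number of busy queues is bounded by the number of distinct flows, which is why a synthetic workload with many flows, or explicit flow steering rules set with ethtool, is needed for an even spread.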