On Thu, Dec 22, 2022 at 12:11 AM Zhaoxi Zhu <zzhu@xxxxxxxxxxxxx> wrote:
>
> > On Tue, Dec 13, 2022 at 8:11 PM Zhaoxi Zhu <zzhu@xxxxxxxxxxxxx> wrote:
> > >
> > > It looks like I didn’t include the mailing list in my previous replies. I hope this one does.
> > >
> > > Also, for the AF_XDP-forwarding example, is it able to handle multiple AF_XDP sockets on the same NIC?
> >
> > Yes.
> >
> > > Such as:
> > >
> > > ```
> > > ./xdp_fwd -i IFA -q Q1 -i IFA -q Q2 -i IFA -q Q3 -i IFA -q Q4 -c CX -c CY
> > > ```
> > >
> > > If the above is doable, maybe I can have multiple queues, rather than one, on the same NIC, create one AF_XDP socket per queue, and then use this xdp_fwd example to achieve multi-threading?
> >
> > That is the best way to do multithreading without having to resort to
> > expensive locking. One queue and socket per thread is the way to go.
>
> Thank you so much for your reply and suggestions. I made changes following your suggestions and it partly worked!
>
> At the beginning, I copied the round robin logic of xdpsock_kern.c into my XDP code. In user space, I have number_of_receving_queues threads, equal to the number of cores of my machine (24), each with an AF_XDP socket for its queue, but none of them receives packets. I added some logs to rx_burst and found that they were all busy polling and n_packets was always 0. I later changed the number of queues and threads to 4, and the results were the same. What could be the reason that the round robin doesn't work?
>
> Then, I set the number of receiving queues to 8, removed the round robin logic from the XDP code, and simply forwarded each packet to the ctx->rx_queue_index entry of the xsks_map. With 8 threads in user space for the 8 queues, it is now able to receive packets. However, when I tried to increase the number of queues and threads to 16, none of the AF_XDP sockets could receive packets again. Do you know what might be causing this?
> Is it because there are too many queues and threads?

There is no upper limit on the number of queues and threads supported in the AF_XDP code. Your NIC will likely have a limit on the number of queues, though.

> Another question: since I'm not using round robin in my XDP code, the traffic isn't distributed evenly among my queues. It seems to me that 2 queues are always getting most of the traffic and the others are getting very little:
>
> +------+--------------+---------------+--------------+---------------+
> | Port | RX packets   | RX rate (pps) | TX packets   | TX_rate (pps) |
> +------+--------------+---------------+--------------+---------------+
> | 0    | 2113         | 1             | 2113         | 1             |
> | 1    | 0            | 0             | 0            | 0             |
> | 2    | 0            | 0             | 0            | 0             |
> | 3    | 568          | 0             | 568          | 0             |
> | 4    | 2590         | 1             | 2590         | 1             |
> | 5    | 0            | 0             | 0            | 0             |
> | 6    | 0            | 0             | 0            | 0             |
> | 7    | 85           | 0             | 85           | 0             |
> +------+--------------+---------------+--------------+---------------+
>
> I understand this is mainly because I don't have round robin in my XDP code, but I wonder what decides which queue gets the traffic? Also, if round robin works, does it mean that when a packet arrives in XDP on queue x and is then forwarded to an AF_XDP socket bound to queue y, the packet will be copied, and zero-copy won't work in this case?

Your packet distribution among queues is decided by your NIC and the traffic it receives. It probably has RSS enabled by default. You can program the NIC flow steering rules using ethtool. If you want traffic spread perfectly among the cores, you probably want a synthetic workload and explicit flow steering rules to achieve perfect control. My tip is to Google some examples and experiment without using XDP.

You cannot direct packets coming in on queue X to a socket bound to queue Y, regardless of whether it is in zero-copy mode or not. You are correct that this could be supported in copy-mode, but it is not.

> Again, thank you very much for reading this and your help.
>
> Rio
>
> > > Thank you very much for your help and time.
> > > Rio
> > >
> > >
> > > From: Zhaoxi Zhu <zzhu@xxxxxxxxxxxxx>
> > > Date: Monday, December 12, 2022 at 11:06 AM
> > > To: Magnus Karlsson <magnus.karlsson@xxxxxxxxx>
> > > Subject: Re: Is It Possible to RX/Process/TX packets concurrently with AF_XDP?
> > >
> > > Got it, thank you very much for your clarification.
> > >
> > > I have one more question, if I may: if one AF_XDP socket should be handled by one thread, in order to avoid mutexes and achieve better performance, can I have more than one AF_XDP socket on the same physical NIC, and use one thread per AF_XDP socket, in order to process packets coming into this NIC concurrently?
> > >
> > > Currently, we are testing AF_XDP with only 1 queue:
> > >
> > > ```
> > > sudo ethtool -L <interface> combined 1
> > > ```
> > >
> > > Can I change the number of queues to something like 4 and, in the user space program, have one AF_XDP socket per queue and one thread per AF_XDP socket, in order to have four threads processing traffic coming into the same NIC?
> > >
> > > Thank you very much for your help and time.
> > > Rio
> > >
> > >
> > > From: Magnus Karlsson <magnus.karlsson@xxxxxxxxx>
> > > Date: Saturday, December 10, 2022 at 6:57 AM
> > > To: Zhaoxi Zhu <zzhu@xxxxxxxxxxxxx>
> > > Subject: Re: Is It Possible to RX/Process/TX packets concurrently with AF_XDP?
> > >
> > > No, that is not possible without expensive mutual exclusion mechanisms. Use one socket per thread instead.
> > >
> > > Magnus
> > >
> > > On Fri, Dec 9, 2022 at 23:49, Zhaoxi Zhu <zzhu@xxxxxxxxxxxxx> wrote:
> > > Hi Magnus,
> > >
> > > Thank you very much for your reply and the link you provided.
> > >
> > > Do you think it is okay to have multiple threads for the same AF_XDP socket? In the AF_XDP-forwarding example, it seems that each AF_XDP socket is handled by only one thread.
> > > I wonder if it is okay to run this AF_XDP-forwarding program like:
> > >
> > > ```
> > > ./xdp_fwd -i IFA -q QA -c CX -c CY
> > > ```
> > >
> > > So that we have two threads running on the same AF_XDP socket.
> > >
> > > Thank you again for your help.
> > > Rio Zhu
> > >
> > > On 12/8/22, 2:54 AM, "Magnus Karlsson" <magnus.karlsson@xxxxxxxxx> wrote:
> > >
> > > On Wed, Dec 7, 2022 at 11:54 PM Zhaoxi Zhu <zzhu@xxxxxxxxxxxxx> wrote:
> > > >
> > > > Hi All,
> > > >
> > > > Thank you very much for reading this email. My name is Rio.
> > > >
> > > > I recently started looking into the XDP technology, especially AF_XDP, and I really love it. I started studying and modifying this AF_XDP example (https://github.com/xdp-project/xdp-tutorial/blob/master/advanced03-AF_XDP/af_xdp_user.c) to meet my needs, and it has been working fine.
> > > >
> > > > However, one thing I noticed is that this user space application is single-threaded. I wonder if it is feasible to add multi-threading to the RX/packet processing/TX parts of the program, in order to utilize other cores and possibly make my application faster?
> > > >
> > >
> > > Please check out the AF_XDP-forwarding example in this repo:
> > >
> > > https://github.com/xdp-project/bpf-examples
> > >
> > > > One challenge I face now is that, as I tried different places to add multi-threading, the program does not work properly. Symptoms include a failing `assert(xsk->umem_frame_free < NUM_FRAMES);`, ICMP ping packets that do not arrive at the destination until seconds later, and TCP connections generated by `iperf` that cannot be established.
> > > >
> > > > So, my question is, do AF_XDP applications support multi-threading during RX/packet processing/TX? If so, what is a proper way to modify the AF_XDP example code to let it run properly?
> > > >
> > > > Thank you again for reading this email. I look forward to hearing from you.
> > > >
> > > > Best,
> > > > Rio Zhu
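
As a rough illustration of the point made in the thread that the NIC, via RSS, decides which queue a packet lands on: with RSS, a NIC typically hashes each packet's flow tuple (often with a Toeplitz hash) and uses the result to index an indirection table of queue numbers. The sketch below is a minimal C model of that idea, not any particular driver's implementation. The key `rss_key`, the 128-entry round-robin table size `RETA_SIZE`, and the helper names `toeplitz_hash` and `rss_queue` are all assumptions made for illustration; a real NIC uses its own key, table size, and hash input fields.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RETA_SIZE 128u /* assumed indirection-table size; varies by NIC */

/* Example 40-byte hash key (assumption); drivers pick their own key. */
const uint8_t rss_key[40] = {
    0x6d, 0x5a, 0x56, 0xda, 0x25, 0x5b, 0x0e, 0xc2,
    0x41, 0x67, 0x25, 0x3d, 0x43, 0xa3, 0x8f, 0xb0,
    0xd0, 0xca, 0x2b, 0xcb, 0xae, 0x7b, 0x30, 0xb4,
    0x77, 0xcb, 0x2d, 0xa3, 0x80, 0x30, 0xf2, 0x0c,
    0x6a, 0x42, 0xb7, 0x3b, 0xbe, 0xac, 0x01, 0xfa,
};

/* Toeplitz hash: for every set bit of the input (MSB first), XOR in the
 * 32-bit window of the key that starts at that bit position.
 * Requires key_len >= len + 4. */
uint32_t toeplitz_hash(const uint8_t *key, size_t key_len,
                       const uint8_t *data, size_t len)
{
    assert(key_len >= len + 4);

    uint32_t window = ((uint32_t)key[0] << 24) | ((uint32_t)key[1] << 16) |
                      ((uint32_t)key[2] << 8) | (uint32_t)key[3];
    uint32_t hash = 0;

    for (size_t i = 0; i < len; i++) {
        for (int bit = 7; bit >= 0; bit--) {
            if (data[i] & (1u << bit))
                hash ^= window;
            /* Slide the window one bit further along the key. */
            window <<= 1;
            if (key[i + 4] & (1u << bit))
                window |= 1u;
        }
    }
    return hash;
}

/* Model of RSS queue selection for an IPv4 TCP/UDP flow. Addresses and
 * ports are passed in host byte order and serialized big-endian, in the
 * order source address, destination address, source port, destination port. */
unsigned rss_queue(uint32_t saddr, uint32_t daddr,
                   uint16_t sport, uint16_t dport, unsigned n_queues)
{
    assert(n_queues > 0);

    const uint8_t in[12] = {
        (uint8_t)(saddr >> 24), (uint8_t)(saddr >> 16),
        (uint8_t)(saddr >> 8),  (uint8_t)saddr,
        (uint8_t)(daddr >> 24), (uint8_t)(daddr >> 16),
        (uint8_t)(daddr >> 8),  (uint8_t)daddr,
        (uint8_t)(sport >> 8),  (uint8_t)sport,
        (uint8_t)(dport >> 8),  (uint8_t)dport,
    };
    uint32_t hash = toeplitz_hash(rss_key, sizeof rss_key, in, sizeof in);

    /* Assumed table layout: entry i holds queue i % n_queues. */
    unsigned entry = hash & (RETA_SIZE - 1u);
    return entry % n_queues;
}
```

Because every packet of a flow hashes to the same value, a test with only a handful of flows (for example a few ping or iperf streams) can end up on just two or three of eight queues, which would be consistent with the uneven counters reported in the thread. Where the driver supports it, `ethtool -x <interface>` shows the hash key and indirection table actually in use.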