On Thu, Mar 26, 2020 at 1:30 PM Gaul, Maximilian <maximilian.gaul@xxxxxx> wrote:
>
> On Wed, Mar 25, 2020 at 14:36 AM Karlsson, Magnus <magnus.karlsson@xxxxxxxxx> wrote:
> >
> > On Wed, Mar 25, 2020 at 1:40 PM Gaul, Maximilian <maximilian.gaul@xxxxxx> wrote:
> > >
> > > On Wed, Mar 25, 2020 at 12:04 AM Karlsson, Magnus <magnus.karlsson@xxxxxxxxx> wrote:
> > > >
> > > > On Wed, Mar 25, 2020 at 11:45 AM Gaul, Maximilian <maximilian.gaul@xxxxxx> wrote:
> > > > >
> > > > > On Wed, Mar 25, 2020 at 11:24 AM Karlsson, Magnus <magnus.karlsson@xxxxxxxxx> wrote:
> > > > > >
> > > > > > On Wed, Mar 25, 2020 at 11:02 AM Gaul, Maximilian <maximilian.gaul@xxxxxx> wrote:
> > > > > > >
> > > > > > > On Wed, Mar 25, 2020 at 10:41 AM Karlsson, Magnus <magnus.karlsson@xxxxxxxxx> wrote:
> > > > > > > >
> > > > > > > > On Wed, Mar 25, 2020 at 10:04 AM Gaul, Maximilian <maximilian.gaul@xxxxxx> wrote:
> > > > > > > > >
> > > > > > > > > I am running a multi-AF_XDP-socket approach, one socket per RX queue (using shared umem).
> > > > > > > > >
> > > > > > > > > Unfortunately I am noticing that at around 650k pps, the *ksoftirqd* thread of that RX queue ramps up to 100%, leading to packet loss.
> > > > > > > > > I tried setting *XDP_USE_NEED_WAKEUP* in *xsk_socket_cfg.bind_flags*, but those bind_flags are ignored once *umem->refcount > 1* (libbpf/xsk.c - xsk_socket__create()).
> > > > > > > > > As far as I understand, only the first socket is able to set *XDP_USE_NEED_WAKEUP*, because for all sockets after it, *umem->refcount* is going to be at least 2.
> > > > > > > >
> > > > > > > > Yes, the other sockets just inherit the settings of the first one.
> > > > > > > >
> > > > > > > > Are you using the SKB mode? What is your packet size? Sounds like a
> > > > > > > > low number unless you have large packets and are using the SKB mode.
> > > > > > >
> > > > > > > These are the flags I set right before calling `xsk_socket__create`:
> > > > > > >
> > > > > > > xsk_socket_cfg.xdp_flags = cfg->xdp_flags | XDP_FLAGS_DRV_MODE | XDP_ZEROCOPY;
> > > > > > > xsk_socket_cfg.bind_flags = cfg->xsk_bind_flags | XDP_USE_NEED_WAKEUP;
> > > > > >
> > > > > > XDP_ZEROCOPY is a bind flag, not an XDP flag, so please move it there.
> > > > > > If you get an error when you have it set, it means that your setup
> > > > > > does not support zero-copy for some reason. Check what kernel version
> > > > > > you are using and that the driver you are using supports zero-copy. I
> > > > > > believe you need to use a queue id >= 32 in the Mellanox driver for it
> > > > > > to work in zero-copy mode. Below 32, you will get copy mode.
> > > > > >
> > > > > > > Packet size is around 1492 bytes.
> > > > > >
> > > > > > Seems that you are using SKB mode then, not zero-copy.
> > > > > >
> > > > > > /Magnus
> > > > >
> > > > > Thank you for the hint. As you correctly said, I get an error if I use *XDP_ZEROCOPY*. But as far as I understand, packet rates should be higher in driver mode even without zero-copy?
> > > >
> > > > Yes, I would expect that too.
> > > >
> > > > > I just updated to the latest driver and firmware version:
> > > > >
> > > > > $ sudo ethtool -i <if>
> > > > > driver: mlx5_core
> > > > > version: 5.0-0
> > > > > firmware-version: 16.27.1016 (MT_0000000012)
> > > >
> > > > What kernel version are you using? And you should use the driver from
> > > > that same kernel.
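[Aside: the flag split discussed above might look roughly like the following with libbpf's xsk API. This is only a sketch, not code from the thread: the ring sizes are the libbpf defaults and the variable name simply mirrors the earlier snippet.]

    #include <bpf/xsk.h>          /* libbpf AF_XDP helpers */
    #include <linux/if_link.h>    /* XDP_FLAGS_DRV_MODE */
    #include <linux/if_xdp.h>     /* XDP_ZEROCOPY, XDP_USE_NEED_WAKEUP */

    struct xsk_socket_config xsk_socket_cfg = {
        .rx_size      = XSK_RING_CONS__DEFAULT_NUM_DESCS,
        .tx_size      = XSK_RING_PROD__DEFAULT_NUM_DESCS,
        .libbpf_flags = 0,
        /* driver (native) mode is selected via the XDP flags ... */
        .xdp_flags    = XDP_FLAGS_DRV_MODE,
        /* ... while zero-copy and need_wakeup are bind flags */
        .bind_flags   = XDP_ZEROCOPY | XDP_USE_NEED_WAKEUP,
    };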
> > >
> > > I am using
> > >
> > > $ uname -a
> > > Linux 5.4.0-4-amd64 #1 SMP Debian 5.4.19-1 (2020-02-13) x86_64 GNU/Linux
> > >
> > > At the moment, Mellanox only supports Debian up to version 10.0 (https://www.mellanox.com/products/infiniband-drivers/linux/mlnx_ofed), which is kernel 4.19.
> > > But because not all AF_XDP features are available in kernel 4.19, I had to upgrade. I was not sure which kernel version would be the minimum in order to be able to use AF_XDP completely, so I went with 5.4.
> > > Installation was successful (with *--skip-distro-check*), so I thought this should work?
> >
> > You better contact somebody from Mellanox for this info. I do not
> > know. But Mellanox has zero-copy support in kernel 5.4.
> >
> > > > > I actually have to correct myself: incoming packets are 1442 bytes.
> > > > > Can you give me the link between packet size and whether the NIC is running in SKB or DRV mode?
> > > >
> > > > Sorry, do not understand this. Could you please elaborate?
> > >
> > > You answered my reply that packets are 1492 bytes with "Seems that you are using SKB mode then, not zero-copy", so because of this I thought there was a relation between packet size and SKB mode?
> >
> > There is no relationship between SKB mode and packet size. They are
> > orthogonal. Though there is a relationship between packet size and
> > performance, and of course between SKB mode vs zero-copy mode and performance.
> >
> > > > > Mr. Brouer held a talk about XDP (https://people.netfilter.org/hawk/presentations/driving-IT2017/driving-IT-2017_XDP_eBPF_technology_Jesper_Brouer.pdf), mentioning on slide 11/27 that *mlx5 (v4.9)* has native XDP support.
> > > >
> > > > Yes, but only if you use queue id >= 32. What queue id are you binding to?
> > >
> > > Usually it is queue 0, but I also tried queue 32 and queue 36 - that didn't change anything about the behavior.
> >
> > It should make a difference, if I remember correctly, but somebody
> > from Mellanox certainly knows better. Try sending a mail to Maxim who
> > wrote the Mellanox driver support: Maxim Mikityanskiy
> > <maximmi@xxxxxxxxxxxx>.
> >
> > /Magnus
>
> Thank you!
> Just one more question regarding *ksoftirqd* load. This paper seems to talk about the mentioned issue of *ksoftirqd* producing high load: http://vger.kernel.org/lpc_net2018_talks/lpc18_paper_af_xdp_perf-v2.pdf
> A stated solution to this problem would be *busy polling*. I am not sure whether *busy polling* is something I have to set via *setsockopt* in my userspace program or whether it is already taken care of by libbpf?
> Nevertheless, I tried it like this:
>
> $ cat /boot/config-5.4.0-4-amd64 | grep "CONFIG_NET_RX_BUSY_POLL"
> CONFIG_NET_RX_BUSY_POLL=y
>
> $ sysctl net.core.busy_poll=50
>
> and in my user-space program:
>
> int busy_poll_usec = 50;
> if (setsockopt(xsk_socket__fd(xsk_sockets[i]->xsk), SOL_SOCKET, SO_BUSY_POLL,
>                (char *)&busy_poll_usec, sizeof(busy_poll_usec)) < 0) {
>     fprintf(stderr, "Failed to set `SO_BUSY_POLL`: %s\n", strerror(errno));
>     break;
> }
>
> /* some code in between */
>
> while (!global_exit) {
>     const int ret = poll(fds, fd_count, 2);
>     if (ret <= 0) {
>         continue;
>     }
>
>     for (int i = 0; i < fd_count; i++) {
>         struct xsk_socket_info *socket = xsk_sockets[i];
>         if (atomic_exchange(&socket->stats_sync.lock, 1) == 0) {
>             handle_receive_packets(socket);
>             atomic_fetch_xor(&socket->stats_sync.lock, 1); /* release socket lock */
>         }
>     }
> }
>
> but this only has the effect that my userspace program is now at 50% load (previously around 13%), with *ksoftirqd* still running at 100% (and even worse pps).
> Is this the expected effect of what I did?

Busy-polling will improve your latency (throughput might go either way) at
the cost of higher CPU load. Note that busy-polling is NOT supported with
AF_XDP. I sent out an RFC for this a year ago but ended up implementing
something else (the need_wakeup flag) that solved the problem I had in a
better way. It would still be beneficial to support busy-poll for AF_XDP,
but I am not working on it or planning to do it.

/Magnus

> >
> > /Magnus
> >
> > > > > > > > > Just to make sure: those 650k packets are arriving on the same RX queue (even though this NIC has multiple RX queues, I want to test the maximum bandwidth of a single RX queue).
> > > > > > > > >
> > > > > > > > > I didn't observe as dramatic a change as I had hoped for. Are there some other ways to reduce interrupt load (the user-space application and ksoftirqd are already running on different CPUs)?
> > > > > > > >
> > > > > > > > The need_wakeup flag has a big impact when you run the softirq and the
> > > > > > > > application thread on the same core. When using two cores for this, it
> > > > > > > > has less of an impact.
> > > > > > > >
> > > > > > > > /Magnus
> > > > > > > >
> > > > > > > > > NIC: Mellanox Technologies MT27800
> > > > > > > > >
> > > > > > > > > Best regards
> > > > > > > > >
> > > > > > > > > Max
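[Aside: to illustrate the need_wakeup flag mentioned above, here is a minimal sketch of a receive loop that only issues a syscall when the kernel asks for one, loosely modeled on the kernel's samples/bpf/xdpsock_user.c. The names rx_loop, umem_fq and handle_packet are placeholders, not code from this thread.]

    #include <poll.h>
    #include <bpf/xsk.h>

    /* Sketch: drain the RX ring and only wake the kernel when it requests it. */
    static void rx_loop(struct xsk_socket *xsk, struct xsk_ring_cons *rx,
                        struct xsk_ring_prod *umem_fq)
    {
        struct pollfd fds = { .fd = xsk_socket__fd(xsk), .events = POLLIN };

        for (;;) {
            __u32 idx_rx = 0;
            size_t rcvd = xsk_ring_cons__peek(rx, 64, &idx_rx);

            if (!rcvd) {
                /* With XDP_USE_NEED_WAKEUP set, a syscall is only needed
                 * when the kernel has flagged the fill ring; otherwise we
                 * simply wait for packets instead of spinning. */
                if (xsk_ring_prod__needs_wakeup(umem_fq))
                    poll(&fds, 1, 1000);
                continue;
            }

            for (size_t i = 0; i < rcvd; i++) {
                const struct xdp_desc *desc = xsk_ring_cons__rx_desc(rx, idx_rx + i);
                (void)desc;        /* handle_packet(desc->addr, desc->len); */
            }
            xsk_ring_cons__release(rx, rcvd);
            /* Refilling the fill ring is omitted here for brevity. */
        }
    }

[As discussed earlier in the thread, only the first socket sharing a umem actually gets to set the bind flags, so whether this helps depends on that first socket having XDP_USE_NEED_WAKEUP set.]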