Hi Marek,

I'll answer some of your questions below and leave the rest for others.

On Fri, Aug 23, 2019 at 3:38 AM Marek Závodský
<marek.zavodsky@xxxxxxxxxxxxx> wrote:
>
> Hi Jasper,
>
> Thanks for your reply.
> I apologize, I'm new to kernel dev, so I may be missing some background.
>
> Let's bring some more light into this. We are using kernel 5.0.0 and
> samples/bpf/xdpsock as an example.

Do you want to consider using the AF_XDP API from libbpf? The
samples/bpf/xdpsock_user.c in 5.0.0 still does not use libbpf:
https://elixir.bootlin.com/linux/v5.0/source/samples/bpf/xdpsock_user.c

The kernel 5.1 xdpsock uses libbpf:
https://elixir.bootlin.com/linux/v5.1/source/samples/bpf/xdpsock_user.c

> I checked master, and the example has evolved (e.g. by adding cleanup
> mechanisms), but in terms of what I need from it, it looks the same
> (and even more complicated, because now the XDP attach to the interface
> is interleaved with the XSK allocation).
>
> I built the latest kernel, but it refused to boot, so I haven't had a
> chance yet to try the latest.

There have been some fixes recently; I would suggest using the latest one.

> I took the _user part and split it into two:
>
> "loader" - Executed once to set up the environment and once to clean
> up; loads _kern.o, attaches it to the interface and pins the maps under
> /sys/fs/bpf.
>
> and
>
> "worker" - Executed as many times as required. Every instance loads the
> maps from /sys/fs/bpf, creates one AF_XDP sock, updates the xsks record
> and starts listening to / processing packets from AF_XDP (in the test
> scenario we are using l2fwd because of the write-back). I had to add
> the missing cleanups there (close(fd), munmap()). This should be VPP in
> the final solution.
>
> So far so good.
>
> I'm unable to start more than one worker due to the previously
> mentioned error. The first instance works properly, every other one
> fails on bind (lineno may not match due to local changes):
>
> xdpsock_user.c:xsk_configure:595: Assertion failed: bind(sfd, (struct
> sockaddr *)&sxdp, sizeof(sxdp)) == 0: errno: 16/"Device or resource busy"

I don't think you can have multiple threads binding one XSK, see
xsk_bind() in the kernel source. For AF_XDP in OVS, we create multiple
XSKs with non-shared umem, and each has its own thread.

> I modified it to allocate multiple sockets within one process, and I
> was successful with shared umem:
>
>     num_socks = 0;
>     xsks[num_socks++] = xsk_configure(NULL);
>     for (; num_socks < opt_alloc; num_socks++)
>         xsks[num_socks] = xsk_configure(xsks[0]->umem);
>
> but got the same behavior (first ok, second failed on bind) when I
> tried non-shared:
>
>     num_socks = 0;
>     for (; num_socks < opt_alloc; num_socks++)
>         xsks[num_socks] = xsk_configure(NULL);

I never tried shared umem; I would suggest starting with the non-shared
case.

Regards,
William
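For what it's worth, below is a rough, untested sketch of the non-shared
case using the libbpf xsk API (tools/lib/bpf/xsk.h, kernel >= 5.1). The
struct, NUM_FRAMES and open_xsk() are placeholders of mine, not code from
the sample; the point is that every socket registers its own umem and
binds its own (ifname, queue_id) pair. As far as I can tell from
xsk_bind(), binding a second non-shared socket to a queue that already
has one is exactly what returns EBUSY.

/* Rough sketch only: one XSK per queue, each with its own umem, using
 * the libbpf xsk API. Error handling trimmed for brevity. */
#include <stdlib.h>
#include <unistd.h>
#include <linux/types.h>
#include <bpf/xsk.h>

#define NUM_FRAMES 4096                 /* placeholder sizing */
#define FRAME_SIZE XSK_UMEM__DEFAULT_FRAME_SIZE

struct xsk_info {
	struct xsk_umem *umem;
	struct xsk_socket *xsk;
	struct xsk_ring_prod fq, tx;
	struct xsk_ring_cons cq, rx;
	void *buf;
};

/* Create one socket with its own (non-shared) umem, bound to one queue. */
static int open_xsk(struct xsk_info *info, const char *ifname, __u32 queue_id)
{
	struct xsk_socket_config cfg = {
		.rx_size = XSK_RING_CONS__DEFAULT_NUM_DESCS,
		.tx_size = XSK_RING_PROD__DEFAULT_NUM_DESCS,
		.libbpf_flags = 0,
		.xdp_flags = 0,         /* or XDP_FLAGS_DRV_MODE */
		.bind_flags = 0,        /* or XDP_COPY / XDP_ZEROCOPY */
	};
	int err;

	if (posix_memalign(&info->buf, getpagesize(), NUM_FRAMES * FRAME_SIZE))
		return -1;

	/* Non-shared: every socket registers its own umem area. */
	err = xsk_umem__create(&info->umem, info->buf, NUM_FRAMES * FRAME_SIZE,
			       &info->fq, &info->cq, NULL);
	if (err)
		return err;

	/* Each socket must bind a different (ifname, queue_id); a second
	 * non-shared socket on the same queue fails with EBUSY. */
	return xsk_socket__create(&info->xsk, ifname, queue_id, info->umem,
				  &info->rx, &info->tx, &cfg);
}

IIRC there is also an XSK_LIBBPF_FLAGS__INHIBIT_PROG_LOAD flag for
libbpf_flags if you want to keep loading your own _kern.o and pinned maps
instead of libbpf's built-in program; the worker can then put
xsk_socket__fd() into the pinned xskmap itself with bpf_map_update_elem().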
> And the TX processing... as a workaround we moved the VLAN pop/push to
> the "worker" and XDP only does xsk-map redirects based on vlan-id, but
> it violates the design. Is there any estimate of when we could expect
> something on the XDP TX front? I can try BPF TC TX in the meantime.
>
> I guess changing opt_ifindex to xsk->fd in
> bpf_set_link_xdp_fd(opt_ifindex, prog_fd, opt_xdp_flags);
> won't help :)
>
> One side question. I noticed that bpf_trace_printk creates sparse
> entries in /sys/kernel/debug/tracing/trace. When I run a sample of 100
> packets I may get anywhere from 0 to many entries there. It's a bit
> annoying to run a "load test" just to verify I hit the correct code
> path. Is it doing sampling? Can I tweak it somehow?
>
> Thanks,
> marek
>
> ________________________________
> From: Jesper Dangaard Brouer <brouer@xxxxxxxxxx>
> Sent: Friday, August 23, 2019 10:22:24 AM
> To: brouer@xxxxxxxxxx; Július Milan; Marek Závodský
> Cc: xdp-newbies@xxxxxxxxxxxxxxx; Karlsson, Magnus; Björn Töpel; Eelco
> Chaudron; Thomas F Herbert; William Tu
> Subject: AF_XDP integration with FDio VPP? (Was: Questions about XDP)
>
> Bringing these questions to the xdp-newbies list, where they belong.
> Answers inlined below.
>
> On Tue, 20 Aug 2019 21:17:57 +0200 Július Milan <Julius.Milan@xxxxxxxxxxxxx>
>
> > I am writing an AF_XDP driver for FDio VPP. I have 2 questions.
>
> That sounds excellent. I was hoping someone would do this for FDio VPP.
> Do notice that DPDK now also has AF_XDP support. IMHO it makes a lot of
> sense to implement AF_XDP for FDio and avoid the DPDK dependency.
> (AFAIK FDio already has other back-ends than DPDK.)
>
> > 1 - I created a simple driver according to the sample in the kernel.
> > I load my XDP program and pin the maps.
> >
> > Then in the user application I create a socket, mmap the memory and
> > push it to the xskmap in the program. All fine so far.
> >
> > Then I start another instance of the user application and do the
> > same: create a socket, mmap the memory and try to push it somewhere
> > else into the map. But I get errno 16 "Device or resource busy" when
> > trying to bind.
> >
> > I guess the memory can't be mmaped 2 times, but should be shared, is
> > that correct?
>
> I'm cc'ing the AF_XDP experts, as I'm not sure myself. I mostly deal
> with the in-kernel XDP path. (AF_XDP is essentially kernel bypass :-O)
>
> > If so, I am wondering how to solve this nicely.
> >
> > Can I store the value of the first socket (that created the mmaped
> > memory) in some special map in my XDP program, to avoid complicated
> > inter-process communication?
> >
> > And what happens if this first socket is closed while other sockets
> > are still alive (using its shared mmaped memory)?
> >
> > What would you recommend? Maybe you have some sample.
>
> We just added a sample (by Eelco, Cc'ed) to XDP-tutorial:
> https://github.com/xdp-project/xdp-tutorial/tree/master/advanced03-AF_XDP
>
> At least read the README.org file... to get over the common gotchas.
>
> AFAIK the sample doesn't cover your use-case. I guess we/someone should
> extend the sample to illustrate how multiple interfaces can share the
> same UMEM.
>
> The official documentation is:
> https://www.kernel.org/doc/html/latest/networking/af_xdp.html
>
> > Can I also do atomic operations? (I want it just for such rare cases
> > as the initialization of the next socket, to check whether there
> > already is one that mmaped the memory.)
> >
> > 2 - We also want to do some decap/encap at the XDP layer, before
> > redirecting the packet to the socket.
>
> Decap at the XDP layer is an excellent use-case, one that demonstrates
> cooperation between XDP and the AF_XDP kernel-bypass facility.
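For the archives, a rough, untested sketch of that decap-before-redirect
idea, along the lines Marek described (VLAN pop in XDP, then an xskmap
redirect keyed by VLAN id). The map name, sizes, section name and include
layout are my own placeholders (following samples/bpf / xdp-tutorial),
not taken from his _kern.o; and with flags=0, a VLAN id with no socket
registered ends up dropped rather than passed to the stack:

/* Illustrative sketch: pop the outer 802.1Q tag and redirect to an
 * AF_XDP socket picked by VLAN id. */
#include <linux/bpf.h>
#include <linux/if_ether.h>
#include "bpf_helpers.h"
#include "bpf_endian.h"

/* Not in UAPI headers, so define it here (as xdp-tutorial does). */
struct vlan_hdr {
	__be16 h_vlan_TCI;
	__be16 h_vlan_encapsulated_proto;
};

struct bpf_map_def SEC("maps") xsks_map = {
	.type = BPF_MAP_TYPE_XSKMAP,
	.key_size = sizeof(int),
	.value_size = sizeof(int),
	.max_entries = 4096,            /* one slot per possible VLAN id */
};

SEC("xdp_sock")
int xdp_vlan_pop_redirect(struct xdp_md *ctx)
{
	void *data_end = (void *)(long)ctx->data_end;
	void *data = (void *)(long)ctx->data;
	struct ethhdr *eth = data;
	struct ethhdr eth_copy;
	struct vlan_hdr *vlh;
	int vid;

	if (data + sizeof(*eth) + sizeof(*vlh) > data_end)
		return XDP_PASS;
	if (eth->h_proto != bpf_htons(ETH_P_8021Q))
		return XDP_PASS;

	vlh = data + sizeof(*eth);
	vid = bpf_ntohs(vlh->h_vlan_TCI) & 0x0fff;

	/* Save the Ethernet header, fix up the proto, then chop 4 bytes
	 * off the front and write the saved header back. */
	__builtin_memcpy(&eth_copy, eth, sizeof(eth_copy));
	eth_copy.h_proto = vlh->h_vlan_encapsulated_proto;

	if (bpf_xdp_adjust_head(ctx, (int)sizeof(*vlh)))
		return XDP_DROP;

	data_end = (void *)(long)ctx->data_end;
	data = (void *)(long)ctx->data;
	if (data + sizeof(eth_copy) > data_end)
		return XDP_DROP;
	__builtin_memcpy(data, &eth_copy, sizeof(eth_copy));

	/* Deliver to the AF_XDP socket registered under this VLAN id. */
	return bpf_redirect_map(&xsks_map, vid, 0);
}

char _license[] SEC("license") = "GPL";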
> > On the RX path it is easy: I do what I want and redirect it to the
> > socket. But can I achieve the same also on TX?
>
> (Yes, the RX case is easy.)
>
> We don't have an XDP TX hook yet... but so many people have requested
> this that we should add it.
>
> > Can I catch the packet on TX in XDP and do something with it
> > (encapsulate it) before sending it out?
>
> Usually we recommend people use the TC egress BPF hook to do the encap
> on TX. For the AF_XDP use-case, the TC hook isn't there... so that is
> not an option. Again an argument for an XDP-TX hook. You could, of
> course, add the encap header in your AF_XDP userspace program, but I do
> understand that it would make architectural sense for in-kernel XDP to
> act as a decap/encap layer.
>
> > If so, what about performance?
>
> The AF_XDP RX-side is really, really fast, even in copy-mode.
>
> The AF_XDP TX-side in copy-mode is rather slow, as it allocates SKBs,
> etc. We could optimize this further, but we have not. When enabling
> AF_XDP zero-copy mode, the TX-side is also super fast.
>
> Another hint for the AF_XDP TX-side: remember to "produce" several
> packets before doing the sendmsg system call, thus effectively doing
> bulking on the TX-ring.
>
> > By the way, great job with XDP ;)
>
> Thanks!
>
> --
> Best regards,
> Jesper Dangaard Brouer
> MSc.CS, Principal Kernel Engineer at Red Hat
> LinkedIn: http://www.linkedin.com/in/brouer
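To make that last TX hint concrete, here is a minimal, untested sketch
using the libbpf xsk API; tx_burst(), the addr/len arrays and the omitted
completion handling are my own simplifications. The idea is to fill a
batch of TX descriptors, submit them once, and then issue a single
sendto() kick for the whole batch:

/* Rough sketch: queue a batch of descriptors on the TX ring, then kick
 * the kernel once, instead of one syscall per packet. Assumes the
 * frames at addr[] already contain the packet data. */
#include <sys/socket.h>
#include <linux/types.h>
#include <bpf/xsk.h>

static void tx_burst(struct xsk_ring_prod *tx, int xsk_fd,
		     const __u64 *addr, const __u32 *len, unsigned int nb)
{
	__u32 idx, i;

	/* Reserve nb slots on the TX producer ring in one go. */
	if (xsk_ring_prod__reserve(tx, nb, &idx) != nb)
		return;                 /* ring full; caller should retry */

	for (i = 0; i < nb; i++) {
		struct xdp_desc *desc = xsk_ring_prod__tx_desc(tx, idx + i);

		desc->addr = addr[i];   /* umem offset of the frame */
		desc->len = len[i];
	}
	xsk_ring_prod__submit(tx, nb);

	/* One kick for the whole batch (copy-mode needs this syscall). */
	sendto(xsk_fd, NULL, 0, MSG_DONTWAIT, NULL, 0);

	/* Completions should then be reaped from the completion ring with
	 * xsk_ring_cons__peek() / xsk_ring_cons__release(), omitted here. */
}

This amortizes the syscall overhead over the whole batch instead of
paying it per packet.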