Hi Ethy, > 8) Any packet ARRIVING thru middle's WAN (eth1) interface has its VLAN > header removed with XDP loaded into kernel - observed with tcpdump as bellow Have you disabled vlan rx offloading on your nic, when trying XDP? When vlan rx offloading is enabled, nic cuts off vlan header from packets and sends them separately as metadata. When a packet is processed by the kernel, this metadata is saved into `struct sk_buff->vlan_tci`, xdp hooks happen before that and don't work with skbs. But for the time being there is no uniform interface to address such metadata from XDP, and all XDP can see is a packet with vlan headers stripped. I believe you need to switch off vlan rx offload (`ethtool -K ${ifname} rxvlan off `) to get proper behaviour. Best regards, Ivan Koveshnikov On Mon, 26 Jul 2021 at 06:17, Ethy H. Brito <ethy.brito@xxxxxxxxxxxx> wrote: > > > Hi everyone. > > (Long and very verbose email follows. Sorry about that - be patient) > > I can't make xdp-cpumap-tc work if vlan is used at WAN interface. > > If the packet gets redirected , that is, if it hits > > "return bpf_redirect_map(&cpu_map, cpu_dest, 0);" > > in xdp_iphash_to_cpu_kern (function parse_ipv4) the packet never arrives > at client. It gets dropped somewhere. > > The test setup comprises three boxes: > 1) a client - vanila ubuntu 20 > 2) a middle router box in-between (1) and (3) (that runs XDP and tc_classify) > 3) a server - vanila ubuntu 20 > > They are almost completely isolated from production environment. > > A vlan (nic-br) is set between (2) and (3). > > xdp-cpumap-tc was git download with > > git clone --recurse-submodules https://github.com/xdp-project/xdp-cpumap-tc > > and compiled yesterday. No errors. > > The VLANs are created like that: > at middle box: > #ip link add link eth1 name nic-br type vlan id 1003 > at server box > #ip link add link eth0 name nic-br type vlan id 1003 > The routes are > at middle box: > # ip r sh > 10.16.239.0/24 dev eth0 scope link > 187.17.36.69 dev eth0 scope link > 192.168.1.0/24 dev nic-br proto kernel scope link src 192.168.1.1 > > at server box: > # ip r s > default via 192.168.1.1 dev nic-br > 10.16.239.0/24 via 192.168.1.1 dev nic-br > 192.168.1.0/24 dev nic-br proto kernel scope link src 192.168.1.2 > > Client box has two IP addresses configured at its only interface (eth0) > inet 10.16.239.213/32 scope global client > inet 187.17.36.69/32 scope global client > > Both IPs "pings" server ip address 192.168.1.2 thru "middle" when XDP is *OFF*. > > # ping -I 187.17.36.69 192.168.1.2 > PING 192.168.1.2 (192.168.1.2) from 187.17.36.69 : 56(84) bytes of data. > 64 bytes from 192.168.1.2: icmp_seq=1 ttl=63 time=0.391 ms > 64 bytes from 192.168.1.2: icmp_seq=2 ttl=63 time=0.360 ms > > # ping -I 10.16.239.213 192.168.1.2 > PING 192.168.1.2 (192.168.1.2) from 10.16.239.213 : 56(84) bytes of data. > 64 bytes from 192.168.1.2: icmp_seq=1 ttl=63 time=0.410 ms > 64 bytes from 192.168.1.2: icmp_seq=2 ttl=63 time=0.389 ms > > But when the script bellow is executed at "middle", It is not possible > to ping the server from IP 187.17.36.69 anymore. > 10.16.239.213 works Ok. > > As you will see bellow I only "mapped" the 187... address. > > Since I have no clue where to investigate, I watched all I could think of. > These are my observations up to now: > > 1) if the packet is redirected to some CPU, it disappear inside > the kernel never hitting the client. > > 2) if I unconditionally returns XDP_PASS at the end of parse_ipv4 > (xdp_iphash_to_cpu_kern) both pings work. > > 3) If I comment out the last line of the script bellow (the mapping line), > then flush XDP and TC, and run the script again, both pings work. > (since I do not have a map hit and the cpu redirect never occurs) > > 4) If I kill the vlans and route the packets thru the "naked" eth's > both pings work. (no need to reload XDP or tc_classify - it just works right away) > > 5) I put some bpf_debug messages at the VLAN detection code, at both > xdp_iphash and tc_classify, and they are both never hit. > > 6) locally, at middle box, I can always ping 187.17.36.69 and 10.16.239.213 > (even with XDP *ON*) > > 7) If I execute: > /usr/local/bin/xdp_iphash_to_cpu_cmdline --add --ip '255.255.255.255' --classid '1:fffd' --cpu 0 > pings from 10.16.239.213 stop immediately (since now these IP packets gets redirected to a CPU). > Deleting the IP entry from map, restores the ping immediately. > > 8) Any packet ARRIVING thru middle's WAN (eth1) interface has its VLAN > header removed with XDP loaded into kernel - observed with tcpdump as bellow: > > Dump with XDP *OFF* (VLAN header OK - packet make thru client) > # stdbuf -o 0 -e 0 tcpdump -nei eth1 icmp > tcpdump: verbose output suppressed, use -v or -vv for full protocol decode > listening on eth1, link-type EN10MB (Ethernet), capture size 262144 bytes > > 19:32:45.676423 00:1b:21:8d:72:99 > 00:16:3e:f4:e9:88, ethertype *802.1Q (0x8100)*, > length 102: vlan 1003, p 0, ethertype IPv4, 187.17.36.69 > 192.168.1.2: ICMP echo request, id 1163, seq 137, length 64 > > 19:32:45.676563 00:16:3e:f4:e9:88 > 00:1b:21:8d:72:99, ethertype *802.1Q (0x8100)*, > length 102: vlan 1003, p 0, ethertype IPv4, 192.168.1.2 > 187.17.36.69: ICMP echo reply, id 1163, seq 137, length 64 > > > Dump with XDP *ON* (NO VLAN header - no packet get out thru middle's LAN (eth0)) > # stdbuf -o 0 -e 0 tcpdump -nei eth1 icmp > tcpdump: verbose output suppressed, use -v or -vv for full protocol decode > listening on eth1, link-type EN10MB (Ethernet), capture size 262144 bytes > > 19:30:43.852543 00:1b:21:8d:72:99 > 00:16:3e:f4:e9:88, ethertype *802.1Q (0x8100)*, > length 102: vlan 1003, p 0, ethertype IPv4, 187.17.36.69 > 192.168.1.2: ICMP echo request, id 1163, seq 18, length 64 > > 19:30:43.852695 00:16:3e:f4:e9:88 > 00:1b:21:8d:72:99, ethertype *IPv4 (0x0800)*, > length 98: 192.168.1.2 > 187.17.36.69: ICMP echo reply, id 1163, seq 18, length 64 > > This is bpftool output: > # bpftool net > xdp: > eth0(3) driver id 796 > eth1(4) driver id 801 > > tc: > eth0(3) clsact/egress tc_classify_kern.o:[tc_classify] id 797 > eth1(4) clsact/egress tc_classify_kern.o:[tc_classify] id 802 > > flow_dissector: > > > This is the script I use to start xdp-cpumap. It is a fragment form a much > larger script that runs at my production box, stripped to a bare minimum where > the problem still happens. > > ----------------------------------- 8< ------------------------------------------- > #!/bin/bash > > # Flushes all XDP maps > ################################################### > /usr/local/bin/xdp_iphash_to_cpu_cmdline --clear &>/dev/null > > /sbin/ip link set eth0 up > # Turn off eth0's XPS > for xps_cpus in $(ls /sys/class/net/eth0/queues/tx-*/xps_cpus | sort --field-separator='-' -k2n); do > echo 0 > $xps_cpus > done > > # remove any existing qdiscs > /sbin/tc qdisc del dev eth0 root 2> /dev/null > /sbin/tc qdisc del dev eth0 ingress 2> /dev/null > > # Multiqueue root discipline, handle 7fff: handle > /sbin/tc qdisc replace dev eth0 root handle 7FFF: mq > > #Cria as disciplinas de fila (filhas de cada qdisc MQ acima) para atrelar (mais tarde) a cada CPU. > i=0 > for dir in /sys/class/net/eth0/queues/tx-* ; do > x=$((i++)) > # Qdisc HTB $i: under parent 7FFF:$i > i_str=$(printf '%x' $i) > > # "root" class > /sbin/tc qdisc add dev eth0 parent 7FFF:$i_str handle $i_str: hfsc default fffd > > # inner classes > /sbin/tc class add dev eth0 parent $i_str: classid $i_str:1 hfsc sc m2 7gibit ul rate 7gibit > # - set default class rate > /sbin/tc class add dev eth0 parent $i_str:1 classid $i_str:fffd hfsc sc m2 7gibit ul rate 7gibit > # Change the qdisc on default class > /sbin/tc qdisc add dev eth0 parent $i_str:fffd fq_codel > done > > # Load XDP module > /usr/local/bin/xdp_iphash_to_cpu --dev eth0 --all-cpus --wan --quiet &>/dev/null > # Put all CPUs to dance. CPU X will process queue X+1 that holds the X+1 class > /usr/local/bin/tc_classify --dev-egress eth0 --base-setup --quiet &>/dev/null > > > /sbin/ip link set eth1 up > # Turn off eth1's XPS > for xps_cpus in $(ls /sys/class/net/eth1/queues/tx-*/xps_cpus | sort --field-separator='-' -k2n); do > echo 0 > $xps_cpus > done > > # remove any existing qdiscs > /sbin/tc qdisc del dev eth1 root 2> /dev/null > /sbin/tc qdisc del dev eth1 ingress 2> /dev/null > > # Multiqueue root discipline, handle 7fff: handle > /sbin/tc qdisc replace dev eth1 root handle 7FFF: mq > > #Cria as disciplinas de fila (filhas de cada qdisc MQ acima) para atrelar (mais tarde) a cada CPU. > i=0 > for dir in /sys/class/net/eth1/queues/tx-* ; do > x=$((i++)) > # Qdisc HTB $i: under parent 7FFF:$i > i_str=$(printf '%x' $i) > > # "root" class > /sbin/tc qdisc add dev eth1 parent 7FFF:$i_str handle $i_str: hfsc default fffd > > # inner classes > /sbin/tc class add dev eth1 parent $i_str: classid $i_str:1 hfsc sc m2 7gibit ul rate 7gibit > # - set default class rate > /sbin/tc class add dev eth1 parent $i_str:1 classid $i_str:fffd hfsc sc m2 7gibit ul rate 7gibit > # Change the qdisc on default class > /sbin/tc qdisc add dev eth1 parent $i_str:fffd fq_codel > done > > # Load XDP module > /usr/local/bin/xdp_iphash_to_cpu --dev eth1 --all-cpus --wan --quiet &>/dev/null > # Put all CPUs to dance. CPU X will process queue X+1 that holds the X+1 class > /usr/local/bin/tc_classify --dev-egress eth1 --base-setup --quiet &>/dev/null > > #/usr/local/bin/xdp_iphash_to_cpu_cmdline --add --ip '255.255.255.255' --classid '1:fffd' --cpu 0 >&/dev/null > #/usr/local/bin/xdp_iphash_to_cpu_cmdline --add --ip '255.255.255.254' --classid '1:fffe' --cpu 0 >&/dev/null > #/usr/local/bin/xdp_iphash_to_cpu_cmdline --add --ip '255.255.255.253' --classid '1:fffc' --cpu 0 >&/dev/null > > # Put the client's packets in its shaper > #from client: classid=4:105;VEL=52428800; > /sbin/tc class add dev eth1 parent 4:1 classid 4:105 hfsc sc m2 50mibit ul rate 50mibit > /sbin/tc qdisc add dev eth1 parent 4:105 fq_codel > > #to client: classid=4:105;VEL=52428800; > /sbin/tc class add dev eth0 parent 4:1 classid 4:105 hfsc sc m2 50mibit ul rate 50mibit > /sbin/tc qdisc add dev eth0 parent 4:105 fq_codel > > /usr/local/bin/xdp_iphash_to_cpu_cmdline --add --ip '187.17.36.69' --classid '4:105' --cpu 3 >&/dev/null > > exit 0 > > > ----------------------------------- 8< ------------------------------------------- > > Some bad interaction is happening when I use XDP and VLANs together. > > Can you guys help me with this?? > > Regards > > Ethy