Hi Florian,

Thank you for your detailed reply. Responses below:

On Wed, Oct 25, 2023 at 02:32:32PM +0200, Florian Westphal wrote:
> Duncan Roe <duncan_roe@xxxxxxxxxxxxxxx> wrote:
> > My libnetfilter_queue application is unable to mangle UDP6 messages that have
> > been fragmented. The kernel only delivers the first fragment of such a message
> > to the application.
> >
> > Is this a permanent restriction or a bug?
>
> There is not enough information here to answer this question,
> see below.
>
> > messages. "Something else" in the kernel re-combines UDP4 fragments before they
> > are queued to my application, so they mangle OK.
>
> I'm not sure what you mean or what you expect to happen.

I expected the netfilter program to see the full UDP datagram as sent. With
UDP/IPv4 it does see the full datagram, but not with UDP/IPv6.
>
> > In summary:
> > - GSO re-combines TCP fragments before tcpdump can see them.
>
> Do you mean "segments"? Its the other way around, with GSO/TSO, stack
> builds large superpackes, one tcp header with lots of data.

Sorry for the confusion here. I only meant to say that there is no problem with
TCP. IOW kernel delivers to filter program exactly what was in the buffer when
the remote application did a write(2) (for buffer size up to just under 64KB).
I don't know what GSO is, only that it's strongly recommended to use it.
>
> Such superpackets are split at the last possible moment;
> ideally by NIC/hardware.
>
> > - Some other kernel code re-combines UDP4 fragments before netfilter queues
> >   them
> > - Some other different kernel code re-combines UDP6 fragments for the user
> >   application but after netfilter queues them
> > - It's been this way for a number of years
>
> GSO is just the software fallback of TSO, i.e. local stack passes
> large skb down to the driver which will do pseudo segmentation,
> this needs hardware that can handle scatterlist, which is true for
> almost all nics.
>
> There is some segmentation support for UDP to handle encapsulation
> (tunneling) use cases, where stack can pass large skb and then can
> have hardware or software fallback do the segmentation for us, i.e.
> split according to inner protocol and add the outer udp encapsulation
> to all packets.
>
> > ================ Testing with GSO
> >
> > nfq6 cmd: nfq6 -t6 -t7 -t8 -t17 -t18 24
> > tcpdump cmd: tcpdump -i eth1 'ether host 18:60:24:bb:02:d6 && (tcp || udp) &&
> > ! port x11'
> >
> > > netcat cmds: nc -6 -q0 -u fe80::1ac0:4dff:fe04:75ba%eth0 1042 <zxc2k : nc -6 -k -l -n -p 1042 -q0 -u -v
> > > nfq6 output # tcpdump o/p (early fields omitted)
> > > packet received (id=169 hw=0x86dd hook=1, payload len 1496) # frag (0|1448) 33020 > 1042: UDP, length 2048
> > > Packet too short to get UDP payload #
> > > # frag (1448|608)
>
> You are sending a large udp packet via ipv6, it doesn't fit the device mtu,
> fragmentation is needed. This has nothing to do with GSO.

OK I was under the mis-apprehension GSO was a level 3 thing (working at IP
level).
>
> > > packet received (id=176 hw=0x0800 hook=1, payload len 60) # > Flags [S], seq 821055799, win 64240, options [mss 1460,sackOK,TS val 3739788506 ecr 0,nop,wscale 7], length 0
> > > packet received (id=177 hw=0x0800 hook=3, payload len 60, checksum not ready) # < Flags [S.], seq 1085807033, ack 821055800, win 65160, options [mss 1460,sackOK,TS val 4164299250 ecr 3739788506,nop,wscale 7], length 0
> > > packet received (id=178 hw=0x0800 hook=1, payload len 52) # > Flags [.], ack 1, win 502, options [nop,nop,TS val 3739788506 ecr 4164299250], length 0
> > > GSO packet received (id=179 hw=0x0800 hook=1, payload len 2100, checksum not ready) # > Flags [P.], seq 1:2049, ack 1, win 502, options [nop,nop,TS val 3739788506 ecr 4164299250], length 2048
>
> Stack built a larger packet, device or software fallback will segment
> them as needed.
>
> > ================ Testing without GSO (needs v2 nfq6)
> >
> > nfq6 cmd: nfq6 -t6 -t7 -t8 -t17 -t18 -t20 24
> > tcpdump cmd: (as above)
> >
> > > netcat cmds: nc -6 -q0 -u fe80::1ac0:4dff:fe04:75ba%eth0 1042 <zxc2k : nc -6 -k -l -n -p 1042 -q0 -u -v
> > > nfq6 output # tcpdump o/p (early fields and source port omitted)
> > > packet received (id=1 hw=0x86dd hook=1, payload len 1496) # frag (0|1448) > 1042: UDP, length 2048
> > > Packet too short to get UDP payload #
> > > # frag (1448|608)
> > > -----------------------------------------------------------------------------
> > > netcat cmds: nc -4 -q0 -u dimstar 1042 <zxc2k : nc -4 -k -l -n -p 1042 -q0 -u -v
> > > nfq6 output # tcpdump o/p (early fields omitted)
> > > # UDP, length 2048
> > > packet received (id=3 hw=0x0800 hook=1, payload len 2076) # udp
> > > -----------------------------------------------------------------------------
>
> It would help if you could explain what is wrong here.

This example shows a UDP4 2KB datagram being successfully mangled and a UDP6 2KB
datagram failing to be mangled.
>
> You also removed tcpdump info, I suspect it was "flags [+]"
> with two fragments for udp:ipv4 too?

There are 2 fragments for both IPv4 and IPv6. tcpdump does not report any flags:

> 08:16:09.713395 IP6 fe80::1a60:24ff:febb:2d6 > fe80::1ac0:4dff:fe04:75ba: frag (0|1448) 47843 > 1042: UDP, length 2048
> 08:16:09.713395 IP6 fe80::1a60:24ff:febb:2d6 > fe80::1ac0:4dff:fe04:75ba: frag (1448|608)
> 08:17:22.924883 IP smallstar.local.net.55288 > dimstar.local.net.1042: UDP, length 2048
> 08:17:22.924883 IP smallstar.local.net > dimstar.local.net: udp

> Frag handling depends on a lot of factors, such as ip defrag being
> enabled or not, where queueing happens (hook and prio), if userspace
> does mtu probing (like 'ping6 -M do') or not.
>
> And the NIC driver too.
>
> For incoming data it also depends on sysctl settings and if
> GRO/LRO is enabled.
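
As a sanity check on the sizes in the nfq6 and tcpdump output quoted above,
here is a minimal sketch of the IPv6 fragment arithmetic. It is not part of
nfq6: the function name is made up for illustration, and it assumes a
1500-byte Ethernet MTU, no other extension headers, and a sender that fills
each fragment as far as the MTU allows.

```python
IPV6_HDR = 40  # fixed IPv6 header, bytes
FRAG_HDR = 8   # IPv6 Fragment extension header, bytes
UDP_HDR = 8    # UDP header, bytes


def ipv6_udp_fragments(udp_payload_len, mtu=1500):
    """Return (offset, size) pairs as tcpdump prints them: frag (offset|size).

    Hypothetical helper; assumes the only extension header is the
    Fragment header and that fragments are filled up to the MTU.
    """
    total = UDP_HDR + udp_payload_len  # fragmentable part of the datagram
    if IPV6_HDR + total <= mtu:
        return [(0, total)]  # fits in one packet, no fragmentation
    # Every fragment except the last must carry a multiple of 8 bytes.
    per_frag = (mtu - IPV6_HDR - FRAG_HDR) // 8 * 8
    frags = []
    offset = 0
    while offset < total:
        size = min(per_frag, total - offset)
        frags.append((offset, size))
        offset += size
    return frags


frags = ipv6_udp_fragments(2048)
print(frags)                              # [(0, 1448), (1448, 608)]
print(IPV6_HDR + FRAG_HDR + frags[0][1])  # 1496
```

Under those assumptions the 2048-byte write gives frag (0|1448) and
frag (1448|608), and the first fragment is 40 + 8 + 1448 = 1496 bytes, the
payload length nfq6 reports for UDP6. For UDP4 the reassembled datagram is
20 + 8 + 2048 = 2076 bytes, which matches the payload length nfq6 reports there.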
>
> > > packet received (id=49 hw=0x86dd hook=1, payload len 72) # > Flags [.], ack 1, win 507, options [nop,nop,TS val 925571377 ecr 2923572945], length 0
> > > packet received (id=50 hw=0x86dd hook=1, payload len 1500) # > Flags [.], seq 1:1429, ack 1, win 507, options [nop,nop,TS val 925571377 ecr 2923572945], length 1428
> > > packet received (id=51 hw=0x86dd hook=3, payload len 72) # < Flags [.], ack 1429, win 501, options [nop,nop,TS val 2923572945 ecr 925571377], length 0
> > > packet received (id=52 hw=0x86dd hook=1, payload len 692) # > Flags [P.], seq 1429:2049, ack 1, win 507, options [nop,nop,TS val 925571377 ecr 2923572945], length 620
>
> Kernel does software segmentation here, this is slow.

Sure, that was just a test.

---

Florian, please say if you would like more explanation.

Thank you again for looking at this.

Cheers ... Duncan.