Re: Netfilter queue is unable to mangle fragmented UDP6: bug?


Hi Florian,

Thank you for your detailed reply. Responses below:

On Wed, Oct 25, 2023 at 02:32:32PM +0200, Florian Westphal wrote:
> Duncan Roe <duncan_roe@xxxxxxxxxxxxxxx> wrote:
> > My libnetfilter_queue application is unable to mangle UDP6 messages that have
> > been fragmented. The kernel only delivers the first fragment of such a message
> > to the application.
> >
> > Is this a permanent restriction or a bug?
>
> There is not enough information here to answer this question,
> see below.
>
> > messages. "Something else" in the kernel re-combines UDP4 fragments before they
> > are queued to my application, so they mangle OK.
>
> I'm not sure what you mean or what you expect to happen.

I expected the netfilter program to see the full UDP datagram as sent. With
UDP/IPv4 it does see the full datagram, but not with UDP/IPv6.
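A queue program can tell that it has been handed only a fragment by checking whether the queued IPv6 packet carries a Fragment header (Next Header 44). The sketch below is illustrative only and is not nfq6 or libnetfilter_queue code. The helper name ipv6_frag_info() is hypothetical. It also assumes the Fragment header immediately follows the fixed 40-byte IPv6 header; a real program would have to walk the whole extension-header chain.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define IPV6_HDR_LEN 40   /* fixed IPv6 header */
#define NEXTHDR_FRAG 44   /* IPv6 Fragment header */

/* Hypothetical helper. Returns 1 if the fixed header is followed directly
 * by a Fragment header, filling in the fragment's byte offset, its M
 * (more fragments) flag and the protocol it carries; returns 0 otherwise.
 * Simplification: other extension headers are not walked. */
static int ipv6_frag_info(const uint8_t *pkt, size_t len,
                          unsigned *offset, int *more, uint8_t *inner)
{
    assert(pkt != NULL && offset != NULL && more != NULL && inner != NULL);
    if (len < IPV6_HDR_LEN + 8 || (pkt[0] >> 4) != 6)
        return 0;                          /* too short, or not IPv6 */
    if (pkt[6] != NEXTHDR_FRAG)            /* byte 6 is Next Header */
        return 0;
    const uint8_t *fh = pkt + IPV6_HDR_LEN;
    unsigned off_flags = ((unsigned)fh[2] << 8) | fh[3];
    *inner = fh[0];                        /* e.g. 17 for UDP */
    /* The upper 13 bits hold the offset in 8-octet units, so masking
     * off the low 3 bits yields the offset in bytes directly. */
    *offset = off_flags & 0xFFF8u;
    *more = (int)(off_flags & 1u);         /* 1 = more fragments follow */
    return 1;
}
```

For the first fragment in the trace, this would report offset 0, M=1 and inner protocol 17. In that case the queued packet cannot contain the whole datagram.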
>
> > In summary:
> >  - GSO re-combines TCP fragments before tcpdump can see them.
>
> Do you mean "segments"? It's the other way around: with GSO/TSO, the stack
> builds large superpackets, one TCP header with lots of data.

Sorry for the confusion here. I only meant to say that there is no problem with
TCP.

IOW, the kernel delivers to the filter program exactly what was in the buffer
when the remote application did a write(2) (for buffer sizes up to just under
64KB).

I don't know what GSO is, only that it's strongly recommended to use it.
>
> Such superpackets are split at the last possible moment;
> ideally by NIC/hardware.
>
> >  - Some other kernel code re-combines UDP4 fragments before netfilter queues
> >    them
> >  - Some other different kernel code re-combines UDP6 fragments for the user
> >    application but after netfilter queues them
> >  - It's been this way for a number of years
>
> GSO is just the software fallback of TSO, i.e. the local stack passes a
> large skb down to the driver, which will do pseudo-segmentation;
> this needs hardware that can handle scatterlists, which is true for
> almost all NICs.
>
> There is some segmentation support for UDP to handle encapsulation
> (tunneling) use cases, where the stack can pass a large skb and then
> have hardware or a software fallback do the segmentation for us, i.e.
> split according to the inner protocol and add the outer UDP encapsulation
> to all packets.
>
> > ================ Testing with GSO
> >
> >  nfq6 cmd: nfq6 -t6 -t7 -t8 -t17 -t18 24
> >  tcpdump cmd: tcpdump -i eth1 'ether host 18:60:24:bb:02:d6 && (tcp || udp) &&
> >                       ! port x11'
> >
> > > netcat cmds: nc -6 -q0 -u fe80::1ac0:4dff:fe04:75ba%eth0 1042 <zxc2k : nc -6 -k -l -n -p 1042 -q0 -u -v
> > >               nfq6 output                                   # tcpdump o/p (early fields omitted)
> > > packet received (id=169 hw=0x86dd hook=1, payload len 1496) # frag (0|1448) 33020 > 1042: UDP, length 2048
> > > Packet too short to get UDP payload                         #
> > >                                                             # frag (1448|608)
>
> You are sending a large udp packet via ipv6, it doesn't fit the device mtu,
> fragmentation is needed.  This has nothing to do with GSO.

OK, I was under the misapprehension that GSO was a layer 3 thing (working at
the IP level).
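The fragment sizes in the trace follow from the 1500-byte Ethernet MTU. Each IPv6 fragment carries the 40-byte IPv6 header and an 8-byte Fragment header, and every fragment except the last must carry a multiple of 8 octets. So 1500 - 40 - 8 = 1452, rounded down to 1448. The 2056 bytes to be fragmented (8-byte UDP header + 2048 bytes of data) therefore go out as 1448 + 608. The first fragment's packet is 40 + 8 + 1448 = 1496 bytes, which is the "payload len 1496" nfq6 reported. The sketch below reproduces that arithmetic in C. It assumes the sender fills every non-final fragment to the MTU, as the trace shows. The function names are made up for illustration.

```c
#include <assert.h>
#include <stdio.h>

#define IPV6_HDR_LEN 40u
#define FRAG_HDR_LEN  8u
#define UDP_HDR_LEN   8u

/* Largest per-fragment payload: what fits in the MTU after the IPv6 and
 * Fragment headers, rounded down to a multiple of 8 octets (the unit of
 * the Fragment Offset field). */
static unsigned max_frag_payload(unsigned mtu)
{
    assert(mtu >= IPV6_HDR_LEN + FRAG_HDR_LEN + 8u);
    return (mtu - IPV6_HDR_LEN - FRAG_HDR_LEN) & ~7u;
}

/* Print "(offset|length)" for each fragment, tcpdump style; return the count. */
static unsigned print_fragments(unsigned udp_payload, unsigned mtu)
{
    unsigned total = UDP_HDR_LEN + udp_payload;  /* fragmentable part */
    unsigned chunk = max_frag_payload(mtu);
    unsigned n = 0;
    for (unsigned off = 0; off < total; off += chunk, n++) {
        unsigned len = total - off < chunk ? total - off : chunk;
        printf("frag (%u|%u)\n", off, len);
    }
    return n;
}
```

print_fragments(2048, 1500) prints "frag (0|1448)" and "frag (1448|608)", the same pair tcpdump showed.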
>
> > > packet received (id=176 hw=0x0800 hook=1, payload len 60)                           # > Flags [S], seq 821055799, win 64240, options [mss 1460,sackOK,TS val 3739788506 ecr 0,nop,wscale 7], length 0
> > > packet received (id=177 hw=0x0800 hook=3, payload len 60, checksum not ready)       # < Flags [S.], seq 1085807033, ack 821055800, win 65160, options [mss 1460,sackOK,TS val 4164299250 ecr 3739788506,nop,wscale 7], length 0
> > > packet received (id=178 hw=0x0800 hook=1, payload len 52)                           # > Flags [.], ack 1, win 502, options [nop,nop,TS val 3739788506 ecr 4164299250], length 0
> > > GSO packet received (id=179 hw=0x0800 hook=1, payload len 2100, checksum not ready) # > Flags [P.], seq 1:2049, ack 1, win 502, options [nop,nop,TS val 3739788506 ecr 4164299250], length 2048
>
> The stack built a larger packet; the device or the software fallback will
> segment it as needed.
>
> > ================ Testing without GSO (needs v2 nfq6)
> >
> >  nfq6 cmd: nfq6 -t6 -t7 -t8 -t17 -t18 -t20 24
> >  tcpdump cmd: (as above)
> >
> > > netcat cmds: nc -6 -q0 -u fe80::1ac0:4dff:fe04:75ba%eth0 1042 <zxc2k : nc -6 -k -l -n -p 1042 -q0 -u -v
> > >               nfq6 output                                   # tcpdump o/p (early fields and source port omitted)
> > > packet received (id=1 hw=0x86dd hook=1, payload len 1496)   # frag (0|1448) > 1042: UDP, length 2048
> > > Packet too short to get UDP payload                         #
> > >                                                             # frag (1448|608)
> > > -----------------------------------------------------------------------------
> > > netcat cmds: nc -4 -q0 -u dimstar 1042 <zxc2k : nc -4 -k -l -n -p 1042 -q0 -u -v
> > >               nfq6 output                                   # tcpdump o/p (early fields omitted)
> > >                                                             # UDP, length 2048
> > > packet received (id=3 hw=0x0800 hook=1, payload len 2076)   # udp
> > > -----------------------------------------------------------------------------
>
> It would help if you could explain what is wrong here.

This example shows a UDP4 2KB datagram being successfully mangled and a UDP6 2KB
datagram failing to be mangled.
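The byte counts make the difference concrete. For UDP/IPv4 the queued packet was 2076 bytes = 20 (IPv4 header) + 8 (UDP header) + 2048, i.e. the whole reassembled datagram. For UDP/IPv6 it was 1496 bytes = 40 + 8 (Fragment header) + 1448. The UDP Length field, however, says 2056 (8 + 2048), so only 1448 of those bytes are present. The sketch below shows the kind of check that yields a "too short" result. It assumes the caller already knows where the UDP header starts. It is not nfq6's actual code, and udp_datagram_complete() is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: given the bytes handed to the queue program and
 * the offset of the UDP header, report whether the whole UDP datagram
 * (header + payload, per the UDP Length field) is present. */
static int udp_datagram_complete(const uint8_t *pkt, size_t len, size_t udp_off)
{
    assert(pkt != NULL);
    if (len < udp_off + 8)
        return 0;                                   /* no full UDP header */
    const uint8_t *uh = pkt + udp_off;
    size_t udp_len = ((size_t)uh[4] << 8) | uh[5];  /* header + data */
    return udp_len >= 8 && len - udp_off >= udp_len;
}
```

With udp_off = 20 the IPv4 packet passes. With udp_off = 48 (IPv6 + Fragment header), the first IPv6 fragment fails.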
>
> You also removed tcpdump info, I suspect it was "flags [+]"
> with two fragments for udp:ipv4 too?

There are 2 fragments for both IPv4 and IPv6.

tcpdump does not report any flags:

> 08:16:09.713395 IP6 fe80::1a60:24ff:febb:2d6 > fe80::1ac0:4dff:fe04:75ba: frag (0|1448) 47843 > 1042: UDP, length 2048
> 08:16:09.713395 IP6 fe80::1a60:24ff:febb:2d6 > fe80::1ac0:4dff:fe04:75ba: frag (1448|608)
> 08:17:22.924883 IP smallstar.local.net.55288 > dimstar.local.net.1042: UDP, length 2048
> 08:17:22.924883 IP smallstar.local.net > dimstar.local.net: udp
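tcpdump's "frag (offset|length)" shows byte values. On the wire, the Fragment Offset field holds offset/8, so the second IPv6 fragment carries 181 (1448 / 8). Reassembling the datagram needs both fragments: they must be contiguous from offset 0, and the last one must have the M flag clear. That gives 1448 + 608 = 2056 bytes (UDP header + 2048). The sketch below implements only that completeness test. Unlike real reassembly, it does no overlap handling and no timeouts. struct frag and reassembled_len() are illustrative names.

```c
#include <assert.h>
#include <stddef.h>

/* One fragment as tcpdump prints it: (offset|len), plus the M flag. */
struct frag { unsigned offset, len; int more; };

/* Returns the reassembled length if the fragments (sorted by offset) are
 * contiguous from 0 and the last one has M=0; otherwise returns 0. */
static unsigned reassembled_len(const struct frag *f, size_t n)
{
    assert(f != NULL || n == 0);
    unsigned next = 0;
    for (size_t i = 0; i < n; i++) {
        if (f[i].offset != next || f[i].offset % 8 != 0)
            return 0;                  /* gap, overlap or misaligned */
        if (f[i].more && f[i].len % 8 != 0)
            return 0;                  /* non-final fragments: multiple of 8 */
        next += f[i].len;
    }
    return (n > 0 && !f[n - 1].more) ? next : 0;
}
```

Given only the first fragment, which is what the queue program received for UDP6, this returns 0.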

> Frag handling depends on a lot of factors, such as whether IP defrag is
> enabled, where queueing happens (hook and prio), and whether userspace
> does MTU probing (like 'ping6 -M do').
>
> And the NIC driver too.
>
> For incoming data it also depends on sysctl settings and if
> GRO/LRO is enabled.
>
> > > packet received (id=49 hw=0x86dd hook=1, payload len 72)    # > Flags [.], ack 1, win 507, options [nop,nop,TS val 925571377 ecr 2923572945], length 0
> > > packet received (id=50 hw=0x86dd hook=1, payload len 1500)  # > Flags [.], seq 1:1429, ack 1, win 507, options [nop,nop,TS val 925571377 ecr 2923572945], length 1428
> > > packet received (id=51 hw=0x86dd hook=3, payload len 72)    # < Flags [.], ack 1429, win 501, options [nop,nop,TS val 2923572945 ecr 925571377], length 0
> > > packet received (id=52 hw=0x86dd hook=1, payload len 692)   # > Flags [P.], seq 1429:2049, ack 1, win 507, options [nop,nop,TS val 925571377 ecr 2923572945], length 620
>
> Kernel does software segmentation here, this is slow.

Sure, that was just a test.
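The segment sizes in that IPv6 TCP run are consistent with plain MSS-based splitting. With a 1500-byte MTU the IPv6 MSS is 1500 - 40 - 20 = 1440. The 12-byte timestamp option (nop,nop,TS) leaves 1428 data bytes per segment, so 2048 bytes become 1428 + 620. Those are packets of 40 + 32 + 1428 = 1500 and 40 + 32 + 620 = 692 bytes. In the IPv4 GSO run, the same 2048 bytes arrived as one 2100-byte superpacket (20 + 32 + 2048). The sketch below shows the splitting. It assumes that 1428-byte effective MSS, since the IPv6 SYN/MSS exchange is not shown above. segment() is a made-up name.

```c
#include <assert.h>
#include <stdio.h>

/* Split a payload into MSS-sized segments, as software segmentation
 * would, printing sequence ranges in tcpdump's "seq a:b" style.
 * Returns the number of segments. */
static unsigned segment(unsigned payload, unsigned mss, unsigned first_seq)
{
    assert(mss > 0);
    unsigned n = 0;
    for (unsigned off = 0; off < payload; off += mss, n++) {
        unsigned len = payload - off < mss ? payload - off : mss;
        printf("seq %u:%u, length %u\n",
               first_seq + off, first_seq + off + len, len);
    }
    return n;
}
```

segment(2048, 1428, 1) prints "seq 1:1429, length 1428" and "seq 1429:2049, length 620", matching the tcpdump lines quoted above.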

---

Florian, please say if you would like more explanation. Thank you again for
looking at this.

Cheers ... Duncan.
