Re: Bug in xdp synproxy kernel code

Vincent Li <vincent.mc.li@xxxxxxxxx> · Thu, 30 Nov 2023 20:51:19 -0800

On Fri, Nov 24, 2023 at 9:30 AM Minh Le Hoang
<minh.lehoang@xxxxxxxxxxxxx> wrote:
>
> Hi everyone,
> I am Minh, and currently I am testing the xdp synproxy code from the
> linux kernel source code. To be more specific, I use the file
> xdp_synproxy_kern.c under the directory
> linux-6.5/tools/testing/selftests/bpf/progs .
> I set up the test environment like this:
>
>     server                    router                    filter
> +------------+            +------------+            +------------+
> |            |            |            |            |            |
> |        eth1+------------+eth3    eth2+------------+eth1        |
> |            |            |            |            |            |
> +------------+            +-----+------+            +------------+
>  203.0.113.10/29                | eth1              198.51.100.10/29
>                                 |
>                                 |
>                                 | eth1 192.0.2.11/29
>                           +-----+------+
>                           |            |
>                           |            |
>                           +------------+
>                               client
>
> Router1:
>   eth1: 192.0.2.9/29
>   eth2: 198.51.100.9/29
>   eth3: 203.0.113.9/29
>
> Address of
> client: 192.0.2.11/29 (eth1)
> server: 203.0.113.10/29 (eth1)
> filter: 198.51.100.10/29 (eth1)
>
> All of the virtual machines run Ubuntu 23.04 with Linux kernel 6.5. In
> this network, all packets from the client to the server are routed
> through the filter, and vice versa. Here are the Linux commands that
> configure the routing tables on the router:
>
> # Create extra routing tables on router1 (used for policy-based routing)
> ## Route table with ID 1 and name "outside". This is for lookups on
> the "simulated Internet" side, where the client lives.
>
> echo 1  outside >> /etc/iproute2/rt_tables
>
> ## Route table with ID 2 and name "filter", unused but added to have a
> consistent numbering and naming scheme - it's the interface to the
> filter node or cluster.
>
> echo 2  filter >> /etc/iproute2/rt_tables
>
> ## Route table with ID 3 and name "inside". This is for lookups on the
> "inside" or protected side, where server lives.
>
> echo 3  inside >> /etc/iproute2/rt_tables
>
> # Create default routes in the routing tables on router1. These should
> have the filter node (or cluster) as a nexthop.
>
> ip route add default via 198.51.100.10 dev eth2 table inside
> ip route add default via 198.51.100.10 dev eth2 table outside
>
> For the filter node, here are the Linux commands to configure it:
> # The filter node(s) need routing entries for the "outside" net and
> the "inside" network via our router.
> # If we don't do this, it would send traffic to the management network.
>
> ip route add 192.0.2.0/29 via 198.51.100.1
> ip route add 203.0.113.0/29 via 198.51.100.1
>
> # And disable redirects
>
> sysctl -w net.ipv4.conf.eth1.send_redirects=0
>
> After that, I configure iptables on the filter node for the xdp synproxy code:
>
> mount -t bpf bpf /sys/fs/bpf
> sysctl -w net.ipv4.tcp_syncookies=2
> sysctl -w net.ipv4.tcp_timestamps=1
> sysctl -w net.netfilter.nf_conntrack_tcp_loose=0
> iptables -t raw -I PREROUTING -i eth1 -p tcp -m tcp --syn --dport 80 \
>    -j CT --notrack
> iptables -t filter -A FORWARD \
>    -i eth1 -p tcp -m tcp --dport 80 -m state --state INVALID,UNTRACKED \
>    -j SYNPROXY --sack-perm --timestamp --wscale 7 --mss 1460
> iptables -t filter -A FORWARD \
>    -i eth1 -m state --state INVALID -j DROP
>
> and then load the xdp synproxy code:
> ./xdp_synproxy --iface eth1 --ports 80 --single --mss4 1460 --mss6 1440 \
>    --wscale 7 --ttl 64
>

I had been unable to get xdp synproxy working when attached to a
firewall/router that does not itself hold the target/protected
destination IP, because I was adding the rules to the filter table's
INPUT chain. Your idea of adding the rules to the filter FORWARD chain
solves my puzzle :)

> I use the curl command on the client to fetch a web page from the
> server for testing. Strangely, after the synproxy code completes the
> TCP three-way handshake with the client, it sends the SYN packet to
> the server but then drops the SYNACK packet from the server.
>
I guess the synproxy code may not originally have been expected to
handle the SYNACK from the backend server?

> My colleague Jeroen (jeroen.vaningenschenau@xxxxxxxxxxxxx) and I
> found that the bug is in this part of the function tcp_lookup(),
> which fails to pass the SYNACK packet from the server:
>
> unsigned long status = ct->status;
> bpf_ct_release(ct);
> if (status & IPS_CONFIRMED_BIT) {
>         return XDP_PASS;
> }
>
> The value of status after iptables has established the TCP connection
> with the client is 8. The status enum is defined in the file
> nf_conntrack_common.h in the directory include/uapi/linux/netfilter.
> Here is the relevant part of the enum definition:
>
> /* Bitset representing status of connection. */
> enum ip_conntrack_status {
> /* It's an expected connection: bit 0 set.  This bit never changed */
> IPS_EXPECTED_BIT = 0,
> IPS_EXPECTED = (1 << IPS_EXPECTED_BIT),
>
> /* We've seen packets both ways: bit 1 set.  Can be set, not unset. */
> IPS_SEEN_REPLY_BIT = 1,
> IPS_SEEN_REPLY = (1 << IPS_SEEN_REPLY_BIT),
>
> /* Conntrack should never be early-expired. */
> IPS_ASSURED_BIT = 2,
> IPS_ASSURED = (1 << IPS_ASSURED_BIT),
>
> /* Connection is confirmed: originating packet has left box */
> IPS_CONFIRMED_BIT = 3,
> IPS_CONFIRMED = (1 << IPS_CONFIRMED_BIT),
>
> Thus, both my colleague Jeroen and I believe that this is a bug in the
> xdp synproxy code: it is meant to check bit 3 (IPS_CONFIRMED), but
> because IPS_CONFIRMED_BIT is the bit index 3 (binary 0011 when used as
> a mask) rather than the mask 8, the condition actually tests bits 0
> and 1. With status 8 the test fails, so tcp_lookup() returns XDP_TX,
> and back in the function syncookie_part1() the packet is dropped by
> this condition:
>
> /* Packet is TCP and doesn't belong to an established connection. */
> if ((hdr->tcp->syn ^ hdr->tcp->ack) != 1) {
>         return XDP_DROP;
> }
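
To spell out which packets that condition drops, here is a minimal
user-space sketch (not the BPF program itself). The struct tcp_flags
type and the helper name dropped_by_flag_check() are made up for
illustration; only the expression is taken from the quoted code.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the syn and ack flag bits of struct tcphdr. */
struct tcp_flags {
	unsigned int syn : 1;
	unsigned int ack : 1;
};

/*
 * Mirrors the quoted condition from syncookie_part1():
 * a true result corresponds to returning XDP_DROP.
 */
static bool dropped_by_flag_check(struct tcp_flags f)
{
	/* Exactly one of SYN/ACK set gives 1; both or neither gives 0. */
	return (f.syn ^ f.ack) != 1;
}
```

With these definitions a pure SYN or a pure ACK gives syn ^ ack == 1 and
is kept, while a SYNACK (both bits set) gives 0 and is dropped, which
matches the behaviour described above once tcp_lookup() has returned
XDP_TX.
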
>
> As a fix, we changed the condition in the tcp_lookup() function to
> test the IPS_CONFIRMED mask, i.e. bit 3:
>
> if (status & IPS_CONFIRMED) {
>         return XDP_PASS;
> }
>
> With this change, the xdp synproxy kernel code no longer drops the
> SYNACK packet from the server.
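
For anyone who wants to check the arithmetic, here is a minimal
user-space sketch of the two tests. The enum values are copied from the
header quoted above; the helper names confirmed_by_index() and
confirmed_by_mask() are hypothetical and exist only in this sketch.

```c
/* Values copied from enum ip_conntrack_status as quoted above. */
enum {
	IPS_EXPECTED_BIT   = 0,
	IPS_SEEN_REPLY_BIT = 1,
	IPS_ASSURED_BIT    = 2,
	IPS_CONFIRMED_BIT  = 3,
	IPS_CONFIRMED      = (1 << IPS_CONFIRMED_BIT),
};

/* Original form: 3 is binary 0011, so this tests bits 0 and 1. */
static int confirmed_by_index(unsigned long status)
{
	return (status & IPS_CONFIRMED_BIT) != 0;
}

/* Proposed form: 8 is binary 1000, so this tests bit 3 only. */
static int confirmed_by_mask(unsigned long status)
{
	return (status & IPS_CONFIRMED) != 0;
}
```

With the reported status of 8, confirmed_by_index() yields 0 and
confirmed_by_mask() yields 1. Note that the index form can also give a
false positive: a status of 2 (only IPS_SEEN_REPLY set) passes it even
though the confirmed bit is clear.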

Thanks for the analysis. If this is right and confirmed by other
experts, I guess I should also fix it in the xdp synproxy code I
ported to the bpf-examples repo:
https://github.com/xdp-project/bpf-examples/tree/master/xdp-synproxy.

> Kind regards,
> Minh
>