On Fri, Nov 24, 2023 at 9:30 AM Minh Le Hoang <minh.lehoang@xxxxxxxxxxxxx> wrote:
>
> Hi everyone,
>
> I am Minh, and I am currently testing the XDP synproxy code from the
> Linux kernel source. To be more specific, I use the file
> xdp_synproxy_kern.c under the directory
> linux-6.5/tools/testing/selftests/bpf/progs.
>
> I set up the testing environment like this:
>
>     server              router              filter
>   +--------+        +------------+       +------------+
>   |        |        |            |eth2   |            |
>   |        |   eth3 |            |       |            |
>   |        +--------+            +-------+            |
>   |        |eth1    |            |       |            |
>   |        |        |            |  eth1 |            |
>   +--------+        +------+-----+       +------------+
>                            | eth1         198.51.100.10/29
>   203.0.113.10/29          |
>                            |
>                            |
>                            | eth1 192.0.2.11/29
>                     +------+-----+
>                     |            |
>                     |            |
>                     |            |
>                     |            |
>                     |            |
>                     |            |
>                     +------------+
>
>                         client
>
> Router1:
>   eth1: 192.0.2.9/29
>   eth2: 198.51.100.9/29
>   eth3: 203.0.113.9/29
>
> Addresses:
>   client: 192.0.2.11/29 (eth1)
>   server: 203.0.113.10/29 (eth1)
>   filter: 198.51.100.10/29 (eth1)
>
> All of the virtual machines run Ubuntu 23.04 with Linux kernel 6.5. In
> this network, all packets going from the client to the server are
> routed through the filter, and vice versa. Here are the Linux commands
> used to configure the routing tables on the router:
>
> # Create extra routing tables on router1 (used for policy-based routing)
>
> ## Route table with ID 1 and name "outside". This is for lookups on
> ## the "simulated Internet" side, where the client lives.
> echo 1 outside >> /etc/iproute2/rt_tables
>
> ## Route table with ID 2 and name "filter", unused but added to keep a
> ## consistent numbering and naming scheme - it is the interface to the
> ## filter node or cluster.
> echo 2 filter >> /etc/iproute2/rt_tables
>
> ## Route table with ID 3 and name "inside". This is for lookups on the
> ## "inside" or protected side, where the server lives.
> echo 3 inside >> /etc/iproute2/rt_tables
>
> # Create default routes in the routing tables on router1. These should
> # have the filter node (or cluster) as the nexthop.
>
> ip route add default via 198.51.100.10 dev eth2 table inside
> ip route add default via 198.51.100.10 dev eth2 table outside
>
> For the filter node, here are the Linux commands used to configure it:
>
> # The filter node(s) need routing entries for the "outside" and the
> # "inside" networks via our router. If we don't do this, the node
> # would send that traffic to the management network.
> ip route add 192.0.2.0/29 via 198.51.100.1
> ip route add 203.0.113.0/29 via 198.51.100.1
>
> # And disable redirects
> sysctl -w net.ipv4.conf.eth1.send_redirects=0
>
> After that, I configure iptables on the filter node to use the XDP
> synproxy code:
>
> mount -t bpf bpf /sys/fs/bpf
> sysctl -w net.ipv4.tcp_syncookies=2
> sysctl -w net.ipv4.tcp_timestamps=1
> sysctl -w net.netfilter.nf_conntrack_tcp_loose=0
> iptables -t raw -I PREROUTING -i eth1 -p tcp -m tcp --syn --dport 80 \
>     -j CT --notrack
> iptables -t filter -A FORWARD \
>     -i eth1 -p tcp -m tcp --dport 80 -m state --state INVALID,UNTRACKED \
>     -j SYNPROXY --sack-perm --timestamp --wscale 7 --mss 1460
> iptables -t filter -A FORWARD \
>     -i eth1 -m state --state INVALID -j DROP
>
> and then load the XDP synproxy program:
>
> ./xdp_synproxy --iface eth1 --ports 80 --single --mss4 1460 --mss6 1440 \
>     --wscale 7 --ttl 64

I have been unable to get XDP synproxy working when it is attached to a
firewall/router that does not itself hold the target/protected
destination IP, i.e. with rules in the filter table's INPUT chain; your
idea of adding the rules to the filter table's FORWARD chain solves my
puzzle :)

> I use the curl command on the client to fetch a web page from the
> server for testing. What is strange to me is that after the synproxy
> code completes the TCP three-way handshake with the client, it sends
> the SYN packet to the server but then drops the SYN-ACK packet coming
> back from the server.

I guess the synproxy code was originally not expected to handle the
SYN-ACK from the backend server?
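In case it helps with further debugging: the setup above can be
sanity-checked from the filter node while curl runs on the client. This
is only a sketch; it assumes the iptables rules quoted above are
installed and that the conntrack tool (from conntrack-tools) is
available on the filter node:

```shell
# Confirm the sysctls took effect.
sysctl net.ipv4.tcp_syncookies net.ipv4.tcp_timestamps \
       net.netfilter.nf_conntrack_tcp_loose

# Watch the packet/byte counters on the SYNPROXY and DROP rules; the
# SYNPROXY counter should increase for each proxied handshake.
iptables -t filter -L FORWARD -v -n

# List conntrack entries for port-80 flows that synproxy has picked up.
conntrack -L -p tcp --dport 80
```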
> My colleague Jeroen (jeroen.vaningenschenau@xxxxxxxxxxxxx) and I found
> a bug in this part of the code in the function tcp_lookup(); it does
> not pass the SYN-ACK TCP packet from the server:
>
> unsigned long status = ct->status;
>
> bpf_ct_release(ct);
> if (status & IPS_CONFIRMED_BIT)
>     return XDP_PASS;
>
> The value of status after iptables has established the TCP connection
> with the client is 8. The status values are defined in the file
> nf_conntrack_common.h in the directory include/uapi/linux/netfilter.
> Here is part of the enum definition:
>
> /* Bitset representing status of connection. */
> enum ip_conntrack_status {
>     /* It's an expected connection: bit 0 set. This bit never changed */
>     IPS_EXPECTED_BIT = 0,
>     IPS_EXPECTED = (1 << IPS_EXPECTED_BIT),
>
>     /* We've seen packets both ways: bit 1 set. Can be set, not unset. */
>     IPS_SEEN_REPLY_BIT = 1,
>     IPS_SEEN_REPLY = (1 << IPS_SEEN_REPLY_BIT),
>
>     /* Conntrack should never be early-expired. */
>     IPS_ASSURED_BIT = 2,
>     IPS_ASSURED = (1 << IPS_ASSURED_BIT),
>
>     /* Connection is confirmed: originating packet has left box */
>     IPS_CONFIRMED_BIT = 3,
>     IPS_CONFIRMED = (1 << IPS_CONFIRMED_BIT),
>
> Thus, both my colleague Jeroen and I believe this is a bug in the XDP
> synproxy code: the check is meant to test bit 3 (IPS_CONFIRMED), but
> masking with IPS_CONFIRMED_BIT (the value 3, binary 011) actually
> tests bits 0 and 1 (IPS_EXPECTED and IPS_SEEN_REPLY). For a status of
> 8 the mask is zero, so tcp_lookup() returns XDP_TX, control returns to
> the function syncookie_part1(), and the packet is dropped by this
> condition in syncookie_part1():
>
> /* Packet is TCP and doesn't belong to an established connection. */
> if ((hdr->tcp->syn ^ hdr->tcp->ack) != 1)
>     return XDP_DROP;
>
> As a fix, we change the condition in tcp_lookup() to check the bit
> value instead of the bit index:
>
> if (status & IPS_CONFIRMED)
>     return XDP_PASS;
>
> Now the XDP synproxy kernel code no longer drops the SYN-ACK TCP
> packet from the server.
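The mix-up between bit index and bit value is easy to reproduce with
plain shell arithmetic, using the constants from nf_conntrack_common.h
and the observed status value of 8:

```shell
# Conntrack status observed after the proxied handshake: 8 == IPS_CONFIRMED.
status=8

IPS_CONFIRMED_BIT=3                        # bit *index* (binary 011)
IPS_CONFIRMED=$((1 << IPS_CONFIRMED_BIT))  # bit *value* (binary 1000, i.e. 8)

# Buggy check: masks with the index, so it really tests bits 0 and 1
# (IPS_EXPECTED and IPS_SEEN_REPLY) and misses IPS_CONFIRMED entirely.
echo $((status & IPS_CONFIRMED_BIT))   # prints 0 -> XDP_PASS is skipped

# Fixed check: masks with the value, so a confirmed connection matches.
echo $((status & IPS_CONFIRMED))       # prints 8 -> XDP_PASS is taken
```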
Thanks for the analysis. If this is right and confirmed by other
experts, I guess I should also fix it up in the XDP synproxy code I
ported to the bpf-examples repo:
https://github.com/xdp-project/bpf-examples/tree/master/xdp-synproxy

> Kind regards,
> Minh