Re: [PATCH nf] netfilter: handle the connecting collision properly in nf_conntrack_proto_sctp

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sun, Oct 1, 2023 at 11:07 AM Xin Long <lucien.xin@xxxxxxxxx> wrote:
>
> In Scenario A and B below, as the delayed INIT_ACK always changes the peer
> vtag, SCTP ct with the incorrect vtag may cause packet loss.
>
> Scenario A: INIT_ACK is delayed until the peer receives its own INIT_ACK
>
>   192.168.1.2 > 192.168.1.1: [INIT] [init tag: 1328086772]
>     192.168.1.1 > 192.168.1.2: [INIT] [init tag: 1414468151]
>     192.168.1.2 > 192.168.1.1: [INIT ACK] [init tag: 1328086772]
>   192.168.1.1 > 192.168.1.2: [INIT ACK] [init tag: 1650211246] *
>   192.168.1.2 > 192.168.1.1: [COOKIE ECHO]
>     192.168.1.1 > 192.168.1.2: [COOKIE ECHO]
>     192.168.1.2 > 192.168.1.1: [COOKIE ACK]
>
> Scenario B: INIT_ACK is delayed until the peer completes its own handshake
>
>   192.168.1.2 > 192.168.1.1: sctp (1) [INIT] [init tag: 3922216408]
>     192.168.1.1 > 192.168.1.2: sctp (1) [INIT] [init tag: 144230885]
>     192.168.1.2 > 192.168.1.1: sctp (1) [INIT ACK] [init tag: 3922216408]
>     192.168.1.1 > 192.168.1.2: sctp (1) [COOKIE ECHO]
>     192.168.1.2 > 192.168.1.1: sctp (1) [COOKIE ACK]
>   192.168.1.1 > 192.168.1.2: sctp (1) [INIT ACK] [init tag: 3914796021] *
>
> This patch fixes it as below:
>
> In SCTP_CID_INIT processing:
> - clear ct->proto.sctp.init[!dir] if ct->proto.sctp.init[dir] &&
>   ct->proto.sctp.init[!dir]. (Scenario E)
> - set ct->proto.sctp.init[dir].
>
> In SCTP_CID_INIT_ACK processing:
> - drop it if !ct->proto.sctp.init[!dir] && ct->proto.sctp.vtag[!dir] &&
>   ct->proto.sctp.vtag[!dir] != ih->init_tag. (Scenario B, Scenario C)
> - drop it if ct->proto.sctp.init[dir] && ct->proto.sctp.init[!dir] &&
>   ct->proto.sctp.vtag[!dir] != ih->init_tag. (Scenario A)
>
> In SCTP_CID_COOKIE_ACK processing:
> - clear ct->proto.sctp.init[dir] and ct->proto.sctp.init[!dir]. (Scenario D)
>
> Also, it's important to allow the ct state to move forward with cookie_echo
> and cookie_ack from the opposite dir for the collision scenarios.
>
> There are also other Scenarios where it should allow the packet through,
> addressed by the processing above:
>
> Scenario C: new CT is created by INIT_ACK.
>
> Scenario D: start INIT on the existing ESTABLISHED ct.
>
> Scenario E: start INIT after the old collision on the existing ESTABLISHED ct.
>
>   192.168.1.2 > 192.168.1.1: sctp (1) [INIT] [init tag: 3922216408]
>   192.168.1.1 > 192.168.1.2: sctp (1) [INIT] [init tag: 144230885]
>   (both side are stopped, then start new connection again in hours)
>   192.168.1.2 > 192.168.1.1: sctp (1) [INIT] [init tag: 242308742]
>
> Signed-off-by: Xin Long <lucien.xin@xxxxxxxxx>
> ---
>  include/linux/netfilter/nf_conntrack_sctp.h |  1 +
>  net/netfilter/nf_conntrack_proto_sctp.c     | 41 ++++++++++++++++-----
>  2 files changed, 33 insertions(+), 9 deletions(-)
>
> diff --git a/include/linux/netfilter/nf_conntrack_sctp.h b/include/linux/netfilter/nf_conntrack_sctp.h
> index 625f491b95de..fb31312825ae 100644
> --- a/include/linux/netfilter/nf_conntrack_sctp.h
> +++ b/include/linux/netfilter/nf_conntrack_sctp.h
> @@ -9,6 +9,7 @@ struct ip_ct_sctp {
>         enum sctp_conntrack state;
>
>         __be32 vtag[IP_CT_DIR_MAX];
> +       u8 init[IP_CT_DIR_MAX];
>         u8 last_dir;
>         u8 flags;
>  };
> diff --git a/net/netfilter/nf_conntrack_proto_sctp.c b/net/netfilter/nf_conntrack_proto_sctp.c
> index b6bcc8f2f46b..91aee286d503 100644
> --- a/net/netfilter/nf_conntrack_proto_sctp.c
> +++ b/net/netfilter/nf_conntrack_proto_sctp.c
> @@ -112,7 +112,7 @@ static const u8 sctp_conntracks[2][11][SCTP_CONNTRACK_MAX] = {
>  /* shutdown_ack */ {sSA, sCL, sCW, sCE, sES, sSA, sSA, sSA, sSA},
>  /* error        */ {sCL, sCL, sCW, sCE, sES, sSS, sSR, sSA, sCL},/* Can't have Stale cookie*/
>  /* cookie_echo  */ {sCL, sCL, sCE, sCE, sES, sSS, sSR, sSA, sCL},/* 5.2.4 - Big TODO */
> -/* cookie_ack   */ {sCL, sCL, sCW, sCE, sES, sSS, sSR, sSA, sCL},/* Can't come in orig dir */
> +/* cookie_ack   */ {sCL, sCL, sCW, sES, sES, sSS, sSR, sSA, sCL},/* Can't come in orig dir */
>  /* shutdown_comp*/ {sCL, sCL, sCW, sCE, sES, sSS, sSR, sCL, sCL},
>  /* heartbeat    */ {sHS, sCL, sCW, sCE, sES, sSS, sSR, sSA, sHS},
>  /* heartbeat_ack*/ {sCL, sCL, sCW, sCE, sES, sSS, sSR, sSA, sHS},
> @@ -126,7 +126,7 @@ static const u8 sctp_conntracks[2][11][SCTP_CONNTRACK_MAX] = {
>  /* shutdown     */ {sIV, sCL, sCW, sCE, sSR, sSS, sSR, sSA, sIV},
>  /* shutdown_ack */ {sIV, sCL, sCW, sCE, sES, sSA, sSA, sSA, sIV},
>  /* error        */ {sIV, sCL, sCW, sCL, sES, sSS, sSR, sSA, sIV},
> -/* cookie_echo  */ {sIV, sCL, sCW, sCE, sES, sSS, sSR, sSA, sIV},/* Can't come in reply dir */
> +/* cookie_echo  */ {sIV, sCL, sCE, sCE, sES, sSS, sSR, sSA, sIV},/* Can't come in reply dir */
>  /* cookie_ack   */ {sIV, sCL, sCW, sES, sES, sSS, sSR, sSA, sIV},
>  /* shutdown_comp*/ {sIV, sCL, sCW, sCE, sES, sSS, sSR, sCL, sIV},
>  /* heartbeat    */ {sIV, sCL, sCW, sCE, sES, sSS, sSR, sSA, sHS},
> @@ -412,6 +412,9 @@ int nf_conntrack_sctp_packet(struct nf_conn *ct,
>                         /* (D) vtag must be same as init_vtag as found in INIT_ACK */
>                         if (sh->vtag != ct->proto.sctp.vtag[dir])
>                                 goto out_unlock;
> +               } else if (sch->type == SCTP_CID_COOKIE_ACK) {
> +                       ct->proto.sctp.init[dir] = 0;
> +                       ct->proto.sctp.init[!dir] = 0;
>                 } else if (sch->type == SCTP_CID_HEARTBEAT) {
>                         if (ct->proto.sctp.vtag[dir] == 0) {
>                                 pr_debug("Setting %d vtag %x for dir %d\n", sch->type, sh->vtag, dir);
> @@ -461,16 +464,18 @@ int nf_conntrack_sctp_packet(struct nf_conn *ct,
>                 }
>
>                 /* If it is an INIT or an INIT ACK note down the vtag */
> -               if (sch->type == SCTP_CID_INIT ||
> -                   sch->type == SCTP_CID_INIT_ACK) {
> -                       struct sctp_inithdr _inithdr, *ih;
> +               if (sch->type == SCTP_CID_INIT) {
> +                       struct sctp_inithdr _ih, *ih;
>
> -                       ih = skb_header_pointer(skb, offset + sizeof(_sch),
> -                                               sizeof(_inithdr), &_inithdr);
> +                       ih = skb_header_pointer(skb, offset + sizeof(_sch), sizeof(*ih), &_ih);
>                         if (ih == NULL)
>                                 goto out_unlock;
> -                       pr_debug("Setting vtag %x for dir %d\n",
> -                                ih->init_tag, !dir);
> +
> +                       if (ct->proto.sctp.init[dir] && ct->proto.sctp.init[!dir])
> +                               ct->proto.sctp.init[!dir] = 0;
> +                       ct->proto.sctp.init[dir] = 1;
> +
> +                       pr_debug("Setting vtag %x for dir %d\n", ih->init_tag, !dir);
>                         ct->proto.sctp.vtag[!dir] = ih->init_tag;
>
>                         /* don't renew timeout on init retransmit so
> @@ -481,6 +486,24 @@ int nf_conntrack_sctp_packet(struct nf_conn *ct,
>                             old_state == SCTP_CONNTRACK_CLOSED &&
>                             nf_ct_is_confirmed(ct))
>                                 ignore = true;
> +               } else if (sch->type == SCTP_CID_INIT_ACK) {
> +                       struct sctp_inithdr _ih, *ih;
> +                       u32 vtag;
> +
> +                       ih = skb_header_pointer(skb, offset + sizeof(_sch), sizeof(*ih), &_ih);
> +                       if (ih == NULL)
> +                               goto out_unlock;
> +
> +                       vtag = ct->proto.sctp.vtag[!dir];
> +                       if (!ct->proto.sctp.init[!dir] && vtag && vtag != ih->init_tag)
> +                               goto out_unlock;
> +                       /* collision */
> +                       if (ct->proto.sctp.init[dir] && ct->proto.sctp.init[!dir] &&
> +                           vtag != ih->init_tag)
> +                               goto out_unlock;
> +
> +                       pr_debug("Setting vtag %x for dir %d\n", ih->init_tag, !dir);
> +                       ct->proto.sctp.vtag[!dir] = ih->init_tag;
>                 }
>
>                 ct->proto.sctp.state = new_state;
> --
> 2.39.1
>
a reproducer is attached.

Thanks.
#!/bin/bash
# SPDX-License-Identifier: GPL-2.0
#
# Testing For SCTP COLLISION SCENARIO as Below:
#
#   14:35:47.655279 IP 198.51.200.1.1234 > 198.51.100.1.1234: sctp (1) [INIT] [init tag: 2017837359]
#   14:35:48.353250 IP 198.51.100.1.1234 > 198.51.200.1.1234: sctp (1) [INIT] [init tag: 1187206187]
#   14:35:48.353275 IP 198.51.200.1.1234 > 198.51.100.1.1234: sctp (1) [INIT ACK] [init tag: 2017837359]
#   14:35:48.353283 IP 198.51.100.1.1234 > 198.51.200.1.1234: sctp (1) [COOKIE ECHO]
#   14:35:48.353977 IP 198.51.200.1.1234 > 198.51.100.1.1234: sctp (1) [COOKIE ACK]
#   14:35:48.855335 IP 198.51.100.1.1234 > 198.51.200.1.1234: sctp (1) [INIT ACK] [init tag: 164579970]
#
# TOPO: oamcm (link0)<--->(link1) HOST/ROUTER (link2)<--->(link3) oambb

# include the c test file in this script
cat > ./sctp_test.c << EOF
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>

int main(int argc, char *argv[])
{
	struct sockaddr_in saddr = {}, daddr = {};
	int sd, ret, len = sizeof(daddr);
	struct timeval tv = {25, 0};
	char buf[] = "hello";

	if (argc != 6 || (strcmp(argv[1], "server") && strcmp(argv[1], "client"))) {
		printf("%s <server|client> <LOCAL_IP> <LOCAL_PORT> <REMOTE_IP> <REMOTE_PORT>\n", argv[0]);
		return -1;
	}

	sd = socket(AF_INET, SOCK_SEQPACKET, IPPROTO_SCTP);
	if (sd < 0) {
		printf("Failed to create sd\n");
		return -1;
	}

	saddr.sin_family = AF_INET;
	saddr.sin_addr.s_addr = inet_addr(argv[2]);
	saddr.sin_port = htons(atoi(argv[3]));

	ret = bind(sd, (struct sockaddr *)&saddr, sizeof(saddr));
	if (ret < 0) {
		printf("Failed to bind to address\n");
		return -1;
	}

	ret = listen(sd, 5);
	if (ret < 0) {
		printf("Failed to listen on port\n");
		return -1;
	}

	daddr.sin_family = AF_INET;
	daddr.sin_addr.s_addr = inet_addr(argv[4]);
	daddr.sin_port = htons(atoi(argv[5]));

	/* make test shorter than 25s */
	ret = setsockopt(sd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv));
	if (ret < 0) {
		printf("Failed to setsockopt SO_RCVTIMEO\n");
		return -1;
	}

	if (!strcmp(argv[1], "server")) {
		sleep(1); /* wait a bit for client's INIT */
		ret = connect(sd, (struct sockaddr *)&daddr, len);
		if (ret < 0) {
			printf("Failed to connect to peer\n");
			return -1;
		}
		ret = recvfrom(sd, buf, sizeof(buf), 0, (struct sockaddr *)&daddr, &len);
		if (ret < 0) {
			printf("Failed to recv msg %d\n", ret);
			return -1;
		}
		ret = sendto(sd, buf, strlen(buf) + 1, 0, (struct sockaddr *)&daddr, len);
		if (ret < 0) {
			printf("Failed to send msg %d\n", ret);
			return -1;
		}
		printf("Server: sent! %d\n", ret);
	}

	if (!strcmp(argv[1], "client")) {
		usleep(300000); /* wait a bit for server's listening */
		ret = connect(sd, (struct sockaddr *)&daddr, len);
		if (ret < 0) {
			printf("Failed to connect to peer\n");
			return -1;
		}
		sleep(1); /* wait a bit for server's delayed INIT_ACK to reproduce the issue */
		ret = sendto(sd, buf, strlen(buf) + 1, 0, (struct sockaddr *)&daddr, len);
		if (ret < 0) {
			printf("Failed to send msg %d\n", ret);
			return -1;
		}
		ret = recvfrom(sd, buf, sizeof(buf), 0, (struct sockaddr *)&daddr, &len);
		if (ret < 0) {
			printf("Failed to recv msg %d\n", ret);
			return -1;
		}
		printf("Client: rcvd! %d\n", ret);
	}

	close(sd);
	return 0;
}
EOF

if ! conntrack -V; then
	echo "SKIP: Could not run test without conntrack-tools, try # dnf install -y conntrack-tools"
	exit 4
fi
if ! make sctp_test; then
	echo "SKIP: Failed to compile sctp_test.c"
	exit 4
fi

# clean up
ip netns del oamcm > /dev/null 2>&1
ip netns del oambb > /dev/null 2>&1
ip link del link1 > /dev/null 2>&1
ip link del link2 > /dev/null 2>&1
conntrack -D -p sctp > /dev/null 2>&1
iptables -F

# setup the topo
ip netns add oamcm
ip netns add oambb
ip link add link1 type veth peer name link0 netns oamcm
ip link add link2 type veth peer name link3 netns oambb

ip -n oamcm link set link0 up
ip -n oamcm addr add 198.51.100.1/24 dev link0
ip -n oamcm route add 198.51.200.1 dev link0 via 198.51.100.2

ip link set link1 up
ip link set link2 up
ip addr add 198.51.100.2/24 dev link1
ip addr add 198.51.200.2/24 dev link2
sysctl -wq net.ipv4.ip_forward=1

ip -n oambb link set link3 up
ip -n oambb addr add 198.51.200.1/24 dev link3
ip -n oambb route add 198.51.100.1 dev link3 via 198.51.200.2

# simulate the delay on OVS upcall by setting up a delay for INIT_ACK with tc on oamcm side
tc -n oamcm qdisc add dev link0 root handle 1: htb
tc -n oamcm class add dev link0 parent 1: classid 1:1 htb rate 100mbit
tc -n oamcm filter add dev link0 parent 1: protocol ip u32 match ip protocol 132 0xff match u8 2 0xff at 32 flowid 1:1
tc -n oamcm qdisc add dev link0 parent 1:1 handle 10: netem delay 1200ms

# simulate the ctstate check on OVS nf_conntrack
iptables -A FORWARD -m state --state INVALID,UNTRACKED -j DROP
iptables -A INPUT -p sctp -j DROP

# use a smaller number for assoc's max_retrans to reproduce the issue
modprobe sctp
ip netns exec oambb sysctl -wq net.sctp.association_max_retrans=3

# NOTE: one way to work around the issue is set a smaller hb_interval
# ip netns exec oambb sysctl -wq net.sctp.hb_interval=3500

# run the test case
echo "Test Started (wait up to 25s):"
ip net exec oamcm ./sctp_test server 198.51.100.1 1234 198.51.200.1 1234 &
ip net exec oambb ./sctp_test client 198.51.200.1 1234 198.51.100.1 1234 && echo "PASS!" && exit 0
echo "FAIL!" && exit 1

[Index of Archives]     [Linux Networking Development]     [Linux OMAP]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     SCTP

  Powered by Linux