Re: [PATCH nf] netfilter: flowtable: infer TCP state and timeout before flow teardown

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Apr 09, 2024 at 01:35:15PM +0200, Sven Auhagen wrote:
> On Tue, Apr 09, 2024 at 01:11:43PM +0200, Pablo Neira Ayuso wrote:
> > Hi Sven,
> > 
> > On Mon, Apr 08, 2024 at 07:24:43AM +0200, Sven Auhagen wrote:
> > > Hi Pablo,
> > > 
> > > after some testing the problem only happens very rarely now.
> > > I suspect it happens only on connections that are at some point
> > > one way only or in some other way not in a correct state anymore.
> > > Never the less your latest patches are very good and reduce the problem
> > > to an absolute minimum that FIN WAIT is offlodaded and the timeout
> > > is correct now.
> > 
> > Thanks for testing, I am going to submit this patch.
> > 
> > If you have a bit more cycles, I still would like to know what corner
> > case scenario is still triggering this so...
> > 
> > > Here is one example if a flow that still is in FIN WAIT:
> > > 
> > > [NEW] tcp      6 120 SYN_SENT src=fd00::192:168:5:32 dst=2a05:d014:687:ed01::21 sport=58790 dport=443 [UNREPLIED] src=2a05:d014:687:ed01::21 dst=2003:a:c7f:e5e8:a3ae:63f7:7f92:e286 sport=443 dport=60848 mark=16777216
> > > [UPDATE] tcp      6 60 SYN_RECV src=fd00::192:168:5:32 dst=2a05:d014:687:ed01::21 sport=58790 dport=443 src=2a05:d014:687:ed01::21 dst=2003:a:c7f:e5e8:a3ae:63f7:7f92:e286 sport=443 dport=60848 mark=16777216
> > > [UPDATE] tcp      6 86400 ESTABLISHED src=fd00::192:168:5:32 dst=2a05:d014:687:ed01::21 sport=58790 dport=443 src=2a05:d014:687:ed01::21 dst=2003:a:c7f:e5e8:a3ae:63f7:7f92:e286 sport=443 dport=60848 [OFFLOAD] mark=16777216
> > > [UPDATE] tcp      6 120 FIN_WAIT src=fd00::192:168:5:32 dst=2a05:d014:687:ed01::21 sport=58790 dport=443 src=2a05:d014:687:ed01::21 dst=2003:a:c7f:e5e8:a3ae:63f7:7f92:e286 sport=443 dport=60848 [OFFLOAD] mark=16777216
> > > [UPDATE] tcp      6 30 LAST_ACK src=fd00::192:168:5:32 dst=2a05:d014:687:ed01::21 sport=58790 dport=443 src=2a05:d014:687:ed01::21 dst=2003:a:c7f:e5e8:a3ae:63f7:7f92:e286 sport=443 dport=60848 [ASSURED] mark=16777216
> > >  [UPDATE] tcp      6 120 TIME_WAIT src=fd00::192:168:5:32 dst=2a05:d014:687:ed01::21 sport=58790 dport=443 src=2a05:d014:687:ed01::21 dst=2003:a:c7f:e5e8:a3ae:63f7:7f92:e286 sport=443 dport=60848 [ASSURED] mark=16777216
> > >  [DESTROY] tcp      6 TIME_WAIT src=fd00::192:168:5:32 dst=2a05:d014:687:ed01::21 sport=58790 dport=443 packets=15 bytes=1750 src=2a05:d014:687:ed01::21 dst=2003:a:c7f:e5e8:a3ae:63f7:7f92:e286 sport=443 dport=60848 packets=13 bytes=6905 [ASSURED] mark=16777216 delta-time=120
> > 
> > ... could you run conntrack -E -o timestamp? I'd like to know if this is
> > a flow that is handed over back to classic path after 30 seconds, then
> > being placed in the flowtable again.
> 
> Sure here is a fresh output:
> 
> [1712662404.573225]	    [NEW] tcp      6 120 SYN_SENT src=192.168.7.101 dst=157.240.251.61 sport=52717 dport=5222 [UNREPLIED] src=157.240.251.61 dst=87.138.198.79 sport=5222 dport=26886 mark=25165825
> [1712662404.588094]	 [UPDATE] tcp      6 60 SYN_RECV src=192.168.7.101 dst=157.240.251.61 sport=52717 dport=5222 src=157.240.251.61 dst=87.138.198.79 sport=5222 dport=26886 mark=25165825
> [1712662404.591802]	 [UPDATE] tcp      6 86400 ESTABLISHED src=192.168.7.101 dst=157.240.251.61 sport=52717 dport=5222 src=157.240.251.61 dst=87.138.198.79 sport=5222 dport=26886 [OFFLOAD] mark=25165825
> [1712662405.682563]	 [UPDATE] tcp      6 120 FIN_WAIT src=192.168.7.101 dst=157.240.251.61 sport=52717 dport=5222 src=157.240.251.61 dst=87.138.198.79 sport=5222 dport=26886 [OFFLOAD] mark=25165825

It is happening right away, a bit more of 1 second after.

I can also see IP_CT_TCP_FLAG_CLOSE_INIT is not set on when ct->state
is adjusted to _FIN_WAIT state in the fixup routine.

There are also packets that might trigger transitions to sIG
(ignored) in TCP tracking. Probably conntrack -E is misleading because
this does not allow us to see what actual packets are coming after the
fin.

Can you give a try to this untested patch? It applies on top of the
two patches you already have for TCP and UDP.

> [1712662405.689501]	 [UPDATE] tcp      6 30 LAST_ACK src=192.168.7.101 dst=157.240.251.61 sport=52717 dport=5222 src=157.240.251.61 dst=87.138.198.79 sport=5222 dport=26886 [ASSURED] mark=25165825
> [1712662405.704370]	 [UPDATE] tcp      6 120 TIME_WAIT src=192.168.7.101 dst=157.240.251.61 sport=52717 dport=5222 src=157.240.251.61 dst=87.138.198.79 sport=5222 dport=26886 [ASSURED] mark=25165825

Can you also enable -o id,timestamp? The id should tell us if we are
always talking on the same object.

> [1712662451.967906]	[DESTROY] tcp      6 ESTABLISHED src=192.168.6.122 dst=52.98.243.2 sport=52717 dport=443 packets=14 bytes=4134 src=52.98.243.2 dst=37.24.174.42 sport=443 dport=20116 packets=17 bytes=13712 [ASSURED] mark=16777216 delta-time=140
diff --git a/include/net/netfilter/nf_flow_table.h b/include/net/netfilter/nf_flow_table.h
index 8507deed6b1f..2388c4fe99a0 100644
--- a/include/net/netfilter/nf_flow_table.h
+++ b/include/net/netfilter/nf_flow_table.h
@@ -293,7 +293,7 @@ int nf_flow_table_init(struct nf_flowtable *flow_table);
 void nf_flow_table_free(struct nf_flowtable *flow_table);
 
 void flow_offload_teardown(struct flow_offload *flow);
-void flow_offload_teardown_tcp(struct flow_offload *flow, bool fin);
+void flow_offload_teardown_tcp(struct flow_offload *flow, bool fin, int dir);
 
 void nf_flow_snat_port(const struct flow_offload *flow,
 		       struct sk_buff *skb, unsigned int thoff,
diff --git a/net/netfilter/nf_flow_table_core.c b/net/netfilter/nf_flow_table_core.c
index 16068ef04490..e3a7d8eff727 100644
--- a/net/netfilter/nf_flow_table_core.c
+++ b/net/netfilter/nf_flow_table_core.c
@@ -350,18 +350,21 @@ void flow_offload_teardown(struct flow_offload *flow)
 }
 EXPORT_SYMBOL_GPL(flow_offload_teardown);
 
-void flow_offload_teardown_tcp(struct flow_offload *flow, bool fin)
+void flow_offload_teardown_tcp(struct flow_offload *flow, bool fin, int dir)
 {
+	struct nf_conn *ct = flow->ct;
 	enum tcp_conntrack tcp_state;
 
-	if (fin)
+	if (fin) {
 		tcp_state = TCP_CONNTRACK_FIN_WAIT;
-	else /* rst */
+		ct->proto.tcp.seen[dir].flags |= IP_CT_TCP_FLAG_CLOSE_INIT;
+	} else { /* rst */
 		tcp_state = TCP_CONNTRACK_CLOSE;
+	}
 
-	flow_offload_fixup_tcp(nf_ct_net(flow->ct), flow->ct, tcp_state);
+	flow_offload_fixup_tcp(nf_ct_net(ct), ct, tcp_state);
 	smp_mb__before_atomic();
-	clear_bit(IPS_OFFLOAD_BIT, &flow->ct->status);
+	clear_bit(IPS_OFFLOAD_BIT, &ct->status);
 	set_bit(NF_FLOW_TEARDOWN, &flow->flags);
 }
 EXPORT_SYMBOL_GPL(flow_offload_teardown_tcp);
diff --git a/net/netfilter/nf_flow_table_ip.c b/net/netfilter/nf_flow_table_ip.c
index 9840ab5e3ae6..04fabfa9ff7e 100644
--- a/net/netfilter/nf_flow_table_ip.c
+++ b/net/netfilter/nf_flow_table_ip.c
@@ -20,7 +20,7 @@
 #include <linux/udp.h>
 
 static int nf_flow_state_check(struct flow_offload *flow, int proto,
-			       struct sk_buff *skb, unsigned int thoff)
+			       struct sk_buff *skb, unsigned int thoff, int dir)
 {
 	struct tcphdr *tcph;
 
@@ -29,7 +29,7 @@ static int nf_flow_state_check(struct flow_offload *flow, int proto,
 
 	tcph = (void *)(skb_network_header(skb) + thoff);
 	if (unlikely(tcph->fin || tcph->rst)) {
-		flow_offload_teardown_tcp(flow, tcph->fin);
+		flow_offload_teardown_tcp(flow, tcph->fin, dir);
 		return -1;
 	}
 
@@ -379,7 +379,7 @@ static int nf_flow_offload_forward(struct nf_flowtable_ctx *ctx,
 
 	iph = (struct iphdr *)(skb_network_header(skb) + ctx->offset);
 	thoff = (iph->ihl * 4) + ctx->offset;
-	if (nf_flow_state_check(flow, iph->protocol, skb, thoff))
+	if (nf_flow_state_check(flow, iph->protocol, skb, thoff, dir))
 		return 0;
 
 	if (!nf_flow_dst_check(&tuplehash->tuple)) {
@@ -658,7 +658,7 @@ static int nf_flow_offload_ipv6_forward(struct nf_flowtable_ctx *ctx,
 
 	ip6h = (struct ipv6hdr *)(skb_network_header(skb) + ctx->offset);
 	thoff = sizeof(*ip6h) + ctx->offset;
-	if (nf_flow_state_check(flow, ip6h->nexthdr, skb, thoff))
+	if (nf_flow_state_check(flow, ip6h->nexthdr, skb, thoff, dir))
 		return 0;
 
 	if (!nf_flow_dst_check(&tuplehash->tuple)) {

[Index of Archives]     [Netfitler Users]     [Berkeley Packet Filter]     [LARTC]     [Bugtraq]     [Yosemite Forum]

  Powered by Linux