Re: Flowtable race condition error

Sven Auhagen <sven.auhagen@xxxxxxxxxxxx> · Thu, 14 Mar 2024 13:43:52 +0100

On Thu, Mar 14, 2024 at 01:38:54PM +0100, Pablo Neira Ayuso wrote:
> On Thu, Mar 14, 2024 at 12:30:30PM +0100, Sven Auhagen wrote:
> > On Thu, Mar 14, 2024 at 12:17:51PM +0100, Pablo Neira Ayuso wrote:
> > Hi Pablo,
> > 
> > > Hi Sven,
> > > 
> > > On Tue, Mar 12, 2024 at 05:29:45PM +0100, Sven Auhagen wrote:
> > > > Hi,
> > > > 
> > > > I have a race condition problem in the flowtable and could
> > > > use some hint where to start debugging.
> > > > 
> > > > Every now and then a TCP FIN is closing the flowtable with a call
> > > > to flow_offload_teardown.
> > > > 
> > > > Right after another packet from the reply direction is readding
> > > > the connection to the flowtable just before the FIN is actually
> > > > transitioning the state from ESTABLISHED to FIN WAIT.
> > > > Now the FIN WAIT connection is OFFLOADED.
> > > 
> > > Are you restricting your ruleset to only offload new connections?
> > > 
> > 
> > It does not work to only use ct state new as we need to see both
> > directions for the offload and the return packet is in ct state
> > established at that point.
> 
> Indeed, we need to see two packets at least, which will be the next
> one coming in the opposite that is, the conntrack needs to be
> confirmed.
> 
> > > Or is it conntrack creating a fresh connection that being offloaded
> > > for this terminating TCP traffic what you are observing?
> > 
> > I can see a race condition where there is a TCP FIN packet
> > so flow_offload_teardown is called but before the FIN packet
> > is going through the slow path and sets the TCP connection to FIN_WAIT
> > another packet is readding the state to the flowtable.
> >
> > So I end up with FIN_WAIT and status OFFLOADED.
> > This only happens every few hunderd connections.
> >
> > > > This by itself should work itself out at gc time but
> > > > the state is now deleted right away.
> > > >
> > > > Any idea why the state is deleted right away?
> > > 
> > > It might be conntrack which is killing the connection, it would be
> > > good to have a nf_ct_kill_reason(). Last time we talk, NAT can also
> > > kill the conntrack in masquerade scenarios.
> > > 
> > 
> > I found this out.
> > The state is deleted in the end because the flow_offload_fixup_ct
> > function is pulling the FIN_WAIT timeout and deducts the offload_timeout
> > from it. This is 0 or very close to 0 and therefore ct gc is deleting the state
> > more or less right away after the flow_offload_teardown is called
> > (for the second time).
> 
> This used to be set to:
> 
>         timeout = tn->timeouts[TCP_CONNTRACK_ESTABLISHED];
> 
> but after:
> 
> commit e5eaac2beb54f0a16ff851125082d9faeb475572
> Author: Pablo Neira Ayuso <pablo@xxxxxxxxxxxxx>
> Date:   Tue May 17 10:44:14 2022 +0200
> 
>     netfilter: flowtable: fix TCP flow teardown
> 
> it uses the current state.

Yes since the TCP state might not be established anymore at that time.
Your patch will work but also give my TCP_FIN a very large timeout.

I have successfully tested this version today:

-	if (timeout < 0)
-		timeout = 0;
+	// Have at least some time left on the state
+	if (timeout < NF_FLOW_TIMEOUT)
+		timeout = NF_FLOW_TIMEOUT;

This makes sure that the timeout is not so big like ESTABLISHED but still enough
so the state does not time out right away.

> 
> > > > Here is the output of the state messages:
> > > > 
> > > >     [NEW] tcp      6 120 SYN_SENT src=192.168.97.23 dst=192.168.107.52 sport=63482 dport=443 [UNREPLIED] src=192.168.107.52 dst=192.168.97.23 sport=443 dport=63482 mark=92274785
> > > >  [UPDATE] tcp      6 60 SYN_RECV src=192.168.97.23 dst=192.168.107.52 sport=63482 dport=443 src=192.168.107.52 dst=192.168.97.23 sport=443 dport=63482 mark=92274785
> > > >  [UPDATE] tcp      6 432000 ESTABLISHED src=192.168.97.23 dst=192.168.107.52 sport=63482 dport=443 src=192.168.107.52 dst=192.168.97.23 sport=443 dport=63482 [OFFLOAD] mark=92274785
> > > >  [UPDATE] tcp      6 86400 FIN_WAIT src=192.168.97.23 dst=192.168.107.52 sport=63482 dport=443 src=192.168.107.52 dst=192.168.97.23 sport=443 dport=63482 [OFFLOAD] mark=92274785
> > > > [DESTROY] tcp      6 FIN_WAIT src=192.168.97.23 dst=192.168.107.52 sport=63482 dport=443 packets=10 bytes=1415 src=192.168.107.52 dst=192.168.97.23 sport=443 dport=63482 packets=11 bytes=6343 [ASSURED] mark=92274785 delta-time=0
> > > 
> > > Is there a [NEW] event after this [DESTROY] in FIN_WAIT state to pick
> > > the terminating connection from the middle?
> > > 
> > > b6f27d322a0a ("netfilter: nf_flow_table: tear down TCP flows if RST or
> > > FIN was seen") to let conntrack close the connection gracefully,
> > > otherwise flowtable becomes stateless and already finished connections
> > > remain in place which affects features such as connlimit.
> > > 
> > > The intention in that patch is to remove the entry from the flowtable
> > > then hand over back the conntrack to the connection tracking system
> > > following slow path.
> > 
> > So if the machanism is intended as it is then we need to make sure that the
> > timeout is not so close to 0 and we life with the possible race condition?
> 
> Then, this needs a state fix up based on the packet from the flowtable
> path to infer the current state.
> 
> This patch is not complete, it just restores ct timeout based on
> the established state, which has also its own problems.