On Thu, Mar 14, 2024 at 12:17:51PM +0100, Pablo Neira Ayuso wrote:

Hi Pablo,

> Hi Sven,
>
> On Tue, Mar 12, 2024 at 05:29:45PM +0100, Sven Auhagen wrote:
> > Hi,
> >
> > I have a race condition problem in the flowtable and could
> > use some hint where to start debugging.
> >
> > Every now and then a TCP FIN is closing the flowtable with a call
> > to flow_offload_teardown.
> >
> > Right after, another packet from the reply direction is re-adding
> > the connection to the flowtable just before the FIN actually
> > transitions the state from ESTABLISHED to FIN_WAIT.
> > Now the FIN_WAIT connection is OFFLOADED.
>
> Are you restricting your ruleset to only offload new connections?

It does not work to only use ct state new, as we need to see both
directions for the offload and the return packet is already in ct
state established at that point.

> Or is it conntrack creating a fresh connection that is being
> offloaded for this terminating TCP traffic what you are observing?

I can see a race condition where there is a TCP FIN packet, so
flow_offload_teardown is called, but before the FIN packet goes
through the slow path and sets the TCP connection to FIN_WAIT,
another packet re-adds the state to the flowtable. So I end up with
FIN_WAIT and status OFFLOADED. This only happens every few hundred
connections.

> > This by itself should work itself out at gc time but
> > the state is now deleted right away.
> >
> > Any idea why the state is deleted right away?
>
> It might be conntrack which is killing the connection, it would be
> good to have a nf_ct_kill_reason(). Last time we talked, NAT can
> also kill the conntrack in masquerade scenarios.

I found this out: the state is deleted in the end because the
flow_offload_fixup_ct function takes the FIN_WAIT timeout and deducts
the offload_timeout from it. The result is 0 or very close to 0, so
ct gc deletes the state more or less right away after
flow_offload_teardown is called (for the second time).
> > Here is the output of the state messages:
> >
> > [NEW] tcp 6 120 SYN_SENT src=192.168.97.23 dst=192.168.107.52 sport=63482 dport=443 [UNREPLIED] src=192.168.107.52 dst=192.168.97.23 sport=443 dport=63482 mark=92274785
> > [UPDATE] tcp 6 60 SYN_RECV src=192.168.97.23 dst=192.168.107.52 sport=63482 dport=443 src=192.168.107.52 dst=192.168.97.23 sport=443 dport=63482 mark=92274785
> > [UPDATE] tcp 6 432000 ESTABLISHED src=192.168.97.23 dst=192.168.107.52 sport=63482 dport=443 src=192.168.107.52 dst=192.168.97.23 sport=443 dport=63482 [OFFLOAD] mark=92274785
> > [UPDATE] tcp 6 86400 FIN_WAIT src=192.168.97.23 dst=192.168.107.52 sport=63482 dport=443 src=192.168.107.52 dst=192.168.97.23 sport=443 dport=63482 [OFFLOAD] mark=92274785
> > [DESTROY] tcp 6 FIN_WAIT src=192.168.97.23 dst=192.168.107.52 sport=63482 dport=443 packets=10 bytes=1415 src=192.168.107.52 dst=192.168.97.23 sport=443 dport=63482 packets=11 bytes=6343 [ASSURED] mark=92274785 delta-time=0
>
> Is there a [NEW] event after this [DESTROY] in FIN_WAIT state to
> pick up the terminating connection from the middle?
>
> b6f27d322a0a ("netfilter: nf_flow_table: tear down TCP flows if RST
> or FIN was seen") exists to let conntrack close the connection
> gracefully, otherwise the flowtable becomes stateless and already
> finished connections remain in place, which affects features such as
> connlimit.
>
> The intention in that patch is to remove the entry from the
> flowtable and then hand the conntrack back to the connection
> tracking system, following the slow path.

So if the mechanism is intended to work this way, then we need to
make sure that the timeout is not so close to 0, and we live with the
possible race condition?