On Thu, Mar 14, 2024 at 01:38:54PM +0100, Pablo Neira Ayuso wrote: > On Thu, Mar 14, 2024 at 12:30:30PM +0100, Sven Auhagen wrote: > > On Thu, Mar 14, 2024 at 12:17:51PM +0100, Pablo Neira Ayuso wrote: > > Hi Pablo, > > > > > Hi Sven, > > > > > > On Tue, Mar 12, 2024 at 05:29:45PM +0100, Sven Auhagen wrote: > > > > Hi, > > > > > > > > I have a race condition problem in the flowtable and could > > > > use some hint where to start debugging. > > > > > > > > Every now and then a TCP FIN is closing the flowtable with a call > > > > to flow_offload_teardown. > > > > > > > > Right after another packet from the reply direction is readding > > > > the connection to the flowtable just before the FIN is actually > > > > transitioning the state from ESTABLISHED to FIN WAIT. > > > > Now the FIN WAIT connection is OFFLOADED. > > > > > > Are you restricting your ruleset to only offload new connections? > > > > > > > It does not work to only use ct state new as we need to see both > > directions for the offload and the return packet is in ct state > > established at that point. > > Indeed, we need to see two packets at least, which will be the next > one coming in the opposite that is, the conntrack needs to be > confirmed. > > > > Or is it conntrack creating a fresh connection that being offloaded > > > for this terminating TCP traffic what you are observing? > > > > I can see a race condition where there is a TCP FIN packet > > so flow_offload_teardown is called but before the FIN packet > > is going through the slow path and sets the TCP connection to FIN_WAIT > > another packet is readding the state to the flowtable. > > > > So I end up with FIN_WAIT and status OFFLOADED. > > This only happens every few hunderd connections. > > > > > > This by itself should work itself out at gc time but > > > > the state is now deleted right away. > > > > > > > > Any idea why the state is deleted right away? > > > > > > It might be conntrack which is killing the connection, it would be > > > good to have a nf_ct_kill_reason(). Last time we talk, NAT can also > > > kill the conntrack in masquerade scenarios. > > > > > > > I found this out. > > The state is deleted in the end because the flow_offload_fixup_ct > > function is pulling the FIN_WAIT timeout and deducts the offload_timeout > > from it. This is 0 or very close to 0 and therefore ct gc is deleting the state > > more or less right away after the flow_offload_teardown is called > > (for the second time). > > This used to be set to: > > timeout = tn->timeouts[TCP_CONNTRACK_ESTABLISHED]; > > but after: > > commit e5eaac2beb54f0a16ff851125082d9faeb475572 > Author: Pablo Neira Ayuso <pablo@xxxxxxxxxxxxx> > Date: Tue May 17 10:44:14 2022 +0200 > > netfilter: flowtable: fix TCP flow teardown > > it uses the current state. Yes since the TCP state might not be established anymore at that time. Your patch will work but also give my TCP_FIN a very large timeout. I have successfully tested this version today: - if (timeout < 0) - timeout = 0; + // Have at least some time left on the state + if (timeout < NF_FLOW_TIMEOUT) + timeout = NF_FLOW_TIMEOUT; This makes sure that the timeout is not so big like ESTABLISHED but still enough so the state does not time out right away. > > > > > Here is the output of the state messages: > > > > > > > > [NEW] tcp 6 120 SYN_SENT src=192.168.97.23 dst=192.168.107.52 sport=63482 dport=443 [UNREPLIED] src=192.168.107.52 dst=192.168.97.23 sport=443 dport=63482 mark=92274785 > > > > [UPDATE] tcp 6 60 SYN_RECV src=192.168.97.23 dst=192.168.107.52 sport=63482 dport=443 src=192.168.107.52 dst=192.168.97.23 sport=443 dport=63482 mark=92274785 > > > > [UPDATE] tcp 6 432000 ESTABLISHED src=192.168.97.23 dst=192.168.107.52 sport=63482 dport=443 src=192.168.107.52 dst=192.168.97.23 sport=443 dport=63482 [OFFLOAD] mark=92274785 > > > > [UPDATE] tcp 6 86400 FIN_WAIT src=192.168.97.23 dst=192.168.107.52 sport=63482 dport=443 src=192.168.107.52 dst=192.168.97.23 sport=443 dport=63482 [OFFLOAD] mark=92274785 > > > > [DESTROY] tcp 6 FIN_WAIT src=192.168.97.23 dst=192.168.107.52 sport=63482 dport=443 packets=10 bytes=1415 src=192.168.107.52 dst=192.168.97.23 sport=443 dport=63482 packets=11 bytes=6343 [ASSURED] mark=92274785 delta-time=0 > > > > > > Is there a [NEW] event after this [DESTROY] in FIN_WAIT state to pick > > > the terminating connection from the middle? > > > > > > b6f27d322a0a ("netfilter: nf_flow_table: tear down TCP flows if RST or > > > FIN was seen") to let conntrack close the connection gracefully, > > > otherwise flowtable becomes stateless and already finished connections > > > remain in place which affects features such as connlimit. > > > > > > The intention in that patch is to remove the entry from the flowtable > > > then hand over back the conntrack to the connection tracking system > > > following slow path. > > > > So if the machanism is intended as it is then we need to make sure that the > > timeout is not so close to 0 and we life with the possible race condition? > > Then, this needs a state fix up based on the packet from the flowtable > path to infer the current state. > > This patch is not complete, it just restores ct timeout based on > the established state, which has also its own problems.