Re: Flowtable race condition error

Sven Auhagen <sven.auhagen@xxxxxxxxxxxx> · Thu, 14 Mar 2024 14:56:09 +0100

On Thu, Mar 14, 2024 at 01:56:12PM +0100, Pablo Neira Ayuso wrote:
> On Thu, Mar 14, 2024 at 01:43:52PM +0100, Sven Auhagen wrote:
> > On Thu, Mar 14, 2024 at 01:38:54PM +0100, Pablo Neira Ayuso wrote:
> > > On Thu, Mar 14, 2024 at 12:30:30PM +0100, Sven Auhagen wrote:
> [...]
> > > > I found this out.
> > > > The state is deleted in the end because the flow_offload_fixup_ct
> > > > function is pulling the FIN_WAIT timeout and deducts the offload_timeout
> > > > from it. This is 0 or very close to 0 and therefore ct gc is deleting the state
> > > > more or less right away after the flow_offload_teardown is called
> > > > (for the second time).
> > > 
> > > This used to be set to:
> > > 
> > >         timeout = tn->timeouts[TCP_CONNTRACK_ESTABLISHED];
> > > 
> > > but after:
> > > 
> > > commit e5eaac2beb54f0a16ff851125082d9faeb475572
> > > Author: Pablo Neira Ayuso <pablo@xxxxxxxxxxxxx>
> > > Date:   Tue May 17 10:44:14 2022 +0200
> > > 
> > >     netfilter: flowtable: fix TCP flow teardown
> > > 
> > > it uses the current state.
> > 
> > Yes since the TCP state might not be established anymore at that time.
> > Your patch will work but also give my TCP_FIN a very large timeout.
> 
> Is that still possible? My patch also fixes up conntrack _before_
> the offload flag is cleared, so the packet in the reply direction
> either sees the already fixed up conntrack or it follows flowtable
> datapath.

Yes, the call to flow_offload_teardown happens in the ingress stage.
When a TCP FIN is going through there and the offload is removed that is fine.
But between the packet now moving from ingress to the normal slow path netfilter
code a reply packet might come in the tcp state is still established
because the FIN is not processed yet and therefore it is offloading the state again.
The FIN is now processed after after re-offload and the state move on to FIN_WAIT.

I do not think there is a way to resolve this problem.
We would need to transition the TCP state right in the flowtable flow_offload_teardown
function to avoid this.

> 
> > I have successfully tested this version today:
> > 
> > -	if (timeout < 0)
> > -		timeout = 0;
> > +	// Have at least some time left on the state
> > +	if (timeout < NF_FLOW_TIMEOUT)
> > +		timeout = NF_FLOW_TIMEOUT;
> >
> > This makes sure that the timeout is not so big like ESTABLISHED but still enough
> > so the state does not time out right away.
> 
> This also seems sensible to me. Currently it is using the last conntrack
> state that we have observed when conntrack handed over this flow to the
> flowtable, which is inaccurate in any case, and which could still be low
> depending on user-defined tcp conntrack timeouts (in case user decided
> to tweaks them).