Hi Patrick, > -----Original Message----- > From: netfilter-devel-owner@xxxxxxxxxxxxxxx > [mailto:netfilter-devel-owner@xxxxxxxxxxxxxxx] On Behalf Of > Patrick McHardy > Sent: Wednesday, February 27, 2008 6:31 PM > To: Juneja Kapil > Cc: netfilter-devel@xxxxxxxxxxxxxxx; Medve Emilian > Subject: Re: [PATCH] nf_conntrack_core: Updated nf_conntrack > to destroy/refresh conn irrespective of del_timer status > > Juneja Kapil wrote: > >> -----Original Message----- > >> From: Patrick McHardy [mailto:kaber@xxxxxxxxx] > >> Sent: Monday, February 25, 2008 5:41 PM > >> To: Juneja Kapil > >> Cc: netfilter-devel@xxxxxxxxxxxxxxx; Medve Emilian > >> Subject: Re: [PATCH] nf_conntrack_core: Updated nf_conntrack to > >> destroy/refresh conn irrespective of del_timer status > >> > >> Kapil Juneja wrote: > >>> Currently NF_CONNTRACK assumes that a running timer is > >> present before refreshing > >>> the connection or destroying it. This may not be the case > >> when, for example, > >>> another forwarding engine hooks up to it to listen to new > >> connections > >>> but disables the NF_CONNTRACK timer in order to have more control. > >>> In such a scenario, only control packets may be terminated > >> to NF_CONNTRACK for > >>> it to decode and update the connection status. It will not > >> impact the present > >>> scenario of kernel forwarding without the aid of any > >> forwarding engine. > >> > >> Do you have a pointer to the code you're talking about? > > > > The forwarding engine concept is same as the Grand Unified > Flow Cache > > idea mooted by Rusty Russel some time back: > > http://lwn.net/Articles/194443/ > > > > Our architecture runs on three components - Linux NF_CONNTRACK, a > > Control (Linux) Module(CM), and a Forwarding Engine or Flow Caching > > system (FC): > > 1) The CM registers itself to the NF_CONNTRACK notifier chain. > > Whenever an assured connection notifier event is received, > it extracts > > all the relevant tuple parameters (Src IP, Dst IP, > Protocol, Src Port, > > Dst Port > > etc.) and caches them to the FC. Due to reasons mentioned > > subsequently, it also disables the timer for the said conntrack > > object. The object, however, remains in the conntrack list > as long as it is not destroyed. > > 3) The FC, sitting at the ethernet driver level, sends all the data > > packets belonging to the cached connection directly to the outbound > > port (as identified in step 2), bypassing the Linux stack > altogether. > > All the packets not belonging to either of the cached tuples are > > terminated to the Linux stack. Also TCP control packets FIN/RST/SYN > > are terminated irrespective of wether the connection is > cached or not. > > 4) With assistance from the FC, the CM also runs aging on > the cached > > connections (hence the requirement of deleting the > NF_CONNTRACK timer > > in step 2) > > 5) Cached connections can be terminated (i.e, removed from > cache) in > > two > > ways: > > i) Aging out by the CM: In this scenario, the CM removes the > > connection tuple from FC as well as NF_CONNTRACK by calling the > > corresponding timer destroy function directly. > > ii) Destroy via TCP control packet: All the FIN-ACK, > RST, RST-ACK > > packets are send to conntrack irrespective of the fact that > they match > > a cached tuple. They are picked up by the TCP conntrack > module which > > restarts the accounting and refreshes the connection state. > It is at > > this point that the first chunk of this patch comes into picture. > > 6) When the NF_CONNTRACK module is removed, it iterates through the > > list to destroy the detected connections. Currently, it does not > > remove those connections whose timers have gone off (which > is the case > > with connections cached to FC). This is fixed by the second > chunk of > > the patch. > > > That sounds pretty reasonable. Is that code available somewhere? We are working on licensing aspects and will be glad to share the code as and when we get approval. > > >>> + if (newtime - ct->timeout.expires >= HZ) { > >>> + /* > >>> + * The timer could have already been deleted > >>> + * while still alive (for example connection > >>> + * offloaded to a forwarding module other than > >>> + * the kernel stack). > >>> + */ > >>> + mod_timer(&ct->timeout, newtime); > >>> event = IPCT_REFRESH; > >> This adds a race, we don't want to update the timer if it already > >> went off this that means the connection is already destroyed. > >> Same problem with the other chunk. > >> > > > > A timer call would have invalidated the conntrack by a call to > > 'death_by_timeout' (or similar such routine), thereby > rendering this > > check redundant. Theoretically, I think the check is irrelevant > > unless a hypothetical timeout doesn't really invalidate the > conntrack. > > Can you describe the race scenario mentioned by you? > > Very simple: > > CPU0 CPU1 > timer goes off > refresh_timer: mod_timer, rearm death_by_timeout() > > timer goes off again > > Using del_timer prevents us from rearming the timer if it > already went off. Thanks for the explanation. I was probably only thinking of the non-SMP scenarios. However, I feel that if this cannot be done this simple way, then we are in a bit trouble because our need is to make nf_ct_refresh_acct work independent of the existing timer being dead or alive. While we think about a possible alternatives on respin of patch/control module, can you provide some insight into any other alternatives. > - > To unsubscribe from this list: send the line "unsubscribe > netfilter-devel" in the body of a message to > majordomo@xxxxxxxxxxxxxxx More majordomo info at > http://vger.kernel.org/majordomo-info.html > -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html