On Fri, Oct 09, 2020 at 10:05:48PM +0200, Florian Westphal wrote:
> Jozsef Kadlecsik <kadlec@xxxxxxxxxxxxx> wrote:
> > > The "delay unregister" remark was wrt. the "all rules were deleted"
> > > case, i.e. add a "grace period" rather than acting right away when
> > > conntrack use count did hit 0.
> >
> > Now I understand it, thanks really. The hooks are removed, so conntrack
> > cannot "see" the packets and the entries become stale.
>
> Yes.
>
> > What is the rationale behind "remove the conntrack hooks when there are no
> > rule left referring to conntrack"? Performance optimization? But then the
> > content of the whole conntrack table could be deleted too... ;-)
>
> Yes, this isn't the case at the moment -- only hooks are removed,
> entries will eventually time out.
>
> > > Conntrack entries are not removed, only the base hooks get unregistered.
> > > This is a problem for tcp window tracking.
> > >
> > > When re-register occurs, kernel is supposed to switch the existing
> > > entries to "loose" mode so window tracking won't flag packets as
> > > invalid, but apparently this isn't enough to handle keepalive case.
> >
> > "loose" (nf_ct_tcp_loose) mode doesn't disable window tracking, it
> > enables/disables picking up already established connections.
> >
> > nf_ct_tcp_be_liberal would disable TCP window checking (but not tracking)
> > for non RST packets.
>
> You are right, mixup on my part.
>
> > But both seems to be modified only via the proc entries.
>
> Yes, we iterate table on re-register and modify the existing entries.

For iptables-nft, it might be possible to avoid this deregister +
register of the ct hooks in the same transaction: maybe add something
like nf_ct_netns_get_all() to bump the refcounters by one _iff_ they
are > 0 before starting the transaction processing, then call
nf_ct_netns_put_all(), which decrements the refcounters and unregisters
the hooks if they reach 0.

The only problem with this approach is that it pulls in the conntrack
module. To solve that, struct nf_ct_hook in net/netfilter/core.c could
be used to store the references to ->netns_get_all and ->net_put_all.

Legacy would still be flawed though.
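
To make the refcount idea concrete, here is a rough userspace sketch. It is not kernel code: a single counter and a boolean stand in for the per-netns, per-family state, and every name in it (ct_get, ct_put, ct_get_all, ct_put_all, ct_unregister_calls) is hypothetical. It only illustrates "bump iff > 0 before the transaction, drop afterwards".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the real conntrack hook state. */
static unsigned int ct_users;            /* rules currently using conntrack */
static bool ct_hooks_registered;         /* are the base hooks in place?    */
static unsigned int ct_unregister_calls; /* counts hook teardowns           */

/* A rule that needs conntrack is added: register hooks on 0 -> 1. */
static void ct_get(void)
{
	if (ct_users++ == 0)
		ct_hooks_registered = true;
}

/* A rule that needed conntrack is removed: unregister hooks on 1 -> 0. */
static void ct_put(void)
{
	assert(ct_users > 0);
	if (--ct_users == 0) {
		ct_hooks_registered = false;
		ct_unregister_calls++;
	}
}

/*
 * Before transaction processing: take one extra reference, but only
 * if the count is already > 0. Returns true when a reference was
 * taken, so the caller knows it has to be dropped afterwards.
 */
static bool ct_get_all(void)
{
	if (ct_users == 0)
		return false;
	ct_users++;
	return true;
}

/* After transaction processing: drop the extra reference, if any. */
static void ct_put_all(bool held)
{
	if (held)
		ct_put();
}
```

With one rule in place, a transaction that deletes it and adds it back would go ct_get_all(), ct_put(), ct_get(), ct_put_all(). The count never reaches 0 in between, so the hooks stay registered for the whole transaction. If the transaction only deletes the rule, the final ct_put_all() is what takes the count to 0 and unregisters the hooks.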