On Fri, Apr 08, 2022 at 09:34:13PM +0200, Florian Westphal wrote:
> Pablo Neira Ayuso <pablo@xxxxxxxxxxxxx> wrote:
> > 70e9942f17a6 ("netfilter: nf_conntrack: make event callback registration
> > per-netns") introduced a per-netns callback for events to work around a
> > crash when delivering conntrack events on a stale per-netns nfnetlink
> > kernel socket.
> >
> > This patch adds a new flag to the nf_ct_iter_data object to skip event
> > delivery from the netns cleanup path to address this issue.
> >
> > Signed-off-by: Pablo Neira Ayuso <pablo@xxxxxxxxxxxxx>
> > ---
> > compile tested only.
> > @Florian: Maybe this helps to remove the per-netns nf_conntrack_event_cb
> > callback without having to update nfnetlink to deal with this corner case?
>
> Old crash recipe is (from your changelog of the 'make it pernet' change):
>
> 0) make sure nf_conntrack_netlink and nf_conntrack_ipv4 are loaded.
> 1) container is started.
> 2) connect to it via lxc-console.
> 3) generate some traffic with the container to create some conntrack
>    entries in its table.
> 4) stop the container: you hit one oops because the conntrack table
>    cleanup tries to report the destroy event to user-space but the
>    per-netns nfnetlink socket has already gone (as the nfnetlink
>    socket is per-netns but event callback registration is global).
>
> Pernet exit handlers are called in reverse order of the module load
> order, so normally this means:
>
>   ctnetlink exit handlers
>   nfnetlink_net_exit_batch, removes nfnl socket
>   nf_conntrack_pernet_exit(), removes entries,
>
> Because the callback is pernet atm, this prevents the crash after the
> nfnetlink sk has been closed.
>
> If that's no longer the case, we need some other way to suppress
> calls with a stale nfnl sk.
>
> With the proposed patch series it's still possible that we end up
> in nfnetlink via the ctnl event handler.
>
> E.g. the gc worker could evict at the right time, or some kfree_skb call
> ends up dropping the last reference.
>
> If you really dislike the nfnl changes I will respin without this
> and will keep the pernet ctnetlink callback.

OK, my patch is not covering all the possible cases then.

Probably we can remove the hooks from .pre_exit, then force a run of the
garbage collector from there. Then the .exit path skips event delivery as
my patch does.

This would allow removing the per-netns callback workaround, and all
would be handled from nf_conntrack instead?
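
To make the ordering idea concrete, here is a rough userspace toy model
of it. This is only a sketch, not kernel code: every toy_* name is
hypothetical, and it assumes that conntrack's .pre_exit work runs before
the nfnetlink .exit handler closes the socket. It does not model the gc
worker or a kfree_skb call dropping the last reference outside of these
handlers, so it says nothing about those cases.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy userspace model; every name below is hypothetical, not a kernel API. */
struct toy_net {
    bool nfnl_sk_alive;    /* is the per-netns nfnetlink socket still open? */
    int  ct_entries;       /* conntrack entries left in the table */
    int  events_sent;      /* destroy events delivered to a live socket */
    int  stale_deliveries; /* deliveries attempted on a closed socket (the oops) */
};

struct toy_iter_data {
    struct toy_net *net;
    bool skip_events;      /* models the flag the patch adds to nf_ct_iter_data */
};

/* Remove up to 'count' entries, reporting a destroy event for each unless skipped. */
static void toy_destroy_entries(struct toy_iter_data *d, int count)
{
    while (count-- > 0 && d->net->ct_entries > 0) {
        d->net->ct_entries--;
        if (d->skip_events)
            continue;
        if (d->net->nfnl_sk_alive)
            d->net->events_sent++;
        else
            d->net->stale_deliveries++;
    }
}

/*
 * Tear down a namespace holding 'entries' conntrack entries, of which 'late'
 * are assumed to survive until the conntrack .exit handler.
 * proposed == false models a global event callback with no skip flag.
 */
static struct toy_net toy_run_teardown(int entries, int late, bool proposed)
{
    struct toy_net net = { .nfnl_sk_alive = true, .ct_entries = entries };
    struct toy_iter_data d = { .net = &net, .skip_events = false };

    assert(late >= 0 && late <= entries);

    if (proposed) {
        /* .pre_exit: hooks removed, forced gc run while the socket is alive */
        toy_destroy_entries(&d, entries - late);
    }

    /* nfnetlink .exit: the per-netns socket goes away */
    net.nfnl_sk_alive = false;

    /* conntrack .exit: flush what is left; the proposal skips events here */
    d.skip_events = proposed;
    toy_destroy_entries(&d, net.ct_entries);

    return net;
}
```

In this model, toy_run_teardown(5, 2, false) ends with all five destroy
events hitting the closed socket, while toy_run_teardown(5, 2, true)
delivers three events from the .pre_exit step and silently drops the two
late entries in .exit.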