Re: [PATCH nf-next RFC 2/2] netfilter: conntrack: skip event delivery for the netns exit path

Florian Westphal <fw@xxxxxxxxx> · Fri, 8 Apr 2022 21:34:13 +0200

Pablo Neira Ayuso <pablo@xxxxxxxxxxxxx> wrote:
> 70e9942f17a6 ("netfilter: nf_conntrack: make event callback registration
> per-netns") introduced a per-netns callback for events to workaround a
> crash when delivering conntrack events on a stale per-netns nfnetlink
> kernel socket.
> 
> This patch adds a new flag to the nf_ct_iter_data object to skip event
> delivery from the netns cleanup path to address this issue.
> 
> Signed-off-by: Pablo Neira Ayuso <pablo@xxxxxxxxxxxxx>
> ---
> compiled tested only.
> @Florian: Maybe this helps to remove the per-netns nf_conntrack_event_cb
> callback without having to update nfnetlink to deal with this corner case?

Old crash recipe is (from your changelog of the 'make it pernet' change):

 0) make sure nf_conntrack_netlink and nf_conntrack_ipv4 are loaded.
 1) container is started.
 2) connect to it via lxc-console.
 3) generate some traffic with the container to create some conntrack
    entries in its table.
 4) stop the container: you hit one oops because the conntrack table
    cleanup tries to report the destroy event to user-space but the
    per-netns nfnetlink socket has already gone (as the nfnetlink
    socket is per-netns but event callback registration is global).

Pernet exit handlers are called in reverse order of the module load
order, so normally this means:

ctnetlink exit handlers
nfnetlink_net_exit_batch(), removes nfnl socket
nf_conntrack_pernet_exit(), removes entries

Because the callback is pernet atm, this prevents the crash after the
nfnetlink sk has been closed.
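
To make the ordering argument concrete, here is a small userspace model
of the three exit handlers running in reverse registration order. It is
only a sketch: every name in it (model_net, model_netns_exit, and so on)
is made up for illustration and none of it is kernel code. The one
assumption it encodes is the difference between a pernet callback, which
the ctnetlink exit handler can drop for the dying netns, and a global
one, which stays registered.

```c
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only; these are not the kernel's types or names. */
struct model_net {
	bool cb_is_pernet;     /* true: event callback is registered per netns */
	bool cb_registered;    /* callback still reachable from this netns */
	bool nfnl_sk_alive;    /* stands in for the per-netns nfnetlink socket */
	int entries;           /* conntrack entries left in the table */
	int stale_deliveries;  /* events attempted after the socket went away */
};

typedef void (*model_exit_fn)(struct model_net *net);

/* Destroy event: reaches "nfnetlink" only if a callback is registered. */
static void model_deliver_destroy_event(struct model_net *net)
{
	if (!net->cb_registered)
		return;
	if (!net->nfnl_sk_alive)
		net->stale_deliveries++; /* in the kernel this is the oops */
}

static void model_ctnetlink_exit(struct model_net *net)
{
	/* A global callback cannot be dropped for a single netns. */
	if (net->cb_is_pernet)
		net->cb_registered = false;
}

static void model_nfnetlink_exit(struct model_net *net)
{
	net->nfnl_sk_alive = false;
}

static void model_conntrack_exit(struct model_net *net)
{
	while (net->entries > 0) {
		net->entries--;
		model_deliver_destroy_event(net);
	}
}

/* Runs the exit handlers and returns the number of stale deliveries. */
int model_netns_exit(bool cb_is_pernet, int entries)
{
	struct model_net net = {
		.cb_is_pernet = cb_is_pernet,
		.cb_registered = true,
		.nfnl_sk_alive = true,
		.entries = entries,
		.stale_deliveries = 0,
	};
	/* Registration order, i.e. module load order. */
	model_exit_fn handlers[] = {
		model_conntrack_exit,
		model_nfnetlink_exit,
		model_ctnetlink_exit,
	};
	size_t i = sizeof(handlers) / sizeof(handlers[0]);

	/* Exit handlers run in reverse registration order. */
	while (i-- > 0)
		handlers[i](&net);

	return net.stale_deliveries;
}
```

In this model, model_netns_exit(true, 3) reports no stale deliveries,
while model_netns_exit(false, 3) reports one per remaining entry, which
is the old crash scenario.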

If that's no longer the case, we need some other way to suppress
calls with a stale nfnl sk.

With the proposed patch series it's still possible that we end up
in nfnetlink via the ctnl event handler.

E.g. the gc worker could evict at the right time, or some kfree_skb
call ends up dropping the last reference.
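
A rough userspace model of the second case, again with made-up names
(model_ct, model_ct_put, model_late_put) and not kernel code. It assumes
the skip flag is only visible to the netns cleanup iterator, and that
the destroy event is generated by whoever drops the last reference:

```c
#include <stdbool.h>

/* Userspace model only; these are not the kernel's types or names. */
struct model_ct {
	int refcnt;
};

struct model_ctx {
	bool nfnl_sk_alive;    /* stands in for the per-netns nfnetlink socket */
	int stale_deliveries;  /* events attempted after the socket went away */
};

/*
 * Drop one reference. The destroy event is generated by whichever
 * caller drops the last one, and only that caller's skip_event applies.
 */
static void model_ct_put(struct model_ct *ct, struct model_ctx *ctx,
			 bool skip_event)
{
	if (--ct->refcnt > 0)
		return;
	if (skip_event)
		return;
	if (!ctx->nfnl_sk_alive)
		ctx->stale_deliveries++;
}

/* Returns the number of events delivered on the stale socket. */
int model_late_put(bool skb_holds_ref)
{
	/* The nfnl socket is already gone when cleanup starts. */
	struct model_ctx ctx = { .nfnl_sk_alive = false, .stale_deliveries = 0 };
	struct model_ct ct = { .refcnt = skb_holds_ref ? 2 : 1 };

	/* Netns cleanup iterator: this path knows about the skip flag. */
	model_ct_put(&ct, &ctx, true);

	/* Later free of an skb that still held a reference: no skip flag. */
	if (skb_holds_ref)
		model_ct_put(&ct, &ctx, false);

	return ctx.stale_deliveries;
}
```

Here model_late_put(false) stays quiet because the iterator drops the
last reference itself, but model_late_put(true) still ends up with one
delivery on the stale socket.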

If you really dislike the nfnl changes I will respin without this
and will keep the pernet ctnetlink callback.