Hi Florian, You wrote on Tue, Jun 25, 2019 at 06:53:44PM +0200: > Thanks for this detailed analysis. > In this specific case I think this is enough: > > diff --git a/net/netfilter/nfnetlink.c b/net/netfilter/nfnetlink.c > index 92077d459109..61ba92415480 100644 > --- a/net/netfilter/nfnetlink.c > +++ b/net/netfilter/nfnetlink.c > @@ -578,7 +578,8 @@ static int nfnetlink_bind(struct net *net, int group) > ss = nfnetlink_get_subsys(type << 8); > rcu_read_unlock(); > if (!ss) > - request_module("nfnetlink-subsys-%d", type); > + request_module_nowait("nfnetlink-subsys-%d", type); > return 0; > } > #endif thanks for the patch! We finally found an easy way to reproduce the deadlock, the following commands instantly trigger the problem on our machines: rmmod nf_conntrack_netlink rmmod xfrm_user conntrack -e NEW -E & modprobe -v xfrm_user Note: the "-e" filter is needed to trigger the problematic code path in the kernel. We were worried that using "_nowait" would introduce other race conditions, since the requested service might not be available by the time it is required. On the other hand, if we understand correctly, it seems that after "nfnetlink_bind()", the caller will listen on the socket for messages regardless whether the needed modules are loaded, loading or unloaded. To verify this we added a three second sleep during the initialisation of nf_conntrack_netlink. The events started to appear after the delayed init was completed. If this is the case, then using "_nowait" should suffice as a fix for the problem. Could you please confirm these assumptions and give us some piece of mind? Best regards, Juliana Rodrigueiro and Thomas Jarosch