Phil Sutter <phil@xxxxxx> wrote:
> Starting firewalld with two active zones in an lxc container provokes a
> situation in which nfnetlink_rcv_msg() loops indefinitely, because
> nc->call_rcu() (nf_tables_getgen() in this case) returns -EAGAIN every
> time.
>
> I identified netlink_attachskb() as the originator for the above error
> code. The conditional leading to it looks like this:
>
> | if ((atomic_read(&sk->sk_rmem_alloc) > sk->sk_rcvbuf ||
> |      test_bit(NETLINK_S_CONGESTED, &nlk->state))) {
> |         [...]
> |         if (!*timeo) {
>
> *timeo is zero, so this seems to be a non-blocking socket. Both
> NETLINK_S_CONGESTED bit is set and sk->sk_rmem_alloc exceeds
> sk->sk_rcvbuf.
>
> From user space side, firewalld seems to simply call sendto() and the
> call never returns.
>
> How to solve that? I tried to find other code which does the same, but I
> haven't found one that does any looping. Should nfnetlink_rcv_msg()
> maybe just return -EAGAIN to the caller if it comes from call_rcu
> backend?

Yes, I think that's the most straightforward solution.

We can of course also intercept -EAGAIN in nf_tables_api.c and translate
it to -ENOBUFS like in nft_get_set_elem().

But I think a generic solution is better. The call_rcu backends should
not result in changes to nf_tables internal state, so they do not load
modules and therefore don't need a restart.

> This happening only in an lxc container may be due to some setsockopt()
> calls not being allowed. In particular, setsockopt(SO_RCVBUFFORCE)
> returns EPERM.

Right.

> The value of sk_rcvbuf is 425984, BTW. sk_rmem_alloc is 426240. In user
> space, I see a call to setsockopt(SO_RCVBUF) with value 4194304. No idea
> if this is related and how.

Does that SO_RCVBUF succeed? How large is the recvbuf?

We should try to investigate and see that nft works rather than just fix
the loop and claim that fixes the bug (when it just changes 'nft loops'
to 'nft exits with an error').