On Wed, May 29, 2019 at 03:13:42PM +0200, Phil Sutter wrote:
> When committing a larger transaction (e.g. adding 300 rules) with echo
> output turned on, mnl_batch_talk() would report ENOBUFS after the first
> call to mnl_socket_recvfrom(). (ENOBUFS indicates congestion in netlink
> socket.)

We can avoid this if we select the right buffer size for the --echo
case, which makes it reliable.

Events are a different case: there is not much we can do when we hit
ENOBUFS there, since we don't know how much information the kernel will
send us, so we can only report the message loss to the user.

> The problem in mnl_batch_talk() was a combination of unmodified socket
> recv buffer, use of select() and unhandled ENOBUFS condition (abort
> instead of retry).
>
> This series solves the issue, admittedly a bit in sledge hammer method:
> Maximize nf_sock receive buffer size for all users, make
> mnl_batch_talk() fetch more messages at once and retry upon ENOBUFS
> instead of just giving up.

Setting a fixed size works around the problem, yes, but we will still
hit ENOBUFS at some point. I sent you a patch that starts estimating
the receive buffer size in a simple way.

> There was also a problem with select() use which motivated the loop
> rewrite in Patch 3.

Please send a patch to fix this, thanks!

> Actually, replacing the whole loop by a simple call to
> nft_mnl_recv() worked and was even sufficient in avoiding ENOBUFS
> condition, but I am not sure if that has other side-effects.

Not sure what you mean.

> tests/shell/testcases/transactions/0049huge_0 | 14 ++++

Thanks for this testcase.
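
To make the two ideas above concrete (sizing the receive buffer from the batch for the --echo case, and retrying on ENOBUFS instead of aborting), here is a rough sketch using only the plain socket API. It is not the code from either patch: estimate_rcvbuf(), grow_rcvbuf(), recv_retry() and the EST_ECHO_MSG_SIZE constant are hypothetical names and values chosen for illustration, and the real code would go through libmnl on the netlink socket.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Assumption: rough bytes of echo output per command, NOT an nftables value. */
#define EST_ECHO_MSG_SIZE 1024u

/* Estimate the receive buffer needed to hold the echo of num_cmds commands. */
static unsigned int estimate_rcvbuf(unsigned int num_cmds)
{
	unsigned long long want =
		(unsigned long long)num_cmds * EST_ECHO_MSG_SIZE;

	/* SO_RCVBUF takes an int, so clamp to its positive range. */
	if (want > 0x7fffffffULL)
		want = 0x7fffffffULL;
	return (unsigned int)want;
}

/* Grow SO_RCVBUF to at least `want` bytes; never shrink it.
 * The kernel may silently cap the value (net.core.rmem_max on Linux). */
static int grow_rcvbuf(int fd, unsigned int want)
{
	int cur = 0;
	socklen_t len = sizeof(cur);

	if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &cur, &len) < 0)
		return -1;
	if (cur >= 0 && (unsigned int)cur >= want)
		return 0;

	int val = (int)want;
	return setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &val, sizeof(val));
}

/* recv() that retries a bounded number of times on ENOBUFS or EINTR
 * instead of giving up on the first failure. */
static ssize_t recv_retry(int fd, void *buf, size_t len, int max_retries)
{
	assert(buf != NULL);

	for (;;) {
		ssize_t n = recv(fd, buf, len, 0);

		if (n >= 0)
			return n;
		if ((errno != ENOBUFS && errno != EINTR) || max_retries-- <= 0)
			return -1;
	}
}
```

A caller would run grow_rcvbuf(fd, estimate_rcvbuf(num_cmds)) once before sending the batch and then read replies with recv_retry(). Note that an ENOBUFS on a netlink socket means messages were already dropped, so retrying only lets the remaining replies through; that is why sizing the buffer up front matters for --echo, while for events the loss can only be reported.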