Hi Florian,

On Tue, Aug 13, 2019 at 10:12:46PM +0200, Florian Westphal wrote:
> tests/shell/testcases/transactions/0049huge_0
>
> still fails with ENOBUFS error after endian fix done in
> previous patch. Its enough to increase the scale factor (4)
> on s390x, but rather than continue with these "guess the proper
> size" game, just increase the buffer size and retry up to 3 times.
>
> This makes above test work on s390x.
>
> So, implement what Pablo suggested in the earlier commit:
>     We could also explore increasing the buffer and retry if
>     mnl_nft_socket_sendmsg() hits ENOBUFS if we ever hit this problem again.
>
> v2: call setsockopt unconditionally, then increase on error.
>
> Signed-off-by: Florian Westphal <fw@xxxxxxxxx>
> ---
>  src/mnl.c | 12 ++++++++++--
>  1 file changed, 10 insertions(+), 2 deletions(-)
>
> diff --git a/src/mnl.c b/src/mnl.c
> index 97a2e0765189..9c1f5356c9b9 100644
> --- a/src/mnl.c
> +++ b/src/mnl.c
> @@ -311,6 +311,7 @@ int mnl_batch_talk(struct netlink_ctx *ctx, struct list_head *err_list,
>  	int ret, fd = mnl_socket_get_fd(nl), portid = mnl_socket_get_portid(nl);
>  	uint32_t iov_len = nftnl_batch_iovec_len(ctx->batch);
>  	char rcv_buf[MNL_SOCKET_BUFFER_SIZE];
> +	unsigned int enobuf_restarts = 0;
>  	size_t avg_msg_size, batch_size;
>  	const struct sockaddr_nl snl = {
>  		.nl_family = AF_NETLINK
> @@ -320,6 +321,7 @@ int mnl_batch_talk(struct netlink_ctx *ctx, struct list_head *err_list,
>  		.tv_usec = 0
>  	};
>  	struct iovec iov[iov_len];
> +	unsigned int scale = 4;
>  	struct msghdr msg = {};
>  	fd_set readfds;
>
> @@ -328,7 +330,8 @@ int mnl_batch_talk(struct netlink_ctx *ctx, struct list_head *err_list,
>  	batch_size = mnl_nft_batch_to_msg(ctx, &msg, &snl, iov, iov_len);
>  	avg_msg_size = div_round_up(batch_size, num_cmds);
>
> -	mnl_set_rcvbuffer(ctx->nft->nf_sock, num_cmds * avg_msg_size * 4);
> +restart:
> +	mnl_set_rcvbuffer(ctx->nft->nf_sock, num_cmds * avg_msg_size * scale);
>
>  	ret = mnl_nft_socket_sendmsg(ctx, &msg);
>  	if (ret == -1)
> @@ -347,8 +350,13 @@ int mnl_batch_talk(struct netlink_ctx *ctx, struct list_head *err_list,
>  			break;
>
>  		ret = mnl_socket_recvfrom(nl, rcv_buf, sizeof(rcv_buf));
> -		if (ret == -1)
> +		if (ret == -1) {
> +			if (errno == ENOBUFS && enobuf_restarts++ < 3) {
> +				scale *= 2;
> +				goto restart;
> +			}

If this restart is triggered, it causes rules to be duplicated, because
the restart label sits above mnl_nft_socket_sendmsg() and so we send the
same batch again. I'm hitting this on x86_64.

Maybe we need to find a better way to estimate the rcvbuffer in the case
of --echo. By the time we see ENOBUFS we're already in a bad way - events
have already been lost.

>  			return -1;
> +		}
>
>  		ret = mnl_cb_run(rcv_buf, ret, 0, portid, &netlink_echo_callback, ctx);
>  		/* Continue on error, make sure we get all acknowledgments */
> --
> 2.21.0
>
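
To make the duplication concrete, here is a minimal, self-contained sketch of the control flow in the quoted hunk. It is not nftables or libmnl code: `fake_kernel`, `fake_sendmsg()`, `fake_recv()` and `simulate_batch_talk()` are hypothetical stand-ins, and the sketch assumes the kernel has already committed the batch by the time the receive side reports ENOBUFS, which is what the duplicated rules suggest.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for kernel state; not a libmnl or nftables type. */
struct fake_kernel {
	int batches_applied;       /* how many times the batch was committed */
	int enobufs_left;          /* receive calls that will still fail */
	unsigned int rcvbuf_scale; /* last scale factor requested */
};

/* Assumption: every batch that is sent gets committed by the kernel. */
static int fake_sendmsg(struct fake_kernel *k)
{
	k->batches_applied++;
	return 0;
}

/* Fails with ENOBUFS while injected overruns remain, then succeeds. */
static int fake_recv(struct fake_kernel *k)
{
	if (k->enobufs_left > 0) {
		k->enobufs_left--;
		errno = ENOBUFS;
		return -1;
	}
	return 0;
}

/*
 * Mirrors the restart logic of the quoted hunk. Returns the number of
 * times the batch was applied, or -1 once the retries are used up.
 */
int simulate_batch_talk(int enobufs_to_inject)
{
	struct fake_kernel k = { 0, enobufs_to_inject, 0 };
	unsigned int enobuf_restarts = 0;
	unsigned int scale = 4;

	assert(enobufs_to_inject >= 0);

restart:
	k.rcvbuf_scale = scale;    /* stands in for mnl_set_rcvbuffer() */

	if (fake_sendmsg(&k) == -1) /* the restart re-runs the send as well */
		return -1;

	if (fake_recv(&k) == -1) {
		if (errno == ENOBUFS && enobuf_restarts++ < 3) {
			scale *= 2;
			goto restart;
		}
		return -1;
	}

	return k.batches_applied;
}
```

With no overrun, `simulate_batch_talk(0)` applies the batch once. With a single ENOBUFS, `simulate_batch_talk(1)` applies it twice, which is the duplication reported above. The sketch only illustrates the problem; it does not propose where the retry should go instead.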