Pablo Neira Ayuso <pablo@xxxxxxxxxxxxx> writes: > setsockopt() with SO_SNDBUF never fails: it trims the newbuffsiz that is > specified by net.core.wmem_max Oh, good catch! Your revised patch LGTM, and is closer to what was being done in the immediately proceeding function, mnl_set_rcvbuffer. However, after thinking about it, I feel we should be checking the receiver value after setsockopt returns. If someone is running e.g. AppArmor, it seems better to me to attempt the non-privileged operation first, to avoid adding noise in the logs. Also, I don't think there are any current situations where SO_SNDBUFFORCE might also trim down the value, but after re-reading the man page, I'm not sure the contract precludes that in the future. Attached is a V3 patch for consideration, which also changes the code to attempt the non-privileged SO_RCVBUF before SO_RCVBUFFORCE. I defer to your judgment on which version is actually better; I tested both and they both work a) in a container where SO_SNDBUFFORCE fails, and b) outside a container with wmem_max set to a small-ish value where SO_SNDBUFFORCE is required.
diff --git a/src/mnl.c b/src/mnl.c index 26f943db..dcc22b82 100644 --- a/src/mnl.c +++ b/src/mnl.c @@ -259,10 +259,19 @@ static void mnl_set_sndbuffer(const struct mnl_socket *nl, if (newbuffsiz <= sndnlbuffsiz) return; - /* Rise sender buffer length to avoid hitting -EMSGSIZE */ - if (setsockopt(mnl_socket_get_fd(nl), SOL_SOCKET, SO_SNDBUFFORCE, - &newbuffsiz, sizeof(socklen_t)) < 0) - return; + /* Raise sender buffer length to avoid hitting -EMSGSIZE. The kernel may + * reduce this to /proc/sys/net/core/wmem_max, see socket(7). + */ + sndnlbuffsiz = newbuffsiz; + if (setsockopt(mnl_socket_get_fd(nl), SOL_SOCKET, SO_SNDBUF, + &sndnlbuffsiz, sizeof(socklen_t)) < 0 || sndnlbuffsiz < newbuffsiz) { + /* If SO_SNDBUF failed or the resulting size is still too small, try + * again with SO_SNDBUFFORCE. This requires CAP_NET_ADMIN. + */ + sndnlbuffsiz = newbuffsiz; + setsockopt(mnl_socket_get_fd(nl), SOL_SOCKET, SO_SNDBUFFORCE, + &sndnlbuffsiz, sizeof(socklen_t)); + } } static unsigned int nlsndbufsiz; @@ -280,14 +289,16 @@ static int mnl_set_rcvbuffer(const struct mnl_socket *nl, socklen_t bufsiz) if (nlsndbufsiz >= bufsiz) return 0; - ret = setsockopt(mnl_socket_get_fd(nl), SOL_SOCKET, SO_RCVBUFFORCE, - &bufsiz, sizeof(socklen_t)); - if (ret < 0) { - /* If this doesn't work, try to reach the system wide maximum - * (or whatever the user requested). + nlsndbufsiz = bufsiz; + ret = setsockopt(mnl_socket_get_fd(nl), SOL_SOCKET, SO_RCVBUF, + &nlsndbufsiz, sizeof(socklen_t)); + if (ret < 0 || nlsndbufsiz < bufsiz) { + /* If this doesn't work, try again with SO_RCVBUFFORCE. This requires + * CAP_NET_ADMIN. */ - ret = setsockopt(mnl_socket_get_fd(nl), SOL_SOCKET, SO_RCVBUF, - &bufsiz, sizeof(socklen_t)); + nlsndbufsiz = bufsiz; + ret = setsockopt(mnl_socket_get_fd(nl), SOL_SOCKET, SO_RCVBUFFORCE, + &nlsndbufsiz, sizeof(socklen_t)); } return ret;
-- Dave Pifke, dave@xxxxxxxxx