Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> writes: > On Fri, Feb 11, 2022 at 11:51 AM Toke Høiland-Jørgensen <toke@xxxxxxxxxx> wrote: >> >> When receiving netlink messages, libbpf was using a statically allocated >> stack buffer of 4k bytes. This happened to work fine on systems with a 4k >> page size, but on systems with larger page sizes it can lead to truncated >> messages. The user-visible impact of this was that libbpf would insist no >> XDP program was attached to some interfaces because that bit of the netlink >> message got chopped off. >> >> Fix this by switching to a dynamically allocated buffer; we borrow the >> approach from iproute2 of using recvmsg() with MSG_PEEK|MSG_TRUNC to get >> the actual size of the pending message before receiving it, adjusting the >> buffer as necessary. While we're at it, also add retries on interrupted >> system calls around the recvmsg() call. >> >> Reported-by: Zhiqian Guan <zhguan@xxxxxxxxxx> >> Fixes: 8bbb77b7c7a2 ("libbpf: Add various netlink helpers") >> Signed-off-by: Toke Høiland-Jørgensen <toke@xxxxxxxxxx> >> --- >> tools/lib/bpf/netlink.c | 55 ++++++++++++++++++++++++++++++++++++++--- >> 1 file changed, 52 insertions(+), 3 deletions(-) >> >> diff --git a/tools/lib/bpf/netlink.c b/tools/lib/bpf/netlink.c >> index c39c37f99d5c..9a6e95206bf0 100644 >> --- a/tools/lib/bpf/netlink.c >> +++ b/tools/lib/bpf/netlink.c >> @@ -87,22 +87,70 @@ enum { >> NL_DONE, >> }; >> >> +static int __libbpf_netlink_recvmsg(int sock, struct msghdr *mhdr, int flags) > > let's not use names starting with underscored. Just call it > "netlink_recvmsg" or something like that. Alright, will fix. >> +{ >> + int len; >> + >> + do { >> + len = recvmsg(sock, mhdr, flags); > > recvmsg returns ssize_t, is it ok to truncate to int? In practice, yeah; the kernel is not going to return a single message that overflows an int, even on 32bit. And with an int return type it's more natural to return -errno instead of having the caller deal with that. So unless you have strong objections I'd prefer to keep it this way... >> + } while (len < 0 && (errno == EINTR || errno == EAGAIN)); >> + >> + if (len < 0) >> + return -errno; >> + return len; >> +} >> + >> +static int libbpf_netlink_recvmsg(int sock, struct msghdr *mhdr, char **buf) >> +{ >> + struct iovec *iov = mhdr->msg_iov; >> + void *nbuf; >> + int len; >> + >> + len = __libbpf_netlink_recvmsg(sock, mhdr, MSG_PEEK | MSG_TRUNC); >> + if (len < 0) >> + return len; >> + >> + if (len < 4096) >> + len = 4096; >> + >> + if (len > iov->iov_len) { >> + nbuf = realloc(iov->iov_base, len); >> + if (!nbuf) { >> + free(iov->iov_base); >> + return -ENOMEM; >> + } >> + iov->iov_base = nbuf; > > this function both sets iov->iov_base *and* returns buf. It's quite a > convoluted contract. Seems like buf is not necessary (and also NULL > out iov->iov_base in case of error above?). But it might be cleaner to > do this MSG_PEEK + realloc + recvmsg in libbpf_netlink_recv() > explicitly. It's only one place. Hmm, yeah, if I wrap the realloc code in a small helper that works; will fix. -Toke