On Fri, Feb 11, 2022 at 3:37 PM Toke Høiland-Jørgensen <toke@xxxxxxxxxx> wrote: > > Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> writes: > > > On Fri, Feb 11, 2022 at 11:51 AM Toke Høiland-Jørgensen <toke@xxxxxxxxxx> wrote: > >> > >> When receiving netlink messages, libbpf was using a statically allocated > >> stack buffer of 4k bytes. This happened to work fine on systems with a 4k > >> page size, but on systems with larger page sizes it can lead to truncated > >> messages. The user-visible impact of this was that libbpf would insist no > >> XDP program was attached to some interfaces because that bit of the netlink > >> message got chopped off. > >> > >> Fix this by switching to a dynamically allocated buffer; we borrow the > >> approach from iproute2 of using recvmsg() with MSG_PEEK|MSG_TRUNC to get > >> the actual size of the pending message before receiving it, adjusting the > >> buffer as necessary. While we're at it, also add retries on interrupted > >> system calls around the recvmsg() call. > >> > >> Reported-by: Zhiqian Guan <zhguan@xxxxxxxxxx> > >> Fixes: 8bbb77b7c7a2 ("libbpf: Add various netlink helpers") > >> Signed-off-by: Toke Høiland-Jørgensen <toke@xxxxxxxxxx> > >> --- > >> tools/lib/bpf/netlink.c | 55 ++++++++++++++++++++++++++++++++++++++--- > >> 1 file changed, 52 insertions(+), 3 deletions(-) > >> > >> diff --git a/tools/lib/bpf/netlink.c b/tools/lib/bpf/netlink.c > >> index c39c37f99d5c..9a6e95206bf0 100644 > >> --- a/tools/lib/bpf/netlink.c > >> +++ b/tools/lib/bpf/netlink.c > >> @@ -87,22 +87,70 @@ enum { > >> NL_DONE, > >> }; > >> > >> +static int __libbpf_netlink_recvmsg(int sock, struct msghdr *mhdr, int flags) > > > > let's not use names starting with underscored. Just call it > > "netlink_recvmsg" or something like that. > > Alright, will fix. > > >> +{ > >> + int len; > >> + > >> + do { > >> + len = recvmsg(sock, mhdr, flags); > > > > recvmsg returns ssize_t, is it ok to truncate to int? > > In practice, yeah; the kernel is not going to return a single message > that overflows an int, even on 32bit. And with an int return type it's > more natural to return -errno instead of having the caller deal with > that. So unless you have strong objections I'd prefer to keep it this > way... yep, int is fine > > >> + } while (len < 0 && (errno == EINTR || errno == EAGAIN)); > >> + > >> + if (len < 0) > >> + return -errno; > >> + return len; > >> +} > >> + > >> +static int libbpf_netlink_recvmsg(int sock, struct msghdr *mhdr, char **buf) > >> +{ > >> + struct iovec *iov = mhdr->msg_iov; > >> + void *nbuf; > >> + int len; > >> + > >> + len = __libbpf_netlink_recvmsg(sock, mhdr, MSG_PEEK | MSG_TRUNC); > >> + if (len < 0) > >> + return len; > >> + > >> + if (len < 4096) > >> + len = 4096; > >> + > >> + if (len > iov->iov_len) { > >> + nbuf = realloc(iov->iov_base, len); > >> + if (!nbuf) { > >> + free(iov->iov_base); > >> + return -ENOMEM; > >> + } > >> + iov->iov_base = nbuf; > > > > this function both sets iov->iov_base *and* returns buf. It's quite a > > convoluted contract. Seems like buf is not necessary (and also NULL > > out iov->iov_base in case of error above?). But it might be cleaner to > > do this MSG_PEEK + realloc + recvmsg in libbpf_netlink_recv() > > explicitly. It's only one place. > > Hmm, yeah, if I wrap the realloc code in a small helper that works; will > fix. > > -Toke >