On Fri, Feb 11, 2022 at 11:51 AM Toke Høiland-Jørgensen <toke@xxxxxxxxxx> wrote: > > When receiving netlink messages, libbpf was using a statically allocated > stack buffer of 4k bytes. This happened to work fine on systems with a 4k > page size, but on systems with larger page sizes it can lead to truncated > messages. The user-visible impact of this was that libbpf would insist no > XDP program was attached to some interfaces because that bit of the netlink > message got chopped off. > > Fix this by switching to a dynamically allocated buffer; we borrow the > approach from iproute2 of using recvmsg() with MSG_PEEK|MSG_TRUNC to get > the actual size of the pending message before receiving it, adjusting the > buffer as necessary. While we're at it, also add retries on interrupted > system calls around the recvmsg() call. > > Reported-by: Zhiqian Guan <zhguan@xxxxxxxxxx> > Fixes: 8bbb77b7c7a2 ("libbpf: Add various netlink helpers") > Signed-off-by: Toke Høiland-Jørgensen <toke@xxxxxxxxxx> > --- > tools/lib/bpf/netlink.c | 55 ++++++++++++++++++++++++++++++++++++++--- > 1 file changed, 52 insertions(+), 3 deletions(-) > > diff --git a/tools/lib/bpf/netlink.c b/tools/lib/bpf/netlink.c > index c39c37f99d5c..9a6e95206bf0 100644 > --- a/tools/lib/bpf/netlink.c > +++ b/tools/lib/bpf/netlink.c > @@ -87,22 +87,70 @@ enum { > NL_DONE, > }; > > +static int __libbpf_netlink_recvmsg(int sock, struct msghdr *mhdr, int flags) let's not use names starting with underscored. Just call it "netlink_recvmsg" or something like that. > +{ > + int len; > + > + do { > + len = recvmsg(sock, mhdr, flags); recvmsg returns ssize_t, is it ok to truncate to int? > + } while (len < 0 && (errno == EINTR || errno == EAGAIN)); > + > + if (len < 0) > + return -errno; > + return len; > +} > + > +static int libbpf_netlink_recvmsg(int sock, struct msghdr *mhdr, char **buf) > +{ > + struct iovec *iov = mhdr->msg_iov; > + void *nbuf; > + int len; > + > + len = __libbpf_netlink_recvmsg(sock, mhdr, MSG_PEEK | MSG_TRUNC); > + if (len < 0) > + return len; > + > + if (len < 4096) > + len = 4096; > + > + if (len > iov->iov_len) { > + nbuf = realloc(iov->iov_base, len); > + if (!nbuf) { > + free(iov->iov_base); > + return -ENOMEM; > + } > + iov->iov_base = nbuf; this function both sets iov->iov_base *and* returns buf. It's quite a convoluted contract. Seems like buf is not necessary (and also NULL out iov->iov_base in case of error above?). But it might be cleaner to do this MSG_PEEK + realloc + recvmsg in libbpf_netlink_recv() explicitly. It's only one place. > + iov->iov_len = len; > + } > + > + len = __libbpf_netlink_recvmsg(sock, mhdr, 0); > + if (len > 0) > + *buf = iov->iov_base; > + return len; > +} > + > static int libbpf_netlink_recv(int sock, __u32 nl_pid, int seq, > __dump_nlmsg_t _fn, libbpf_dump_nlmsg_t fn, > void *cookie) > { > + struct iovec iov = {}; > + struct msghdr mhdr = { > + .msg_iov = &iov, > + .msg_iovlen = 1, > + }; > bool multipart = true; > struct nlmsgerr *err; > struct nlmsghdr *nh; > - char buf[4096]; > int len, ret; > + char *buf; > + > > while (multipart) { > start: > multipart = false; > - len = recv(sock, buf, sizeof(buf), 0); > + len = libbpf_netlink_recvmsg(sock, &mhdr, &buf); > if (len < 0) { > - ret = -errno; > + ret = len; > goto done; > } > > @@ -151,6 +199,7 @@ static int libbpf_netlink_recv(int sock, __u32 nl_pid, int seq, > } > ret = 0; > done: > + free(iov.iov_base); double free on -ENOMEM? And even more confusing why you bother with buf at all... > return ret; > } > > -- > 2.35.1 >