Fwd: possible fd handling bug/issue/opportunities? in libbpf

Maciej Żenczykowski <maze@xxxxxxxxxx> · Thu, 26 Oct 2023 14:29:49 -0700

Alexei said he's travelling busy and to resend to bpf@vger, so here it goes:

https://github.com/libbpf/libbpf/blob/master/src/libbpf.c#L4525

int bpf_map__reuse_fd(struct bpf_map *map, int fd) {
...
/*
* Like dup(), but make sure new FD is >= 3 and has O_CLOEXEC set.
* This is similar to what we do in ensure_good_fd(), but without
* closing original FD.
*/
new_fd = fcntl(fd, F_DUPFD_CLOEXEC, 3);
if (new_fd < 0) {
  err = -errno;
  goto err_free_new_name;
}

err = zclose(map->fd);
if (err) {
  err = -errno;
  goto err_close_new_fd;
}

This is based *purely* on very local code inspection of the ~50 lines
up above - so I may well be wrong - but:

(a) optimization opportunity: there should be no need to
F_DUPFD_CLOEXEC 3 if fd is already >= 3
   I assume in this case map->fd == fd, though it's not clear from
looking at the function...
   why is fd a direct argument? maybe it's trying to take ownership
while still allowing the caller to close the passed in fd?

it feels like this function should:
  close(map->fd) almost on entry
  if (fd < 3) {new_fd = dupfd_cloexec(fd,3) and close(fd); fd = new_fd; }
  map->fd = fd

Note: I'm assuming here that kernel returned map fds already have
cloexec set, which I seem to recall was the case?

Maybe pass in *fd instead of fd?

(b) the close() system call closes the file descriptor *even* if it
returns an error
  it's likely a mistake to be checking what value zclose() returns.
  (close() returns *pending* errors but it always releases the file descriptor)
  looking at the implementation of zclose() earlier in the file, it's
aware of this.
  as such I think:

err = zclose(map->fd);
if (err) {
err = -errno;
goto err_close_new_fd;
}

should probably be *just*
  (void)zclose(map->fd);
or even just
  close(map->fd)
since map->fd will be overwritten in a moment.

btw. happened to look at this because I ran across the
"fd == 0 means AT_FDCWD BPF_OBJ_GET commands" thread.
It looks like the kernel community is perfectly okay with the kernel
throwing land mines at userspace...
I totally agree with you that for most new fd-like things (incl.
timerfd's, bpf, etc.)
the kernel should never allocate them with index <3 because it's *too*
ripe for abuse.

Anyway, the above comments *might* not be correct.
I didn't do a deep analysis or anything, just something I noticed /
was confused by on a quick glance.
There may be deeper reasons why the code should be the way it is
(though perhaps it could use more comments then)

- Maciej