Re: fd == 0 means AT_FDCWD BPF_OBJ_GET commands

Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> · Tue, 23 May 2023 10:25:40 -0700

On Tue, May 23, 2023 at 12:49 AM Lennart Poettering
<lennart@xxxxxxxxxxxxxx> wrote:
>
> On Do, 18.05.23 21:44, Alexei Starovoitov (alexei.starovoitov@xxxxxxxxx) wrote:
>
> > > The 0/1/2 file descriptors are not at all special. They are a shell
> > > pipeline default, nothing more. They are not the argument your think they
> > > are, and you should stop trying to make them an argument.
> >
> > I'm well aware that any file type is allowed to be in FDs 0,1,2 and
> > some user space is using it that way, like old inetd:
> > https://github.com/guillemj/inetutils/blob/master/src/inetd.c#L428
> > That puts the same socket into 0,1,2 before exec-ing new process.
> >
> > My point that the kernel has to assist user space instead of
> > stubbornly sticking to POSIX and saying all FDs are equal.
> >
> > Most user space developers know that care should be taken with FDs 0,1,2,
> > but it's still easy to make a mistake.
>
> If I look at libbpf, which supposedly gets the fd handling right I
> can't find any hint it actually moves the fds it gets from open() to
> an fd > 2, though?
>
> i.e. the code that invokes open() calls in the libbpf codebase happily
> just accepts an fd < 2, including fd == 0, and this is then later
> passed back into the kernel in various bpf() syscall invocations,
> which should refuse it, no? So what's going on there?

libbpf's attempt to ensure that fd>2 applies mostly to BPF objects:
maps, progs, btfs, links, which are always returned from bpf()
syscall. That's where we use ensure_good_fd(). The snippet you found
in bpf_map__reuse_fd() is problematic and slipped through the cracks,
I'll fix it, thanks for bringing this to my attention. I'll also check
if there are any other places where we should use ensure_good_fd() as
well.

>
> I did find this though:
>
> <snip>
>         new_fd = open("/", O_RDONLY | O_CLOEXEC);
>         if (new_fd < 0) {
>                 err = -errno;
>                 goto err_free_new_name;
>         }
>
>         new_fd = dup3(fd, new_fd, O_CLOEXEC);
>         if (new_fd < 0) {
>                 err = -errno;
>                 goto err_close_new_fd;
>         }
> </snip>
>
> (This is from libbpf.c, bpf_map__reuse_fd(), i.e. https://github.com/libbpf/libbpf/blob/master/src/libbpf.c)
>
> Not sure what's going on here, what is this about? you allocate an fd
> you then immediately replace? Is this done to move the fd away from
> fd=0?  but that doesn't work that way, in case fd 0 is closed when
> entering this function.
>
> Or is this about dup'ping with O_CLOEXEC?

This code predates me, so I don't know the exact reasons why it's
implemented exactly like this. It seems like instead of doing
open()+dup3() we should do dup3()+ensure_good_fd(). I need to look at
this carefully, this is not the code I work with regularly.

>
> Please be aware that F_DUPFD_CLOEXEC exists, which allows you to
> easily move some fd above some treshold, *and* set O_CLOEXEC at the
> same time. In the systemd codebase we call this frequently for code
> that ends up being loaded in arbitrary processes (glibc NSS modules,
> PAM modules), in order to ensure we get out of the fd < 3 territory
> quickly.

nice, thanks

>
> (btw, if you do care about O_CLOEXEC – which is great – then you also
> want to replace a bunch of fopen(…, "r") with fopen(…, "re") in your
> codebase)

I will check this as well, thanks for the hints :)

>
> Lennart