On Tue, May 23, 2023 at 12:49 AM Lennart Poettering <lennart@xxxxxxxxxxxxxx> wrote: > > On Do, 18.05.23 21:44, Alexei Starovoitov (alexei.starovoitov@xxxxxxxxx) wrote: > > > > The 0/1/2 file descriptors are not at all special. They are a shell > > > pipeline default, nothing more. They are not the argument your think they > > > are, and you should stop trying to make them an argument. > > > > I'm well aware that any file type is allowed to be in FDs 0,1,2 and > > some user space is using it that way, like old inetd: > > https://github.com/guillemj/inetutils/blob/master/src/inetd.c#L428 > > That puts the same socket into 0,1,2 before exec-ing new process. > > > > My point that the kernel has to assist user space instead of > > stubbornly sticking to POSIX and saying all FDs are equal. > > > > Most user space developers know that care should be taken with FDs 0,1,2, > > but it's still easy to make a mistake. > > If I look at libbpf, which supposedly gets the fd handling right I > can't find any hint it actually moves the fds it gets from open() to > an fd > 2, though? > > i.e. the code that invokes open() calls in the libbpf codebase happily > just accepts an fd < 2, including fd == 0, and this is then later > passed back into the kernel in various bpf() syscall invocations, > which should refuse it, no? So what's going on there? libbpf's attempt to ensure that fd>2 applies mostly to BPF objects: maps, progs, btfs, links, which are always returned from bpf() syscall. That's where we use ensure_good_fd(). The snippet you found in bpf_map__reuse_fd() is problematic and slipped through the cracks, I'll fix it, thanks for bringing this to my attention. I'll also check if there are any other places where we should use ensure_good_fd() as well. > > I did find this though: > > <snip> > new_fd = open("/", O_RDONLY | O_CLOEXEC); > if (new_fd < 0) { > err = -errno; > goto err_free_new_name; > } > > new_fd = dup3(fd, new_fd, O_CLOEXEC); > if (new_fd < 0) { > err = -errno; > goto err_close_new_fd; > } > </snip> > > (This is from libbpf.c, bpf_map__reuse_fd(), i.e. https://github.com/libbpf/libbpf/blob/master/src/libbpf.c) > > Not sure what's going on here, what is this about? you allocate an fd > you then immediately replace? Is this done to move the fd away from > fd=0? but that doesn't work that way, in case fd 0 is closed when > entering this function. > > Or is this about dup'ping with O_CLOEXEC? This code predates me, so I don't know the exact reasons why it's implemented exactly like this. It seems like instead of doing open()+dup3() we should do dup3()+ensure_good_fd(). I need to look at this carefully, this is not the code I work with regularly. > > Please be aware that F_DUPFD_CLOEXEC exists, which allows you to > easily move some fd above some treshold, *and* set O_CLOEXEC at the > same time. In the systemd codebase we call this frequently for code > that ends up being loaded in arbitrary processes (glibc NSS modules, > PAM modules), in order to ensure we get out of the fd < 3 territory > quickly. nice, thanks > > (btw, if you do care about O_CLOEXEC – which is great – then you also > want to replace a bunch of fopen(…, "r") with fopen(…, "re") in your > codebase) I will check this as well, thanks for the hints :) > > Lennart