-- Andrii On Mon, Jun 8, 2020 at 12:45 PM Jesper Dangaard Brouer <brouer@xxxxxxxxxx> wrote: > > On Mon, 8 Jun 2020 11:36:33 -0700 > Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> wrote: > > > On Mon, Jun 8, 2020 at 9:51 AM Jesper Dangaard Brouer <brouer@xxxxxxxxxx> wrote: > > > > > > This patch change BPF syscall to avoid returning file descriptor value zero. > > > > > > As mentioned in cover letter, it is very impractical when extending kABI > > > that the file-descriptor value 'zero' is valid, as this requires new fields > > > must be initialised as minus-1. First step is to change the kernel such that > > > BPF-syscall simply doesn't return value zero as a FD number. > > > > > > This patch achieves this by similar code to anon_inode_getfd(), with the > > > exception of getting unused FD starting from 1. The kernel already supports > > > starting from a specific FD value, as this is used by f_dupfd(). It seems > > > simpler to replicate part of anon_inode_getfd() code and use this start from > > > offset feature, instead of using f_dupfd() handling afterwards. > > > > Wouldn't it be better to just handle that on libbpf side? That way it > > works on all kernels and doesn't require this duplication of logic > > inside kernel? > > IMHO this is needed on the kernel side, to pair it with the API change. > I don't see how doing this in libbpf can cover all cases. > In practice, it's going to be very rare that fd=0 is not already allocated for application. So even not doing anything is going to work in 99.9% of cases. Think about this, any realistic BPF program will have a map or global variable associated with it. So in the rare case where we have FD 0 not used, map will get that FD. Even if not, if you load your program from ELF file, that ELF file will get FD 0. Even if not, modern BPF programs will have BTF object associated, which will get FD 0. And so on... Even daemons will probably already have some FD open for whatever logging/output it needs to do (it doesn't have to be stdout). So this FD=0 is very hypothetical case, very. You have to essentially stage it consciously, to actually hit it. Second of all, this BPF-specific FD allocation logic is just that -- duplication and extra code to maintain. Other folks in kernel will eventually just be "huh? bpf needs its own anon_file API, why?!..." Do we really want more of that? Third, you already missed anon_inode_getfile use in bpf_link_prime(), and in the future we'll undoubtedly will miss something else, so this FD >= 1 from kernel will work only sometimes and no one will notice until it breaks for someone, which won't happen for a while (because it works as is most of the time, see above). I'm not even talking about the fact that we are also subverting standard Linux promise that a user gets the smallest available FD in the system... So I think libbpf can be kind to user and do: int fd = bpf_whatever(); if (fd == 0) { fd = dup(0); close(0); } But even if it doesn't, not a big deal and probably no one ever will have this problem with FD 0 for BPF program. > First of all, some users might not be using libbpf. > > Second, a userspace application could be using an older version of > libbpf on a newer kernel. (Notice this is not only due to older > distros, but also because projects using git submodule libbpf will > freeze at some point in time that worked). Theoretically this is a problem, in practice libbpf is always more up-to-date compared to kernel... so I don't buy this argument, honestly :) > > -- > Best regards, > Jesper Dangaard Brouer > MSc.CS, Principal Kernel Engineer at Red Hat > LinkedIn: http://www.linkedin.com/in/brouer >