Re: fd == 0 means AT_FDCWD BPF_OBJ_GET commands

Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> · Thu, 18 May 2023 09:25:08 -0700

On Thu, May 18, 2023 at 10:38:46AM +0200, Christian Brauner wrote:
> On Wed, May 17, 2023 at 09:17:36AM -0700, Alexei Starovoitov wrote:
> > On Wed, May 17, 2023 at 5:05 AM Christoph Hellwig <hch@xxxxxx> wrote:
> > >
> > > On Wed, May 17, 2023 at 11:11:24AM +0200, Christian Brauner wrote:
> > > > Adding fsdevel so we're aware of this quirk.
> > > >
> > > > So I'm not sure whether this was ever discussed on fsdevel when you took
> > > > the decision to treat fd 0 as AT_FDCWD or in general treat fd 0 as an
> > > > invalid value.
> > >
> > > I've never heard of this before, and I think it is compltely
> > > unacceptable. 0 ist just a normal FD, although one that happens to
> > > have specific meaning in userspace as stdin.
> > >
> > > >
> > > > If it was discussed then great but if not then I would like to make it
> > > > very clear that if in the future you decide to introduce custom
> > > > semantics for vfs provided infrastructure - especially when exposed to
> > > > userspace - that you please Cc us.
> > >
> > > I don't think it's just the future.  We really need to undo this ASAP.
> > 
> > Christian is not correct in stating that treatment of fd==0 as invalid
> > bpf object applies to vfs fd-s.
> > The path_fd addition in this patch is really the very first one of this kind.
> > At the same time bpf anon fd-s (progs, maps, links, btfs) with fd == 0
> > are invalid and this is not going to change. It's been uapi for a long time.
> > 
> > More so fd-s 0,1,2 are not "normal FDs".
> > Unix has made two mistakes:
> > 1. fd==0 being valid fd
> > 2. establishing convention that fd-s 0,1,2 are stdin, stdout, stderr.
> > 
> > The first mistake makes it hard to pass FD without an extra flag.
> > The 2nd mistake is just awful.
> > We've seen plenty of severe datacenter wide issues because some
> > library or piece of software assumes stdin/out/err.
> > Various services have been hurt badly by this "convention".
> > In libbpf we added ensure_good_fd() to make sure none of bpf objects
> > (progs, maps, etc) are ever seen with fd=0,1,2.
> > Other pieces of datacenter software enforce the same.
> > 
> > In other words fds=0,1,2 are taken. They must not be anything but
> > stdin/out/err or gutted to /dev/null.
> > Otherwise expect horrible bugs and multi day debugging.
> > 
> > Because of that, several years ago, we've decided to fix unix mistake #1
> > when it comes to bpf objects and started reserving fd=0 as invalid.
> > This patch is proposing to do the same for path_fd (normal vfs fd) when
> 
> It isn't as you now realized but I'm glad we cleared that up off-list.
> 
> > it is passed to bpf syscall. I think it's a good trade-off and fits
> > the rest of bpf uapi.
> > 
> > Everyone who's hiding behind statements: but POSIX is a standard..
> > or this is how we've been doing things... are ignoring the practical
> > situation at hand. fd-s 0,1,2 are taken. Make sure your sw never produces them.
> 
> (Prefix: Imagine me calmly writing this and in a relaxed tone.)
> 
> Just to clarify. I do think that deciding that 0 is an invalid file

We're still talking past each other.
0 is an invalid bpf object. Not file.
There is a difference.
The kernel is breaking user space by returning non-file FDs in 0,1,2.
Especially as fd = 1 and 2.
ensure_good_fd() in libbpf is a library workaround to make sure bpf objects
are not the reason for user app brekage.
I firmly believe that making kernel return socket FDs and other special FDs with fd >=3
(under new sysctl, for example) will prevent user space breakage.

And to answer Ted's question..
Yes. It's a security issue, but it's the other way around.
The kernel returning non vfs file FD in [0,1,2] range is a security issue.
I'm proposing to fix it with new sysctl or boot flag.