Re: fd == 0 means AT_FDCWD BPF_OBJ_GET commands

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, May 18, 2023 at 07:20:29PM +0200, Christian Brauner wrote:
> On Thu, May 18, 2023 at 09:25:08AM -0700, Alexei Starovoitov wrote:
> > On Thu, May 18, 2023 at 10:38:46AM +0200, Christian Brauner wrote:
> > > On Wed, May 17, 2023 at 09:17:36AM -0700, Alexei Starovoitov wrote:
> > > > On Wed, May 17, 2023 at 5:05 AM Christoph Hellwig <hch@xxxxxx> wrote:
> > > > >
> > > > > On Wed, May 17, 2023 at 11:11:24AM +0200, Christian Brauner wrote:
> > > > > > Adding fsdevel so we're aware of this quirk.
> > > > > >
> > > > > > So I'm not sure whether this was ever discussed on fsdevel when you took
> > > > > > the decision to treat fd 0 as AT_FDCWD or in general treat fd 0 as an
> > > > > > invalid value.
> > > > >
> > > > > I've never heard of this before, and I think it is compltely
> > > > > unacceptable. 0 ist just a normal FD, although one that happens to
> > > > > have specific meaning in userspace as stdin.
> > > > >
> > > > > >
> > > > > > If it was discussed then great but if not then I would like to make it
> > > > > > very clear that if in the future you decide to introduce custom
> > > > > > semantics for vfs provided infrastructure - especially when exposed to
> > > > > > userspace - that you please Cc us.
> > > > >
> > > > > I don't think it's just the future.  We really need to undo this ASAP.
> > > > 
> > > > Christian is not correct in stating that treatment of fd==0 as invalid
> > > > bpf object applies to vfs fd-s.
> > > > The path_fd addition in this patch is really the very first one of this kind.
> > > > At the same time bpf anon fd-s (progs, maps, links, btfs) with fd == 0
> > > > are invalid and this is not going to change. It's been uapi for a long time.
> > > > 
> > > > More so fd-s 0,1,2 are not "normal FDs".
> > > > Unix has made two mistakes:
> > > > 1. fd==0 being valid fd
> > > > 2. establishing convention that fd-s 0,1,2 are stdin, stdout, stderr.
> > > > 
> > > > The first mistake makes it hard to pass FD without an extra flag.
> > > > The 2nd mistake is just awful.
> > > > We've seen plenty of severe datacenter wide issues because some
> > > > library or piece of software assumes stdin/out/err.
> > > > Various services have been hurt badly by this "convention".
> > > > In libbpf we added ensure_good_fd() to make sure none of bpf objects
> > > > (progs, maps, etc) are ever seen with fd=0,1,2.
> > > > Other pieces of datacenter software enforce the same.
> > > > 
> > > > In other words fds=0,1,2 are taken. They must not be anything but
> > > > stdin/out/err or gutted to /dev/null.
> > > > Otherwise expect horrible bugs and multi day debugging.
> > > > 
> > > > Because of that, several years ago, we've decided to fix unix mistake #1
> > > > when it comes to bpf objects and started reserving fd=0 as invalid.
> > > > This patch is proposing to do the same for path_fd (normal vfs fd) when
> > > 
> > > It isn't as you now realized but I'm glad we cleared that up off-list.
> > > 
> > > > it is passed to bpf syscall. I think it's a good trade-off and fits
> > > > the rest of bpf uapi.
> > > > 
> > > > Everyone who's hiding behind statements: but POSIX is a standard..
> > > > or this is how we've been doing things... are ignoring the practical
> > > > situation at hand. fd-s 0,1,2 are taken. Make sure your sw never produces them.
> > > 
> > > (Prefix: Imagine me calmly writing this and in a relaxed tone.)
> > > 
> > > Just to clarify. I do think that deciding that 0 is an invalid file
> 
> descriptor
> 
> > 
> > We're still talking past each other.
> > 0 is an invalid bpf object. Not file.
> > There is a difference.
> 
> You cut of a word above. I can't follow your argument.
> File descriptor numbers are free to refer to whatever we want.
> They don't care about what type of object they refer to and they
> better not.

User space cares what they refer to.
Unix made integer FD into broken abstraction.
regular files need one set of syscalls to work with such 'integer FDs'.
sockets need another set of syscalls.
All other anon-inodes need another set of syscalls.
While user space sees an integer without type and that a root cause of the bugs.
And that's particularly bad for integers 0,1,2 where user space assumes that
regular files are behind them.

Imagine C++ project where base class is called 'FD' while children
implement their own access methods that don't overlap with each other.
Now one application passes this 'FD' class to another.
That other app can only do trial and error to figure out what it got.
Any user space developer would reject such class hierarchy design,
but kernel folks are so proud of this broken abstraction.
FDs are not special.. POSIX is the standard... Right.
That's why user space keeps their workarounds, because kernel folks
are not empathic to user space needs.



[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux