Re: fd == 0 means AT_FDCWD BPF_OBJ_GET commands

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, May 18, 2023 at 09:44:33PM -0700, Alexei Starovoitov wrote:
> On Thu, May 18, 2023 at 11:57:14AM -0700, Linus Torvalds wrote:
> > That is nobody's fault but your own, and you should just admit it rather
> > than trying to double down on being wrong.
> 
> You're correct. I was indeed doubling down on that.
> Thanks for putting it straight like that.
> 
> > The 0/1/2 file descriptors are not at all special. They are a shell
> > pipeline default, nothing more. They are not the argument your think they
> > are, and you should stop trying to make them an argument.
> 
> I'm well aware that any file type is allowed to be in FDs 0,1,2 and
> some user space is using it that way, like old inetd:
> https://github.com/guillemj/inetutils/blob/master/src/inetd.c#L428
> That puts the same socket into 0,1,2 before exec-ing new process.
> 
> My point that the kernel has to assist user space instead of
> stubbornly sticking to POSIX and saying all FDs are equal.
> 
> Most user space developers know that care should be taken with FDs 0,1,2,
> but it's still easy to make a mistake.
> 
> To explain the motivation a bit of background:
> "folly" is a core C++ library for fb apps. Like libstdc++ and a lot more.
> Until this commit in 2021:
> https://github.com/facebook/folly/commit/cc9032a0e41a0cba9aa93240c483cfceb0ff44ea
> the user could launch a new process with flag "folly::Subprocess::CLOSE".
> It's useful for the cases when child doesn't want to inherit stdin/out/err.
> There is also GLOG. google's logging library that can be configured to log to stderr.
> Both libraries are well written with the high code quality.
> In a big app multiple people use different pieces and may not be aware
> how all pieces are put together. You can guess the rest...
> Important service used a library that used another library that started a
> process with folly::Subprocess::CLOSE. That process opened network connections
> and used glog. It was "working" for some time, because sys_write() to a socket
> is a valid command, but when TCP buffers got full synchronous innocuous logging
> prevented parent from making progress.
> 
> That footgun was removed from folly in 2021, but we still see this issue from time to time.
> My point that the kernel can help here.
> Since folks don't like sysctl to control FD assignment how about something like this:
> 
> diff --git a/fs/file.c b/fs/file.c
> index 7893ea161d77..896e79433f61 100644
> --- a/fs/file.c
> +++ b/fs/file.c
> @@ -554,9 +554,15 @@ static int alloc_fd(unsigned start, unsigned end, unsigned flags)
>         return error;
>  }
> 
> +__weak noinline u32 get_start_fd(void)
> +{
> +       return 0;
> +}
> +/* mark it as BPF_MODIFY_RETURN to let bpf progs adjust return value */
> +
>  int __get_unused_fd_flags(unsigned flags, unsigned long nofile)
>  {
> -       return alloc_fd(0, nofile, flags);
> +       return alloc_fd(get_start_fd(), nofile, flags);

I'm sorry but I really don't think this is a good idea. We're not going
to run BPF programs in core file code. That stuff is sensitive and
complex enough as it is without having to take into account that a bpf
program can modify behavior. It's also completely unclear whether that's
safe to do as this would allow to change fd allocation across the whole
kernel.

This idea that fd 0, 1, and 2 or any other fd deserve special treatment
by the kernel needs to die; and quickly at that.



[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux