Re: fd == 0 means AT_FDCWD BPF_OBJ_GET commands

On Thu, May 18, 2023 at 11:57:14AM -0700, Linus Torvalds wrote:
> That is nobody's fault but your own, and you should just admit it rather
> than trying to double down on being wrong.

You're correct. I was indeed doubling down on that.
Thanks for putting it straight like that.

> The 0/1/2 file descriptors are not at all special. They are a shell
> pipeline default, nothing more. They are not the argument you think they
> are, and you should stop trying to make them an argument.

I'm well aware that any file type is allowed in FDs 0,1,2, and
some user space uses them that way, like old inetd:
https://github.com/guillemj/inetutils/blob/master/src/inetd.c#L428
It puts the same socket into 0,1,2 before exec-ing the new process.

My point is that the kernel has to assist user space instead of
stubbornly sticking to POSIX and saying all FDs are equal.

Most user space developers know that care should be taken with FDs 0,1,2,
but it's still easy to make a mistake.

To explain the motivation, a bit of background:
"folly" is a core C++ library for fb apps, something like libstdc++ and a lot more.
Until this commit in 2021:
https://github.com/facebook/folly/commit/cc9032a0e41a0cba9aa93240c483cfceb0ff44ea
the user could launch a new process with the flag "folly::Subprocess::CLOSE".
It was meant for cases where the child doesn't want to inherit stdin/out/err.
There is also glog, Google's logging library, which can be configured to log to stderr.
Both libraries are well written and of high code quality.
In a big app, multiple people use different pieces and may not be aware
of how all the pieces fit together. You can guess the rest...
An important service used a library that used another library that started a
process with folly::Subprocess::CLOSE. That process opened network connections
and used glog. Because 0,1,2 were closed, the sockets it opened were assigned
those numbers, so what glog wrote to "stderr" went into a TCP connection.
It was "working" for some time, because sys_write() to a socket is a valid
operation, but once the TCP buffers filled up, the synchronous, seemingly
innocuous logging prevented the parent from making progress.

That footgun was removed from folly in 2021, but we still see this issue from time to time.
My point is that the kernel can help here.
Since folks don't like a sysctl to control FD assignment, how about something like this:

diff --git a/fs/file.c b/fs/file.c
index 7893ea161d77..896e79433f61 100644
--- a/fs/file.c
+++ b/fs/file.c
@@ -554,9 +554,15 @@ static int alloc_fd(unsigned start, unsigned end, unsigned flags)
        return error;
 }

+__weak noinline u32 get_start_fd(void)
+{
+       return 0;
+}
+/* mark it as BPF_MODIFY_RETURN to let bpf progs adjust return value */
+
 int __get_unused_fd_flags(unsigned flags, unsigned long nofile)
 {
-       return alloc_fd(0, nofile, flags);
+       return alloc_fd(get_start_fd(), nofile, flags);
 }

Then we can enforce fd >= 3 for a certain container or for a particular app.


