Re: [GIT PULL] pidfd updates

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Apr 25, 2023 at 02:54:29PM +0100, Al Viro wrote:
> On Tue, Apr 25, 2023 at 02:34:15PM +0200, Christian Brauner wrote:
> 
> > It is rife with misunderstandings just looking at what we did in
> > kernel/fork.c earlier:
> > 
> > 	retval = get_unused_fd_flags(O_RDWR | O_CLOEXEC);
> >         [...]
> >         pidfile = anon_inode_getfile("[pidfd]", &pidfd_fops, pid,
> >                                      O_RDWR | O_CLOEXEC);
> > 
> > seeing where both get_unused_fd_flags() and both *_getfile() take flag
> > arguments. Sure, for us this is pretty straightforward since we've seen
> > that code a million times. For others it's confusing why there's two
> > flag arguments. Sure, we could use one flags argument but it still be
> > weird to look at.
> 
> First of all, get_unused_fd_flags() doesn't give a damn about O_RDWR and
> anon_inode_getfile() - about O_CLOEXEC.  Duplicating the expression in
> places like that is cargo-culting.

I distinctly remember us having that conversation about how to do this
nicely back then and fwiw this is your patch... ;)
6fd2fe494b17 ("copy_process(): don't use ksys_close() on cleanups")

So sure, that was my point: people are confused why there's two flag
arguments and what exactly has to go into them and just copy-paste
whatever other users already have.

>  
> > But with this api we also force all users to remember that they need to
> > cleanup the fd and the file - but definitely one - depending on where
> > they fail.
> > 
> > But conceptually the fd and the file belong together. After all it's the
> > file we're about to make that reserved fd refer to.
> 
> Why?  The common pattern is
> 	* reserve the descriptor
> 	* do some work that might fail
> 	* get struct file set up (which also might fail)
> 	* do more work (possibly including copyout, etc.)
> 	* install file into descriptor table
> 
> We want to reserve the descriptor early, since it's about the easiest
> thing to undo.  Why bother doing it next to file setup?  That can be
> pretty deep in call chain and doing it that way comes with headache
> about passing the damn thing around.  And cleanup rules with your
> variant end up being more complicated.
> 
> Keep the manipulations of descriptor table as close to the surface
> as possible.  The real work is with file references; descriptors
> belong in userland and passing them around kernel-side is asking
> for headache.

Imho, there's different use-cases.

There's definitely one where people allocate a file descriptor early on
and then sometimes maybe way later allocate a struct file and install
it. And that's where exposing and using get_unused_fd_flags() and
fd_install() is great and works fine.

But there's also users that do the reserve an fd and allocate a file at
the same time or receive both at the same time as the seccomp notifier,
scm creds, or pidfds. The receive_fd* stuff is basically a version of
the sketch I did without the ability to simply use a struct and not
having to open-code everything multiple times.

I never expected excitement around this as it was difficult enough to
get that receive_fd* thing going so I'm not arguing for this to be the
ultima ratio...



[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux