On Tue, Apr 25, 2023 at 02:54:29PM +0100, Al Viro wrote: > On Tue, Apr 25, 2023 at 02:34:15PM +0200, Christian Brauner wrote: > > > It is rife with misunderstandings just looking at what we did in > > kernel/fork.c earlier: > > > > retval = get_unused_fd_flags(O_RDWR | O_CLOEXEC); > > [...] > > pidfile = anon_inode_getfile("[pidfd]", &pidfd_fops, pid, > > O_RDWR | O_CLOEXEC); > > > > seeing where both get_unused_fd_flags() and both *_getfile() take flag > > arguments. Sure, for us this is pretty straightforward since we've seen > > that code a million times. For others it's confusing why there's two > > flag arguments. Sure, we could use one flags argument but it still be > > weird to look at. > > First of all, get_unused_fd_flags() doesn't give a damn about O_RDWR and > anon_inode_getfile() - about O_CLOEXEC. Duplicating the expression in > places like that is cargo-culting. I distinctly remember us having that conversation about how to do this nicely back then and fwiw this is your patch... ;) 6fd2fe494b17 ("copy_process(): don't use ksys_close() on cleanups") So sure, that was my point: people are confused why there's two flag arguments and what exactly has to go into them and just copy-paste whatever other users already have. > > > But with this api we also force all users to remember that they need to > > cleanup the fd and the file - but definitely one - depending on where > > they fail. > > > > But conceptually the fd and the file belong together. After all it's the > > file we're about to make that reserved fd refer to. > > Why? The common pattern is > * reserve the descriptor > * do some work that might fail > * get struct file set up (which also might fail) > * do more work (possibly including copyout, etc.) > * install file into descriptor table > > We want to reserve the descriptor early, since it's about the easiest > thing to undo. Why bother doing it next to file setup? That can be > pretty deep in call chain and doing it that way comes with headache > about passing the damn thing around. And cleanup rules with your > variant end up being more complicated. > > Keep the manipulations of descriptor table as close to the surface > as possible. The real work is with file references; descriptors > belong in userland and passing them around kernel-side is asking > for headache. Imho, there's different use-cases. There's definitely one where people allocate a file descriptor early on and then sometimes maybe way later allocate a struct file and install it. And that's where exposing and using get_unused_fd_flags() and fd_install() is great and works fine. But there's also users that do the reserve an fd and allocate a file at the same time or receive both at the same time as the seccomp notifier, scm creds, or pidfds. The receive_fd* stuff is basically a version of the sketch I did without the ability to simply use a struct and not having to open-code everything multiple times. I never expected excitement around this as it was difficult enough to get that receive_fd* thing going so I'm not arguing for this to be the ultima ratio...