So this was a case of "too much explanations make the explanation much harder to follow". I tend to enjoy your pull request explanations, but this one was just *way* too much. Please try to make the point you are making a bit more salient, so that it's a lot easier to follow. On Mon, 24 Jul 2023 at 08:01, Christian Brauner <brauner@xxxxxxxxxx> wrote: > > [..] the > file_count(file) greater than one optimization was already broken and > that concurrent read/write/getdents/seek calls are possible in the > regular system call api. > > The pidfd_getfd() system call allows a caller with ptrace_may_access() > abilities on another process to steal a file descriptor from this > process. I think the above is all you need to actually explain the problem and boil down the cause of the bug, and it means that the reader doesn't have to wade through a lot of other verbiage to figure it out. > if (file && (file->f_mode & FMODE_ATOMIC_POS)) { > - if (file_count(file) > 1) { > - v |= FDPUT_POS_UNLOCK; > - mutex_lock(&file->f_pos_lock); > - } > + v |= FDPUT_POS_UNLOCK; > + mutex_lock(&file->f_pos_lock); > } Ho humm. The patch is obviously correct. At the same time this is actually very annoying, because I played this very issue with the plain /proc/<pid>/fd/<xyz> interface long long ago, where it would just re-use the 'struct file' directly, and it was such a sh*t-show that I know it's much better to actually open a new file descriptor. I'm not sure that "share actual 'struct file' ever was part of a mainline kernel". I remember having it, but it was a "last century" kind of thing. The /proc interface hack was actually somewhat useful exactly because you'd see the file position change, but it really caused problems. The fact that pidfd_getfd() re-introduced that garbage and I never realized this just annoys me no end. And sadly, the man-page makes it very explicit that it's this broken kind of "share the whole file, file offset and all". Damn damn damn. Is it too late to just fix pidfd_getfd() to duplicate the 'struct file', and act like a new open, and act like /proc/<pid>/fd/<xyz>? Because honestly, having been there before, I'm pretty convinced that the real bug here is pidfd_getfd. I wonder if we could make pidfd_getfd() at least duplicate the struct file for directories. Those are the things that absolutely *require* atomic file positions. Argh. I wonder if this also screws up our garbage collection logic. It too ends up having some requirements for a reliable file_count(). Linus