On Mon, Jul 24, 2023 at 10:34:31AM -0700, Linus Torvalds wrote: > On Mon, 24 Jul 2023 at 10:23, Christian Brauner <brauner@xxxxxxxxxx> wrote: > > > > This means pidfd_getfd() needs the same treatment as MSG_PEEK for sockets. > > So the reason I think pidfd_getfd() is ok is that it has serialized > with the source of the file descriptor and uses fget_task() -> > __fget_files. > > And that code is nasty and complicated, but it does get_file_rcu() to > increment the file count, and then *after* that it still checks that > yes, the file pointer is still there. > > And that means that anybody who uses fget_task() will only ever get a > ref to a file if that file still existed in the source, and you can > never have a situation where a file comes back to life. > > The reason MSG_PEEK is special is exactly because it can "resurrect" a > file that was closed, and added to the unix SCM garbage collection > list as "only has a ref in the SCM thing", so when we then make it > live again, it needs that very very subtle thing. Oh, eew. > > So pidfd_getfd() is ok in this regard. > > But it *is* an example of how subtle it is to just get a new ref to an > existing file. > > That whole > > if (atomic_read_acquire(&files->count) == 1) { > > in __fget_light() is also worth being aware of. It isn't about the > file count, but it is about "I have exclusive access to the file > table". So you can *not* close a file, or open a file, for another > process from outside. The only thread that is allowed to access or > change the file table (including resizing it), is the thread itself. Yeah, I'm well aware of that because an earlier version of the getdents patchset would've run into exactly that problem because of async offload on the file while the process that registered that async offload would've been free to create another thread and issue additional requests. > > I really hope we don't have cases of that. I don't think we do but it's something to keep in mind with async io interfaces where the caller is free to create other threads after having registered a request. Depending on how file references are done things can get tricky easily. I'm not complaining here but it is worth noting that additions to e.g., io_uring now have to reason through two reference counting mechanisms: the regular system call rules and the fixed file io_uring rules. If conditional locking is involved in one part it might have effects/consequences in the other part as one can mix regular system call requests and fixed file requests on the same struct file.