On Mon, Jul 24, 2023 at 09:51:05AM -0700, Linus Torvalds wrote: > On Mon, 24 Jul 2023 at 09:36, Linus Torvalds > <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote: > > > > There are magic rules with "total_refs == inflight_refs", and that > > total_refs thing is very much the file count, ie > > > > total_refs = file_count(u->sk.sk_socket->file); > > > > where we had some nasty bugs with files coming back to life. > > Ok, I don't think this is an issue here. It really is that "only > in-flight refs remaining" that is a special case, and even > pidfd_getfd() shouldn't be able to change that. > > But the magic code is all in fget_task(), and those need to be checked. > > You can see how proc does things properly: it does do "fget_task()", > but then it only uses it to copy the path part, and just does fput() > afterwards. > > The bpf code does something like that too, and seems ok (ie it gets > the file in order to copy data from it, not to install it). Aside of fget_task() use, it has this: rcu_read_lock(); for (;; curr_fd++) { struct file *f; f = task_lookup_next_fd_rcu(curr_task, &curr_fd); if (!f) break; if (!get_file_rcu(f)) continue; /* set info->fd */ info->fd = curr_fd; rcu_read_unlock(); return f; } curr_task is not cached current here - it can be an arbitrary thread. And what we do to the file reference we get here is ctx.meta = &meta; ctx.task = info->task; ctx.fd = info->fd; ctx.file = file; return bpf_iter_run_prog(prog, &ctx); I think it can't be used to shove it into any descriptor table, but then there's forming an SCM_RIGHTS datagram and sending it, etc. - it might be worth looking into.