On Fri, May 24, 2024 at 07:17:13PM +0000, Alice Ryhl wrote: > > You obviously are aware of this but I'm just spelling it out. Iirc, > > there will practically only ever be one light refcount per file. > > > > For a light refcount to be used we know that the file descriptor table > > isn't shared with any other task. So there are no threads that could > > concurrently access the file descriptor table. We also know that the > > file descriptor table cannot become shared while we're in system call > > context because the caller can't create new threads and they can't > > unshare the file descriptor table. > > > > So there's only one fdget() caller (Yes, they could call fdget() > > multiple times and then have to do fdput() multiple times but that's a > > level of weirdness that we don't need to worry about.). > > Hmm. Is it not the case that different processes with different file > descriptor tables could reference the same underlying `struct file` and > both use light refcounts to do so, as long as each fd table is not > shared? So there could be multiple light refcounts to the same `struct > file` at the same time on different threads. Relevant rules: * Each file pointer in any descriptor table contributes to refcount of file. * All assignments to task->files are done by the task itself or, during task creation, by its parent The latter happens before the task runs for the first time. The former is done with task_lock(current) held. * current->files is always stable. The object it points to is guaranteed to stay alive at least until you explicitly change current->files. * task->files is stable while you are holding task_lock(task). The object it points to is guaranteed to stay alive until you release task_lock(task). * task->files MAY be fetched (racily) without either of the above, but it should not be dereferenced - the memory may be freed and reused right after you've fetched the pointer. * descriptor tables are refcounted by table->count. * descriptor table is created with ->count equal to 1 and destroyed when its ->count reaches 0. * each task with task->files == table contributes to table->count. * before the task dies, its ->files becomes NULL (see exit_files()). * when task is born (see copy_process() and copy_files())) the parent is responsible for setting the value of task->files and making sure that refcounts are correct; that's the only case where one is allowed to acquire an extra reference to existing table (handling of clone(2) with COPY_FILES). * the only descriptor table one may modify is that pointed to by current->files. Any access to other threads' descriptor tables is read-only. * struct fd is fundamentally thread-local. It should never be passed around, put into shared data structures, etc. * if you have done fdget(N), the matching fdput() MUST be done before the caller modifies the Nth slot of its descriptor table, spawns children that would share the descriptor table. * fdget() MAY borrow a reference from caller's descriptor table. That can be done if current->files->count is equal to 1. In that case we can be certain that the file reference we fetched from our descriptor table will remain unchanged (and thus contributing to refcount of file) until fdput(). Indeed, + at the time of fdget() no other thread has task->files pointing to our table (otherwise ->count would be greater than 1). + our thread will remain the sole owner of descriptor table at least until fdput(). Indeed, the first additional thread with task->files pointing to our table would have to have been spawned by us and we are forbidden to do that (rules for fdget() use) + no other thread could modify our descriptor table (they would have to share it first). + we are allowed to modify our table, but we are forbidden to touch the slot we'd copied from (rules for fdget() use). In other words, if current->files->count is equal to 1 at fdget() time we can skip incrementing refcount. Matching fdput() would need to skip decrement, of course. Note that we must record that (borrowed vs. cloned) in struct fd - the condition cannot be rechecked at fdput() time, since the table that had been shared at fdget() time might no longer be shared by the time of fdput(). > And this does *not* apply to `fdget_pos`, which checks the refcount of > the `struct file` instead of the refcount of the fd table. False. fdget_pos() is identical to fdget() as far as file refcount handling goes. The part that is different is that grabbing ->f_pos_lock is sensitive to file refcount in some cases. This is orthogonal to "does this struct fd contribute to file refcount". Again, "light" references are tied to thread; they can only be created if we are guaranteed that descriptor table's slot they came from will remain unchanged for as long as the reference is used. And yes, there may be several light references to the same file - both in different processes that do not share descriptor table *and* in the same thread, if e.g. sendfile(in_fd, out_fd, ...) is called with in_fd == out_fd.