On Sat, May 25, 2024 at 01:33:05AM +0100, Al Viro wrote: > FWIW, fdget()...fdput() form a scope. The file reference _in_ that > struct fd is just a normal file reference, period. > > You can pass it to a function as an argument, etc. You certainly can > clone it (with get_file()). > > The rules are basically "you can't spawn threads with CLONE_FILES inside > the scope and you can't remove reference in your descriptor table while > in scope". The value in fd.file is guaranteed to stay with positive > refcount through the entire scope, just as if you had > > { > struct file *f = fget(n); > > if (!f) > return -EBADF; > > ... > > fput(f); > } > > The rules for access are exactly the same - you can pass f to a function > called from the scope, you can use it while in scope, you can clone it > and store it somewhere, etc. If anything, fd = fdget(N) is "clone or borrow the file reference from the Nth slot of your descriptor table into fd.file and record whether it had been cloned or borrowed into fd.flags". The rules for what can't be done in the scope of fd are there to guarantee that the reference _can_ be borrowed in the common case when descriptor table is not shared. The tricks with calling conventions of __fdget() (it returns both the reference and flags in single unsigned long, with flags in the lowest bits) are implementation details; those should stay hidden from anyone who uses struct fd. Incidentally, there's no "light refcount" - "light references" are simply the ones that had been borrowed from the descriptor table rather than having them cloned. One more thing: we never get fd.file == NULL && fd.flags != 0 - that combination is never generated (NULL couldn't have been cloned). As the result, if fd.file is NULL, fdput(fd) is a no-op. Most of the places where we use struct fd are making use of that - fd = fdget(n); if (fd.file) { do something fdput(fd); } is equivalent to fd = fdget(n); if (fd.file) do something fdput(fd); and the former is more common way to spell it. In particular, fd = fdget(n); if (!fd.file) return -EBADF; error = do_something(fd.file); fdput(fd); is often convenient and very common. We could go for CLASS(fd, fd)(n); if (!fd.file) return -EBADF; return do_something(fd.file); and let the compiler add fdput(fd) whenever it goes out of scope, but that gets clumsy - we'd end up with a plenty of declarations in the middle of blocks that way, and for C that looks wrong. There are very few places where struct fd does not come from fdget()/fdget_raw()/fdget_pos(). One variety is "initialize it to NULL,0, then possibly replace with fdget() result; unconditional fdput() will be safe either way" (a couple of places). Another is overlayfs ovl_real_fdget() and ovl_real_fdget_meta(). Those two are arguably too clever for their readers' good. It's still "borrow or clone, and remember which one had it been", but that's mixed with returning errors: err = ovl_read_fdget(file, &fd); may construct and store a junk value in fd if err is non-zero. Not a bug, but only because all callers remember to ignore that value in such case. Another inconvenient bit is this: static int vmsplice_type(struct fd f, int *type) { if (!f.file) return -EBADF; if (f.file->f_mode & FMODE_WRITE) { *type = ITER_SOURCE; } else if (f.file->f_mode & FMODE_READ) { *type = ITER_DEST; } else { fdput(f); return -EBADF; } return 0; } It's a move if an error had been returned and borrow otherwise. There's a couple of other examples of the same sort. It might be better to lift fdput() into the callers (or, in this case, just fold the entire sucker into its sole caller). Finally, there's a non-obvious thing in net/socket.c - sockfd_lookup_light()..fput_light(). What happens is that if sock_from_file(f) is non-NULL, we are guaranteed that sock_from_file(f)->file == f (and that reference does not contribute to refcount of f). So we do the same clone-or-borrow thing, but we only keep the socket for the rest of operation; when it comes to undoing the clone-or-borrow, we do that manually and use sock->file to reconstruct the file pointer. It's still an equivalent of fdget()/fdput() pair, but it slightly reduces a register pressure in some very hot paths; hell knows if it's warranted these days. Avoiding an unconditional clone in there is a really important part, but not keeping the struct file reference in a local variable through the syscall... probably not so much. It is very localized - nothing of that sort is done outside of net/socket.c (we probably ought to move fput_light() over there - no other users). That's about it as far as struct fd is concerned...