I've done another round of review of users. 1) There are two leaks I'd missed earlier; missing fdput() on a failure exit in arch/powerpc/kvm/powerpc.c and in drivers/media/rc/lirc_dev.c. I'll throw fixes into #fixes and send a pull request; that part is obvious. 2) Almost all values come from fdget(), fdget_raw() or fdget_pos(). Exceptions: several places that initialize to empty and a couple of overlayfs functions that construct the sucker manually. The latter can bloody well generate bogus values on error. fdget_pos() pairs with fdput_pos(), everything else pairs with fdput(). 3) fdput(empty) should be a no-op; we have enough places relying upon that and it makes sense in general. Currently it is true, but compiler has no way to figure that out. 4) emptiness check is done by absolute majority of users as the first thing after initialization. The way it is implemented is "check if fd.file is NULL"; currently we almost always skip fdput() if that condition holds. There are exceptions where we have fdput() called (kernel_read_file_from_fd(), ksys_sync_file_range(), do_utimes_fd(), ksys_readahead(), f_dupfd_query(), snd_pcm_link(), xfs_find_handle(), finit_module(2)) regardless of emptiness. That may result in excessive check (we look at FDPUT_FPUT in fd.flags, which is never going to be set when fd.file is NULL). Said that, there's a worse (still minor) problem with code generation - most of the places checking for emptiness do not bother with unlikely(). IMO that's a convincing argument in favour of explicit fd_empty() or whatever we end up calling it. 5) the most common (by far) pattern is f = fdget(n); if (!f.file) bugger off stuff using f.file fdput(f); _usually_ the failure happens with -EBADF, but there's a weirdness galore there - EINVAL, ENOENT, ENXIO, EBADFD (almost certainly misspelled EBADF), etc. There are several groups of unusual cases: * as mentioned above, some do fdput() on all exits, including the "fd is empty" one. Otherwise identical to the common variant. * two CLASS(fd{,_raw}) users. Equivalent to the above. * a few places where we have struct fd initialized empty (NULL, 0), with fdget() done in some cases, followed by unconditional fdput(). * ipc/mqueue.c:do_mq_notify(); struct fd reused - fdget() after fdput(). Usual otherwise. * net/socket.c; sockfd_lookup_light(). What happens there is that struct fd instance is created in frame of sockfd_lookup_light(), dropped if file is not a socket, then socket and fd.flags gets passed to caller, with fput_light(sock->file, need_fput) done on subsequent exits in caller. Minimal patch would be to lift the struct fd into the caller, along with dropping it on non-sockets and replace fput_light() calls with fdput(). That would convert them to usual pattern; I don't believe that it would make things any slower than they are now. * fs/splice.c:vmsplice_type(). One caller, called after fdget() without a prior emptiness check, consumes struct fd on failure. IMO the call of fdput() ought to be lifted into the caller; perhaps the entire thing should be expanded there. * fs/timerfd.c: timerfd_fget(): fdget() + check that it has the right ->f_op then fdput() or move to caller's struct fd (passed by reference). Caller's instance is left uninitialized on error. Only two callers. My preference would be to expand both, TBH - then it would be the usual pattern. * kernel/events/core.c:perf_fget_light(). Similar to timerfd case. * BPF: messy. __bpf_map_get(f) is called right after fdget(), without an emptiness check. Verifies that fd is not empty, that file is ours (fdput() if not), then returns file->private_data. Note that proof of correctness requires showing that ->private_data is never ERR_PTR() for those - otherwise the callers would leak. IMO fdput() ought to be lifted into the callers, probably along with the emptiness check. All subsequent exits in the callers are calling fdput() anyway... ____bpf_prog_get() is similar, but has only one caller; I'd simply expand it. * overlayfs. ovl_real_fdget() has rather unpleasant calling conventions; it gets struct fd *real and returns an int; it builds struct fd in *real and return 0 or -E...; the value left in *real may be invalid unless 0 has been returned. And it's not just left uninitialized - bogus values are explicitly stored there. ovl_real_fdget_meta() is similar. Callers use it to initialize their struct fd instance and return immediately if an error has been returned. All subsequent exits call fdput(). That's it. 6) The things clean up considerably if we use CLASS(fd{,_raw}) on those suckers. I've done mockup patches for that, and the end result is a lot nicer than what we have right now. The main issue with doing such conversions is that failure exit on emptiness grows an out-of-line check for FDPUT_FPUT and (never taken) call of fput(). OTOH, the win from having that failure exit moved out of line offsets that and it _is_ a slow path. 7) Avoiding that stuff on EBADF exits is interesting. It is possible to do without changing struct fd representation, but it's somewhat iffy. IME the following bit of ugliness static always_inline void fdput(struct fd f) { if (f.flags & FDPUT_FPUT && f.file) fput(f.file); } is enough for good code generation - I don's see any needless checks for f.file in the resulting assembler, at least not with gcc 12 builds on x86. gcc does manage to keep track of having seen a local struct member NULL or non-NULL. Alternative would be to turn struct fd into a struct-wrapped unsigned long and either use a flag for emptiness checks (we can steal more from the pointer) or just compare with zero for empitness check. The former might allow to represent specific errors, which would give a neat solution for ovl_real_fd() calling conventions - instead of "return an error, fill user-supplied struct fd" it could just return an error-representing struct fd and be done with that. Unfortunately, gcc optimizer really stinks when it comes to bitops - it can keep track of 'this field of struct has been found non-NULL' easily, but 'this bit had been found set' is lost very easily. So clever encodings are going to be brittle as far as code generation is concerned. One lovely example (already posted in one of the old threads) is this: if (v & 1) if (!(v & 1) && !(v & 2)) foo(); is not recognized as a no-op. Reason: gcc figures out that condition in second if can be rewritten as !(v & 3) and after it does that rewrite it can't figure out that if bit 0 is set, we can't have both bit 0 and bit 1 clear. clang manages that, gcc 12 does not. It is possible to work around - if (v & 1) if (!(v & 1)) if (!(v & 2)) foo(); is recognized to be a no-op. But that kind of crap is too brittle - minor changes in optimizer can screw it over. That example is very close to what we'd need if fd_empty() turned into checking a flag. So unless somebody has a clever scheme that would avoid that kind of fun, this is probably no-go. If we give up on representing errors, we can use 0 for empty. That means the following sequence of patches: * introduce fd_empty() and fd_file(). * convert f.file accesses into fd_empty(f) (in boolean contexts) and fd_file(f) (everything else). That would have to be done as a series of commits - doing a single-commit variant would guarantee fuckloads of conflicts through the cycle, all with the same commit. And it's not entirely mechanical either. * switch representation. * follow with CLASS(fd, f) series of cleanups. Doable, but it would be a long series. Alternative would be to do the minimal fdput() modification (as above) plus definitions of fd_empty() and fd_file(), followed by combinations of "switch to fd_file/fd_empty, use CLASS(fd) where it makes sense" patches from the first variant and finally switch the representation. The series is obviously shorter that way... Comments?