Re: [PATCHES][RFC] rework of struct fd handling

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Jun 07, 2024 at 02:56:56AM +0100, Al Viro wrote:
> 	Experimental series trying to sanitize the handling
> of struct fd.  Lightly tested, in serious need of review.
> 
> It's 6.10-rc1-based, lives in 
> git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git #work.fd
> Individual patches in followups, descriptions below.

Looks overall like a good approach to me and I really like that we're
not renaming struct fd.

(IMHO, you should just send the CLASS(fd_pos) cleanup helper addition
upstream for this cycle because it's the only thing we're missing. And
that'll likely make conversions in individual patches/per subsystem
easier as well.)

> 
> Shortlog:
> Al Viro (19):
>       powerpc: fix a file leak in kvm_vcpu_ioctl_enable_cap()
>       lirc: rc_dev_get_from_fd(): fix file leak
>       introduce fd_file(), convert all accessors to it.
>       struct fd: representation change
>       add struct fd constructors, get rid of __to_fd()
>       net/socket.c: switch to CLASS(fd)
>       introduce struct fderr, convert overlayfs uses to that
>       fdget_raw() users: switch to CLASS(fd_raw, ...)
>       css_set_fork(): switch to CLASS(fd_raw, ...)
>       introduce "fd_pos" class
>       switch simple users of fdget() to CLASS(fd, ...)
>       bpf: switch to CLASS(fd, ...)
>       convert vmsplice() to CLASS(fd, ...)
>       finit_module(): convert to CLASS(fd, ...)
>       timerfd: switch to CLASS(fd, ...)
>       do_mq_notify(): switch to CLASS(fd, ...)
>       simplify xfs_find_handle() a bit
>       convert kernel/events/core.c
>       deal with the last remaing boolean uses of fd_file()
> 
> Diffstat:
>  arch/alpha/kernel/osf_sys.c                |   7 +-
>  arch/arm/kernel/sys_oabi-compat.c          |  18 +-
>  arch/powerpc/kvm/book3s_64_vio.c           |   9 +-
>  arch/powerpc/kvm/powerpc.c                 |  26 +--
>  arch/powerpc/platforms/cell/spu_syscalls.c |  17 +-
>  arch/x86/kernel/cpu/sgx/main.c             |  10 +-
>  arch/x86/kvm/svm/sev.c                     |  43 ++--
>  drivers/gpu/drm/amd/amdgpu/amdgpu_sched.c  |  27 +--
>  drivers/gpu/drm/drm_syncobj.c              |  11 +-
>  drivers/infiniband/core/ucma.c             |  21 +-
>  drivers/infiniband/core/uverbs_cmd.c       |  12 +-
>  drivers/media/mc/mc-request.c              |  22 +-
>  drivers/media/rc/lirc_dev.c                |  13 +-
>  drivers/vfio/group.c                       |  10 +-
>  drivers/vfio/virqfd.c                      |  20 +-
>  drivers/virt/acrn/irqfd.c                  |  14 +-
>  drivers/xen/privcmd.c                      |  35 ++--
>  fs/btrfs/ioctl.c                           |   7 +-
>  fs/coda/inode.c                            |  13 +-
>  fs/eventfd.c                               |   9 +-
>  fs/eventpoll.c                             |  62 ++----
>  fs/ext4/ioctl.c                            |  23 +-
>  fs/f2fs/file.c                             |  17 +-
>  fs/fcntl.c                                 |  74 +++----
>  fs/fhandle.c                               |   7 +-
>  fs/file.c                                  |  26 +--
>  fs/fsopen.c                                |  23 +-
>  fs/fuse/dev.c                              |  10 +-
>  fs/ioctl.c                                 |  47 ++---
>  fs/kernel_read_file.c                      |  12 +-
>  fs/locks.c                                 |  27 +--
>  fs/namei.c                                 |  19 +-
>  fs/namespace.c                             |  53 ++---
>  fs/notify/fanotify/fanotify_user.c         |  50 ++---
>  fs/notify/inotify/inotify_user.c           |  44 ++--
>  fs/ocfs2/cluster/heartbeat.c               |  17 +-
>  fs/open.c                                  |  67 +++---
>  fs/overlayfs/file.c                        | 187 +++++++----------
>  fs/quota/quota.c                           |  18 +-
>  fs/read_write.c                            | 227 +++++++++-----------
>  fs/readdir.c                               |  38 ++--
>  fs/remap_range.c                           |  11 +-
>  fs/select.c                                |  17 +-
>  fs/signalfd.c                              |  11 +-
>  fs/smb/client/ioctl.c                      |  17 +-
>  fs/splice.c                                |  82 +++-----
>  fs/stat.c                                  |  10 +-
>  fs/statfs.c                                |  12 +-
>  fs/sync.c                                  |  33 ++-
>  fs/timerfd.c                               |  42 ++--
>  fs/utimes.c                                |  11 +-
>  fs/xattr.c                                 |  64 +++---
>  fs/xfs/xfs_exchrange.c                     |  12 +-
>  fs/xfs/xfs_handle.c                        |  16 +-
>  fs/xfs/xfs_ioctl.c                         |  85 +++-----
>  include/linux/bpf.h                        |   9 +-
>  include/linux/cleanup.h                    |   2 +-
>  include/linux/file.h                       |  89 +++++---
>  io_uring/sqpoll.c                          |  31 +--
>  ipc/mqueue.c                               | 126 +++++------
>  kernel/bpf/bpf_inode_storage.c             |  29 +--
>  kernel/bpf/btf.c                           |  13 +-
>  kernel/bpf/map_in_map.c                    |  37 +---
>  kernel/bpf/syscall.c                       | 197 ++++++-----------
>  kernel/bpf/token.c                         |  19 +-
>  kernel/bpf/verifier.c                      |  20 +-
>  kernel/cgroup/cgroup.c                     |  21 +-
>  kernel/events/core.c                       |  67 +++---
>  kernel/module/main.c                       |  15 +-
>  kernel/nsproxy.c                           |  15 +-
>  kernel/pid.c                               |  26 +--
>  kernel/signal.c                            |  33 ++-
>  kernel/sys.c                               |  21 +-
>  kernel/taskstats.c                         |  20 +-
>  kernel/watch_queue.c                       |   8 +-
>  mm/fadvise.c                               |  10 +-
>  mm/filemap.c                               |  19 +-
>  mm/memcontrol.c                            |  37 ++--
>  mm/readahead.c                             |  25 +--
>  net/core/net_namespace.c                   |  14 +-
>  net/core/sock_map.c                        |  23 +-
>  net/socket.c                               | 325 +++++++++++++----------------
>  security/integrity/ima/ima_main.c          |   9 +-
>  security/landlock/syscalls.c               |  57 ++---
>  security/loadpin/loadpin.c                 |  10 +-
>  sound/core/pcm_native.c                    |   6 +-
>  virt/kvm/eventfd.c                         |  19 +-
>  virt/kvm/vfio.c                            |  18 +-
>  88 files changed, 1234 insertions(+), 1951 deletions(-)
> 
> 01/19) powerpc: fix a file leak in kvm_vcpu_ioctl_enable_cap()
> 02/19) lirc: rc_dev_get_from_fd(): fix file leak
> 
> 	First two patches are obvious leak fixes - missing fdput()
> on one a failure exits.
> 
> 03/19) introduce fd_file(), convert all accessors to it.
> 
> 	For any changes of struct fd representation we need to
> turn existing accesses to fields into calls of wrappers.
> Accesses to struct fd::flags are very few (3 in linux/file.h,
> 1 in net/socket.c, 3 in fs/overlayfs/file.c and 3 more in
> explicit initializers).
> 	Those can be dealt with in the commit converting to
> new layout; accesses to struct fd::file are too many for that.
> 	This commit converts (almost) all of f.file to
> fd_file(f).  It's not entirely mechanical ('file' is used as
> a member name more than just in struct fd) and it does not
> even attempt to distinguish the uses in pointer context from
> those in boolean context; the latter will be eventually turned
> into a separate helper (fd_empty()).
> 
> NB: this commit is where I'd expect arseloads of conflicts
> through the cycle, simply because of the breadth of area being
> touched.  The biggest one, as well (500 lines modified).
> Might be worth splitting - not sure.
> 
> 04/19) struct fd: representation change
> 
> 	The absolute majority of instances comes from fdget() and its
> relatives; the underlying primitives actually return a struct file
> reference and a couple of flags encoded into an unsigned long - the lower
> two bits of file address are always zero, so we can stash the flags
> into those.  On the way out we use __to_fd() to unpack that unsigned
> long into struct fd.
> 	Let's use that representation for struct fd itself - make it
> a structure with a single unsigned long member (.word), with the value
> equal either to (unsigned long)p | flags, p being an address of some
> struct file instance, or to 0 for an empty fd.
> 	Note that we never used a struct fd instance with NULL ->file
> and non-zero ->flags; the emptiness had been checked as (!f.file) and
> we expected e.g. fdput(empty) to be a no-op.  With new representation
> we can use (!f.word) for emptiness check; that is enough for compiler
> to figure out that (f.word & FDPUT_FPUT) will be false and that fdput(f)
> will be a no-op in such case.
> 	For now the new predicate (fd_empty(f)) has no users; all the
> existing checks have form (!fd_file(f)).  We will convert to fd_empty()
> use later; here we only define it (and tell the compiler that it's
> unlikely to return true).
> 	This commit only deals with representation change; there will
> be followups.
> 	NOTE: overlayfs part is _not_ in the final form - it will be
> massaged shortly.
> 
> 05/19) add struct fd constructors, get rid of __to_fd()
> 
> 	Make __fdget() et.al. return struct fd directly.
> New helpers: BORROWED_FD(file) and CLONED_FD(file), for
> borrowed and cloned file references resp.
> 	NOTE: this might need tuning; in particular, inline on
> __fget_light() is there to keep the code generation same as
> before - we probably want to keep it inlined in fdget() et.al.
> (especially so in fdget_pos()), but that needs profiling.
> 
> 
> Next two commits deal with the worst irregularities in struct fd use:
> in net/socket.c we have fdget() without matching fdput() - fdget() is
> done in sockfd_lookup_light(), then the results are passed (in modified
> form) to caller, which deals with conditional fput().  And in
> overlayfs we have an almost-but-not-quite struct fd shoehorned into
> struct fd, with ugly calling conventions as the result of that.
> I'm not sure what order would be better for these two commits.
> 
> 06/19) net/socket.c: switch to CLASS(fd)
> 
> 	I strongly suspect that important part in sockfd_lookup_light() is
> avoiding needless file refcount operations, not the marginal reduction of
> the register pressure from not keeping a struct file pointer in the caller.
> 	If that's true, we should get the same benefits from straight
> fdget()/fdput().  And AFAICS with sane use of CLASS(fd) we can get a better
> code generation...
> 	Would be nice if somebody tested it on networking test suites
> (including benchmarks)...
> 
> 	sockfd_lookup_light() does fdget(), uses sock_from_file() to
> get the associated socket and returns the struct socket reference to
> the caller, along with "do we need to fput()" flag.  No matching fdput(),
> the caller does its equivalent manually, using the fact that sock->file
> points to the struct file the socket has come from.
> 	Get rid of that - have the callers do fdget()/fdput() and
> use sock_from_file() directly.  That kills sockfd_lookup_light()
> and fput_light() (no users left).
> 	What's more, we can get rid of explicit fdget()/fdput() by
> switching to CLASS(fd, ...) - code generation does not suffer, since
> now fdput() inserted on "descriptor is not opened" failure exit
> is recognized to be a no-op by compiler.
> 	We could split that commit in two (getting rid of sockd_lookup_light()
> and switch to CLASS(fd, ...)), but AFAICS it ends up being harder to read
> that way.
> 
> 07/19) introduce struct fderr, convert overlayfs uses to that
> 
> Similar to struct fd; unlike struct fd, it can represent
> error values.
> Accessors:
> * fd_empty(f):	true if f represents an error
> * fd_file(f):	just as for struct fd it yields a pointer to
> 		struct file if fd_empty(f) is false.  If
> 		fd_empty(f) is true, fd_file(f) is guaranteed
> 		_not_ to be an address of any object (IS_ERR()
> 		will be true in that case)
> * fd_error(f):	if f represents an error, returns that error,
> 		otherwise the return value is junk.
> Constructors:
> * ERR_FD(-E...):	an instance encoding given error [ERR_FDERR, perhaps?]
> * BORROWED_FDERR(file):	if file points to a struct file instance,
> 			return a struct fderr representing that file
> 			reference with no flags set.
> 			if file is an ERR_PTR(-E...), return a struct
> 			fderr representing that error.
> 			file MUST NOT be NULL.
> * CLONED_FDERR(file):	similar, but in case when file points to
> 			a struct file instance, set FDPUT_FPUT in flags.
> fdput_err() serves as a destructor.
> See fs/overlayfs/file.c for example of use.
> 
> 08/19) fdget_raw() users: switch to CLASS(fd_raw, ...)
> 	all convert trivially
> 09/19) css_set_fork(): switch to CLASS(fd_raw, ...)
> 	reference acquired there by fget_raw() is not stashed anywhere -
> we could as well borrow instead.
> 
> 10/19) introduce "fd_pos" class
> 	fdget_pos() for constructor, fdput_pos() for cleanup, users of
> fd..._pos() converted.
> 
> 11/19) switch simple users of fdget() to CLASS(fd, ...)
> 	low-hanging fruit; that's another likely source of conflicts
> over the cycle and it might make a lot of sense to split; fortunately,
> it can be split pretty much on per-function basis - chunks are independent
> from each other.
> 
> 12/19) bpf: switch to CLASS(fd, ...)
> 	Calling conventions for __bpf_map_get() would be more convenient
> if it left fpdut() on failure to callers.  Makes for simpler logics
> in the callers.
> 	Among other things, the proof of memory safety no longer has to
> rely upon file->private_data never being ERR_PTR(...) for bpffs files.
> Original calling conventions made it impossible for the caller to tell
> whether __bpf_map_get() has returned ERR_PTR(-EINVAL) because it has found
> the file not be a bpf map one (in which case it would've done fdput())
> or because it found that ERR_PTR(-EINVAL) in file->private_data of a
> bpf map file (in which case fdput() would _not_ have been done).
> 	With that calling conventions change it's easy to switch all
> bpffs users to CLASS(fd, ...).
> 
> 13/19) convert vmsplice() to CLASS(fd, ...)
> 	Irregularity here is fdput() not in the same scope as fdget();
> we could just lift it out vmsplice_type() in vmsplice(2), but there's
> no much point keeping vmsplice_type() separate after that...
> 
> 14/19) finit_module(): convert to CLASS(fd, ...)
> 	Slightly unidiomatic emptiness check; just lift it out of
> idempotent_init_module() and into finit_module(2) and that's it.
> 
> 15/19) timerfd: switch to CLASS(fd, ...)
> 	Fold timerfd_fget() into both callers to have fdget() and fdput()
> in the same scope.  Could be done in different ways, but this is probably
> the smallest solution.
> 
> 16/19) do_mq_notify(): switch to CLASS(fd, ...)
> 	a minor twist is the reuse of struct fd instance in there
> 
> 17/19) simplify xfs_find_handle() a bit
> 	XFS_IOC_FD_TO_HANDLE can grab a reference to copied ->f_path and
> let the file go; results in simpler control flow - cleanup is
> the same for both "by descriptor" and "by pathname" cases.
> 	NOTE: grabbing f->f_path to pin file_inode(f) is valid, since
> we are dealing with XFS files here - no reassignments of file_inode().
> 
> 18/19) convert kernel/events/core.c
> 	a questionable trick in perf_event_open(2) - deliberate call of
> fdget(-1), expecting it to yield empty
> 
> 19/19) deal with the last remaing boolean uses of fd_file()
> 	... replacing them with uses of fd_empty()




[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux