[PATCHES][RFC] rework of struct fd handling

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



	Experimental series trying to sanitize the handling
of struct fd.  Lightly tested, in serious need of review.

It's 6.10-rc1-based, lives in 
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git #work.fd
Individual patches in followups, descriptions below.

Shortlog:
Al Viro (19):
      powerpc: fix a file leak in kvm_vcpu_ioctl_enable_cap()
      lirc: rc_dev_get_from_fd(): fix file leak
      introduce fd_file(), convert all accessors to it.
      struct fd: representation change
      add struct fd constructors, get rid of __to_fd()
      net/socket.c: switch to CLASS(fd)
      introduce struct fderr, convert overlayfs uses to that
      fdget_raw() users: switch to CLASS(fd_raw, ...)
      css_set_fork(): switch to CLASS(fd_raw, ...)
      introduce "fd_pos" class
      switch simple users of fdget() to CLASS(fd, ...)
      bpf: switch to CLASS(fd, ...)
      convert vmsplice() to CLASS(fd, ...)
      finit_module(): convert to CLASS(fd, ...)
      timerfd: switch to CLASS(fd, ...)
      do_mq_notify(): switch to CLASS(fd, ...)
      simplify xfs_find_handle() a bit
      convert kernel/events/core.c
      deal with the last remaing boolean uses of fd_file()

Diffstat:
 arch/alpha/kernel/osf_sys.c                |   7 +-
 arch/arm/kernel/sys_oabi-compat.c          |  18 +-
 arch/powerpc/kvm/book3s_64_vio.c           |   9 +-
 arch/powerpc/kvm/powerpc.c                 |  26 +--
 arch/powerpc/platforms/cell/spu_syscalls.c |  17 +-
 arch/x86/kernel/cpu/sgx/main.c             |  10 +-
 arch/x86/kvm/svm/sev.c                     |  43 ++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_sched.c  |  27 +--
 drivers/gpu/drm/drm_syncobj.c              |  11 +-
 drivers/infiniband/core/ucma.c             |  21 +-
 drivers/infiniband/core/uverbs_cmd.c       |  12 +-
 drivers/media/mc/mc-request.c              |  22 +-
 drivers/media/rc/lirc_dev.c                |  13 +-
 drivers/vfio/group.c                       |  10 +-
 drivers/vfio/virqfd.c                      |  20 +-
 drivers/virt/acrn/irqfd.c                  |  14 +-
 drivers/xen/privcmd.c                      |  35 ++--
 fs/btrfs/ioctl.c                           |   7 +-
 fs/coda/inode.c                            |  13 +-
 fs/eventfd.c                               |   9 +-
 fs/eventpoll.c                             |  62 ++----
 fs/ext4/ioctl.c                            |  23 +-
 fs/f2fs/file.c                             |  17 +-
 fs/fcntl.c                                 |  74 +++----
 fs/fhandle.c                               |   7 +-
 fs/file.c                                  |  26 +--
 fs/fsopen.c                                |  23 +-
 fs/fuse/dev.c                              |  10 +-
 fs/ioctl.c                                 |  47 ++---
 fs/kernel_read_file.c                      |  12 +-
 fs/locks.c                                 |  27 +--
 fs/namei.c                                 |  19 +-
 fs/namespace.c                             |  53 ++---
 fs/notify/fanotify/fanotify_user.c         |  50 ++---
 fs/notify/inotify/inotify_user.c           |  44 ++--
 fs/ocfs2/cluster/heartbeat.c               |  17 +-
 fs/open.c                                  |  67 +++---
 fs/overlayfs/file.c                        | 187 +++++++----------
 fs/quota/quota.c                           |  18 +-
 fs/read_write.c                            | 227 +++++++++-----------
 fs/readdir.c                               |  38 ++--
 fs/remap_range.c                           |  11 +-
 fs/select.c                                |  17 +-
 fs/signalfd.c                              |  11 +-
 fs/smb/client/ioctl.c                      |  17 +-
 fs/splice.c                                |  82 +++-----
 fs/stat.c                                  |  10 +-
 fs/statfs.c                                |  12 +-
 fs/sync.c                                  |  33 ++-
 fs/timerfd.c                               |  42 ++--
 fs/utimes.c                                |  11 +-
 fs/xattr.c                                 |  64 +++---
 fs/xfs/xfs_exchrange.c                     |  12 +-
 fs/xfs/xfs_handle.c                        |  16 +-
 fs/xfs/xfs_ioctl.c                         |  85 +++-----
 include/linux/bpf.h                        |   9 +-
 include/linux/cleanup.h                    |   2 +-
 include/linux/file.h                       |  89 +++++---
 io_uring/sqpoll.c                          |  31 +--
 ipc/mqueue.c                               | 126 +++++------
 kernel/bpf/bpf_inode_storage.c             |  29 +--
 kernel/bpf/btf.c                           |  13 +-
 kernel/bpf/map_in_map.c                    |  37 +---
 kernel/bpf/syscall.c                       | 197 ++++++-----------
 kernel/bpf/token.c                         |  19 +-
 kernel/bpf/verifier.c                      |  20 +-
 kernel/cgroup/cgroup.c                     |  21 +-
 kernel/events/core.c                       |  67 +++---
 kernel/module/main.c                       |  15 +-
 kernel/nsproxy.c                           |  15 +-
 kernel/pid.c                               |  26 +--
 kernel/signal.c                            |  33 ++-
 kernel/sys.c                               |  21 +-
 kernel/taskstats.c                         |  20 +-
 kernel/watch_queue.c                       |   8 +-
 mm/fadvise.c                               |  10 +-
 mm/filemap.c                               |  19 +-
 mm/memcontrol.c                            |  37 ++--
 mm/readahead.c                             |  25 +--
 net/core/net_namespace.c                   |  14 +-
 net/core/sock_map.c                        |  23 +-
 net/socket.c                               | 325 +++++++++++++----------------
 security/integrity/ima/ima_main.c          |   9 +-
 security/landlock/syscalls.c               |  57 ++---
 security/loadpin/loadpin.c                 |  10 +-
 sound/core/pcm_native.c                    |   6 +-
 virt/kvm/eventfd.c                         |  19 +-
 virt/kvm/vfio.c                            |  18 +-
 88 files changed, 1234 insertions(+), 1951 deletions(-)

01/19) powerpc: fix a file leak in kvm_vcpu_ioctl_enable_cap()
02/19) lirc: rc_dev_get_from_fd(): fix file leak

	First two patches are obvious leak fixes - missing fdput()
on one a failure exits.

03/19) introduce fd_file(), convert all accessors to it.

	For any changes of struct fd representation we need to
turn existing accesses to fields into calls of wrappers.
Accesses to struct fd::flags are very few (3 in linux/file.h,
1 in net/socket.c, 3 in fs/overlayfs/file.c and 3 more in
explicit initializers).
	Those can be dealt with in the commit converting to
new layout; accesses to struct fd::file are too many for that.
	This commit converts (almost) all of f.file to
fd_file(f).  It's not entirely mechanical ('file' is used as
a member name more than just in struct fd) and it does not
even attempt to distinguish the uses in pointer context from
those in boolean context; the latter will be eventually turned
into a separate helper (fd_empty()).

NB: this commit is where I'd expect arseloads of conflicts
through the cycle, simply because of the breadth of area being
touched.  The biggest one, as well (500 lines modified).
Might be worth splitting - not sure.

04/19) struct fd: representation change

	The absolute majority of instances comes from fdget() and its
relatives; the underlying primitives actually return a struct file
reference and a couple of flags encoded into an unsigned long - the lower
two bits of file address are always zero, so we can stash the flags
into those.  On the way out we use __to_fd() to unpack that unsigned
long into struct fd.
	Let's use that representation for struct fd itself - make it
a structure with a single unsigned long member (.word), with the value
equal either to (unsigned long)p | flags, p being an address of some
struct file instance, or to 0 for an empty fd.
	Note that we never used a struct fd instance with NULL ->file
and non-zero ->flags; the emptiness had been checked as (!f.file) and
we expected e.g. fdput(empty) to be a no-op.  With new representation
we can use (!f.word) for emptiness check; that is enough for compiler
to figure out that (f.word & FDPUT_FPUT) will be false and that fdput(f)
will be a no-op in such case.
	For now the new predicate (fd_empty(f)) has no users; all the
existing checks have form (!fd_file(f)).  We will convert to fd_empty()
use later; here we only define it (and tell the compiler that it's
unlikely to return true).
	This commit only deals with representation change; there will
be followups.
	NOTE: overlayfs part is _not_ in the final form - it will be
massaged shortly.

05/19) add struct fd constructors, get rid of __to_fd()

	Make __fdget() et.al. return struct fd directly.
New helpers: BORROWED_FD(file) and CLONED_FD(file), for
borrowed and cloned file references resp.
	NOTE: this might need tuning; in particular, inline on
__fget_light() is there to keep the code generation same as
before - we probably want to keep it inlined in fdget() et.al.
(especially so in fdget_pos()), but that needs profiling.


Next two commits deal with the worst irregularities in struct fd use:
in net/socket.c we have fdget() without matching fdput() - fdget() is
done in sockfd_lookup_light(), then the results are passed (in modified
form) to caller, which deals with conditional fput().  And in
overlayfs we have an almost-but-not-quite struct fd shoehorned into
struct fd, with ugly calling conventions as the result of that.
I'm not sure what order would be better for these two commits.

06/19) net/socket.c: switch to CLASS(fd)

	I strongly suspect that important part in sockfd_lookup_light() is
avoiding needless file refcount operations, not the marginal reduction of
the register pressure from not keeping a struct file pointer in the caller.
	If that's true, we should get the same benefits from straight
fdget()/fdput().  And AFAICS with sane use of CLASS(fd) we can get a better
code generation...
	Would be nice if somebody tested it on networking test suites
(including benchmarks)...

	sockfd_lookup_light() does fdget(), uses sock_from_file() to
get the associated socket and returns the struct socket reference to
the caller, along with "do we need to fput()" flag.  No matching fdput(),
the caller does its equivalent manually, using the fact that sock->file
points to the struct file the socket has come from.
	Get rid of that - have the callers do fdget()/fdput() and
use sock_from_file() directly.  That kills sockfd_lookup_light()
and fput_light() (no users left).
	What's more, we can get rid of explicit fdget()/fdput() by
switching to CLASS(fd, ...) - code generation does not suffer, since
now fdput() inserted on "descriptor is not opened" failure exit
is recognized to be a no-op by compiler.
	We could split that commit in two (getting rid of sockd_lookup_light()
and switch to CLASS(fd, ...)), but AFAICS it ends up being harder to read
that way.

07/19) introduce struct fderr, convert overlayfs uses to that

Similar to struct fd; unlike struct fd, it can represent
error values.
Accessors:
* fd_empty(f):	true if f represents an error
* fd_file(f):	just as for struct fd it yields a pointer to
		struct file if fd_empty(f) is false.  If
		fd_empty(f) is true, fd_file(f) is guaranteed
		_not_ to be an address of any object (IS_ERR()
		will be true in that case)
* fd_error(f):	if f represents an error, returns that error,
		otherwise the return value is junk.
Constructors:
* ERR_FD(-E...):	an instance encoding given error [ERR_FDERR, perhaps?]
* BORROWED_FDERR(file):	if file points to a struct file instance,
			return a struct fderr representing that file
			reference with no flags set.
			if file is an ERR_PTR(-E...), return a struct
			fderr representing that error.
			file MUST NOT be NULL.
* CLONED_FDERR(file):	similar, but in case when file points to
			a struct file instance, set FDPUT_FPUT in flags.
fdput_err() serves as a destructor.
See fs/overlayfs/file.c for example of use.

08/19) fdget_raw() users: switch to CLASS(fd_raw, ...)
	all convert trivially
09/19) css_set_fork(): switch to CLASS(fd_raw, ...)
	reference acquired there by fget_raw() is not stashed anywhere -
we could as well borrow instead.

10/19) introduce "fd_pos" class
	fdget_pos() for constructor, fdput_pos() for cleanup, users of
fd..._pos() converted.

11/19) switch simple users of fdget() to CLASS(fd, ...)
	low-hanging fruit; that's another likely source of conflicts
over the cycle and it might make a lot of sense to split; fortunately,
it can be split pretty much on per-function basis - chunks are independent
from each other.

12/19) bpf: switch to CLASS(fd, ...)
	Calling conventions for __bpf_map_get() would be more convenient
if it left fpdut() on failure to callers.  Makes for simpler logics
in the callers.
	Among other things, the proof of memory safety no longer has to
rely upon file->private_data never being ERR_PTR(...) for bpffs files.
Original calling conventions made it impossible for the caller to tell
whether __bpf_map_get() has returned ERR_PTR(-EINVAL) because it has found
the file not be a bpf map one (in which case it would've done fdput())
or because it found that ERR_PTR(-EINVAL) in file->private_data of a
bpf map file (in which case fdput() would _not_ have been done).
	With that calling conventions change it's easy to switch all
bpffs users to CLASS(fd, ...).

13/19) convert vmsplice() to CLASS(fd, ...)
	Irregularity here is fdput() not in the same scope as fdget();
we could just lift it out vmsplice_type() in vmsplice(2), but there's
no much point keeping vmsplice_type() separate after that...

14/19) finit_module(): convert to CLASS(fd, ...)
	Slightly unidiomatic emptiness check; just lift it out of
idempotent_init_module() and into finit_module(2) and that's it.

15/19) timerfd: switch to CLASS(fd, ...)
	Fold timerfd_fget() into both callers to have fdget() and fdput()
in the same scope.  Could be done in different ways, but this is probably
the smallest solution.

16/19) do_mq_notify(): switch to CLASS(fd, ...)
	a minor twist is the reuse of struct fd instance in there

17/19) simplify xfs_find_handle() a bit
	XFS_IOC_FD_TO_HANDLE can grab a reference to copied ->f_path and
let the file go; results in simpler control flow - cleanup is
the same for both "by descriptor" and "by pathname" cases.
	NOTE: grabbing f->f_path to pin file_inode(f) is valid, since
we are dealing with XFS files here - no reassignments of file_inode().

18/19) convert kernel/events/core.c
	a questionable trick in perf_event_open(2) - deliberate call of
fdget(-1), expecting it to yield empty

19/19) deal with the last remaing boolean uses of fd_file()
	... replacing them with uses of fd_empty()




[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux