On Tue, 2024-08-13 at 14:30 +0200, Christian Brauner wrote: > We do embedd struct fown_struct into struct file letting it take up 32 > bytes in total. We could tweak struct fown_struct to be more compact but > really it shouldn't even be embedded in struct file in the first place. > > Instead, actual users of struct fown_struct should allocate the struct > on demand. This frees up 24 bytes in struct file. > > That will have some potentially user-visible changes for the ownership > fcntl()s. Some of them can now fail due to allocation failures. > Practically, that probably will almost never happen as the allocations > are small and they only happen once per file. > > The fown_struct is used during kill_fasync() which is used by e.g., > pipes to generate a SIGIO signal. Sending of such signals is conditional > on userspace having set an owner for the file using one of the F_OWNER > fcntl()s. Such users will be unaffected if struct fown_struct is > allocated during the fcntl() call. > > There are a few subsystems that call __f_setown() expecting > file->f_owner to be allocated: > > (1) tun devices > file->f_op->fasync::tun_chr_fasync() > -> __f_setown() > > There are no callers of tun_chr_fasync(). > > (2) tty devices > > file->f_op->fasync::tty_fasync() > -> __tty_fasync() > -> __f_setown() > > tty_fasync() has no additional callers but __tty_fasync() has. Note > that __tty_fasync() only calls __f_setown() if the @on argument is > true. It's called from: > > file->f_op->release::tty_release() > -> tty_release() > -> __tty_fasync() > -> __f_setown() > > tty_release() calls __tty_fasync() with @on false > => __f_setown() is never called from tty_release(). > => All callers of tty_release() are safe as well. > > file->f_op->release::tty_open() > -> tty_release() > -> __tty_fasync() > -> __f_setown() > > __tty_hangup() calls __tty_fasync() with @on false > => __f_setown() is never called from tty_release(). > => All callers of __tty_hangup() are safe as well. > > From the callchains it's obvious that (1) and (2) end up getting called > via file->f_op->fasync(). That can happen either through the F_SETFL > fcntl() with the FASYNC flag raised or via the FIOASYNC ioctl(). If > FASYNC is requested and the file isn't already FASYNC then > file->f_op->fasync() is called with @on true which ends up causing both > (1) and (2) to call __f_setown(). > > (1) and (2) are the only subsystems that call __f_setown() from the > file->f_op->fasync() handler. So both (1) and (2) have been updated to > allocate a struct fown_struct prior to calling fasync_helper() to > register with the fasync infrastructure. That's safe as they both call > fasync_helper() which also does allocations if @on is true. > > The other interesting case are file leases: > > (3) file leases > lease_manager_ops->lm_setup::lease_setup() > -> __f_setown() > > Which in turn is called from: > > generic_add_lease() > -> lease_manager_ops->lm_setup::lease_setup() > -> __f_setown() > > So here again we can simply make generic_add_lease() allocate struct > fown_struct prior to the lease_manager_ops->lm_setup::lease_setup() > which happens under a spinlock. > > With that the two remaining subsystems that call __f_setown() are: > > (4) dnotify > (5) sockets > > Both have their own custom ioctls to set struct fown_struct and both > have been converted to allocate a struct fown_struct on demand from > their respective ioctls. > > Interactions with O_PATH are fine as well e.g., when opening a /dev/tty > as O_PATH then no file->f_op->open() happens thus no file->f_owner is > allocated. That's fine as no file operation will be set for those and > the device has never been opened. fcntl()s called on such things will > just allocate a ->f_owner on demand. Although I have zero idea why'd you > care about f_owner on an O_PATH fd. > > Signed-off-by: Christian Brauner <brauner@xxxxxxxxxx> > --- > * There's no more cleanup macros used in this version as we can solve > that all much simpler. > * Survives LTP which tests a bunch of that stuff. > * Survives perf's watermak signal tests which make use of FASYNC. > > Goes into -next unless I hear objections. > --- > > --- > drivers/net/tun.c | 6 ++ > drivers/tty/tty_io.c | 6 ++ > fs/fcntl.c | 153 ++++++++++++++++++++++++++++++++++---------- > fs/file_table.c | 6 +- > fs/locks.c | 6 +- > fs/notify/dnotify/dnotify.c | 6 +- > include/linux/fs.h | 11 +++- > net/core/sock.c | 2 +- > security/selinux/hooks.c | 2 +- > security/smack/smack_lsm.c | 2 +- > 10 files changed, 157 insertions(+), 43 deletions(-) > > diff --git a/drivers/net/tun.c b/drivers/net/tun.c > index 1d06c560c5e6..6fe5e8f7017c 100644 > --- a/drivers/net/tun.c > +++ b/drivers/net/tun.c > @@ -3451,6 +3451,12 @@ static int tun_chr_fasync(int fd, struct file *file, int on) > struct tun_file *tfile = file->private_data; > int ret; > > + if (on) { > + ret = file_f_owner_allocate(file); > + if (ret) > + goto out; > + } > + > if ((ret = fasync_helper(fd, file, on, &tfile->fasync)) < 0) > goto out; > > diff --git a/drivers/tty/tty_io.c b/drivers/tty/tty_io.c > index 407b0d87b7c1..7ae0c8934f42 100644 > --- a/drivers/tty/tty_io.c > +++ b/drivers/tty/tty_io.c > @@ -2225,6 +2225,12 @@ static int __tty_fasync(int fd, struct file *filp, int on) > if (tty_paranoia_check(tty, file_inode(filp), "tty_fasync")) > goto out; > > + if (on) { > + retval = file_f_owner_allocate(filp); > + if (retval) > + goto out; > + } > + > retval = fasync_helper(fd, filp, on, &tty->fasync); > if (retval <= 0) > goto out; > diff --git a/fs/fcntl.c b/fs/fcntl.c > index 300e5d9ad913..b002730fdcd1 100644 > --- a/fs/fcntl.c > +++ b/fs/fcntl.c > @@ -87,22 +87,53 @@ static int setfl(int fd, struct file * filp, unsigned int arg) > return error; > } > > +/* > + * Allocate an file->f_owner struct if it doesn't exist, handling racing > + * allocations correctly. > + */ > +int file_f_owner_allocate(struct file *file) > +{ > + struct fown_struct *f_owner; > + > + f_owner = file_f_owner(file); > + if (f_owner) > + return 0; > + > + f_owner = kzalloc(sizeof(struct fown_struct), GFP_KERNEL); > + if (!f_owner) > + return -ENOMEM; > + > + rwlock_init(&f_owner->lock); > + f_owner->file = file; > + /* If someone else raced us, drop our allocation. */ > + if (unlikely(cmpxchg(&file->f_owner, NULL, f_owner))) nit: try_cmpxchg generates better asm and should be fine here. > + kfree(f_owner); > + return 0; > +} > +EXPORT_SYMBOL(file_f_owner_allocate); > + > static void f_modown(struct file *filp, struct pid *pid, enum pid_type type, > int force) > { > - write_lock_irq(&filp->f_owner.lock); > - if (force || !filp->f_owner.pid) { > - put_pid(filp->f_owner.pid); > - filp->f_owner.pid = get_pid(pid); > - filp->f_owner.pid_type = type; > + struct fown_struct *f_owner; > + > + f_owner = file_f_owner(filp); > + if (WARN_ON_ONCE(!f_owner)) > + return; > + > + write_lock_irq(&f_owner->lock); > + if (force || !f_owner->pid) { > + put_pid(f_owner->pid); > + f_owner->pid = get_pid(pid); > + f_owner->pid_type = type; > > if (pid) { > const struct cred *cred = current_cred(); > - filp->f_owner.uid = cred->uid; > - filp->f_owner.euid = cred->euid; > + f_owner->uid = cred->uid; > + f_owner->euid = cred->euid; > } > } > - write_unlock_irq(&filp->f_owner.lock); > + write_unlock_irq(&f_owner->lock); > } > > void __f_setown(struct file *filp, struct pid *pid, enum pid_type type, > @@ -119,6 +150,8 @@ int f_setown(struct file *filp, int who, int force) > struct pid *pid = NULL; > int ret = 0; > > + might_sleep(); > + > type = PIDTYPE_TGID; > if (who < 0) { > /* avoid overflow below */ > @@ -129,6 +162,10 @@ int f_setown(struct file *filp, int who, int force) > who = -who; > } > > + ret = file_f_owner_allocate(filp); > + if (ret) > + return ret; > + > rcu_read_lock(); > if (who) { > pid = find_vpid(who); > @@ -152,16 +189,21 @@ void f_delown(struct file *filp) > pid_t f_getown(struct file *filp) > { > pid_t pid = 0; > + struct fown_struct *f_owner; > > - read_lock_irq(&filp->f_owner.lock); > + f_owner = file_f_owner(filp); > + if (!f_owner) > + return pid; > + > + read_lock_irq(&f_owner->lock); > rcu_read_lock(); > - if (pid_task(filp->f_owner.pid, filp->f_owner.pid_type)) { > - pid = pid_vnr(filp->f_owner.pid); > - if (filp->f_owner.pid_type == PIDTYPE_PGID) > + if (pid_task(f_owner->pid, f_owner->pid_type)) { > + pid = pid_vnr(f_owner->pid); > + if (f_owner->pid_type == PIDTYPE_PGID) > pid = -pid; > } > rcu_read_unlock(); > - read_unlock_irq(&filp->f_owner.lock); > + read_unlock_irq(&f_owner->lock); > return pid; > } > > @@ -194,6 +236,10 @@ static int f_setown_ex(struct file *filp, unsigned long arg) > return -EINVAL; > } > > + ret = file_f_owner_allocate(filp); > + if (ret) > + return ret; > + > rcu_read_lock(); > pid = find_vpid(owner.pid); > if (owner.pid && !pid) > @@ -210,13 +256,20 @@ static int f_getown_ex(struct file *filp, unsigned long arg) > struct f_owner_ex __user *owner_p = (void __user *)arg; > struct f_owner_ex owner = {}; > int ret = 0; > + struct fown_struct *f_owner; > + enum pid_type pid_type = PIDTYPE_PID; > > - read_lock_irq(&filp->f_owner.lock); > - rcu_read_lock(); > - if (pid_task(filp->f_owner.pid, filp->f_owner.pid_type)) > - owner.pid = pid_vnr(filp->f_owner.pid); > - rcu_read_unlock(); > - switch (filp->f_owner.pid_type) { > + f_owner = file_f_owner(filp); > + if (f_owner) { > + read_lock_irq(&f_owner->lock); > + rcu_read_lock(); > + if (pid_task(f_owner->pid, f_owner->pid_type)) > + owner.pid = pid_vnr(f_owner->pid); > + rcu_read_unlock(); > + pid_type = f_owner->pid_type; > + } > + > + switch (pid_type) { > case PIDTYPE_PID: > owner.type = F_OWNER_TID; > break; > @@ -234,7 +287,8 @@ static int f_getown_ex(struct file *filp, unsigned long arg) > ret = -EINVAL; > break; > } > - read_unlock_irq(&filp->f_owner.lock); > + if (f_owner) > + read_unlock_irq(&f_owner->lock); > > if (!ret) { > ret = copy_to_user(owner_p, &owner, sizeof(owner)); > @@ -248,14 +302,18 @@ static int f_getown_ex(struct file *filp, unsigned long arg) > static int f_getowner_uids(struct file *filp, unsigned long arg) > { > struct user_namespace *user_ns = current_user_ns(); > + struct fown_struct *f_owner; > uid_t __user *dst = (void __user *)arg; > - uid_t src[2]; > + uid_t src[2] = {0, 0}; > int err; > > - read_lock_irq(&filp->f_owner.lock); > - src[0] = from_kuid(user_ns, filp->f_owner.uid); > - src[1] = from_kuid(user_ns, filp->f_owner.euid); > - read_unlock_irq(&filp->f_owner.lock); > + f_owner = file_f_owner(filp); > + if (f_owner) { > + read_lock_irq(&f_owner->lock); > + src[0] = from_kuid(user_ns, f_owner->uid); > + src[1] = from_kuid(user_ns, f_owner->euid); > + read_unlock_irq(&f_owner->lock); > + } > > err = put_user(src[0], &dst[0]); > err |= put_user(src[1], &dst[1]); > @@ -343,6 +401,30 @@ static long f_dupfd_query(int fd, struct file *filp) > return f.file == filp; > } > > +static int f_owner_sig(struct file *filp, int signum, bool setsig) > +{ > + int ret = 0; > + struct fown_struct *f_owner; > + > + might_sleep(); > + > + if (setsig) { > + if (!valid_signal(signum)) > + return -EINVAL; > + > + ret = file_f_owner_allocate(filp); > + if (ret) > + return ret; > + } > + > + f_owner = file_f_owner(filp); > + if (setsig) > + f_owner->signum = signum; > + else if (f_owner) > + ret = f_owner->signum; > + return ret; > +} > + > static long do_fcntl(int fd, unsigned int cmd, unsigned long arg, > struct file *filp) > { > @@ -421,15 +503,10 @@ static long do_fcntl(int fd, unsigned int cmd, unsigned long arg, > err = f_getowner_uids(filp, arg); > break; > case F_GETSIG: > - err = filp->f_owner.signum; > + err = f_owner_sig(filp, 0, false); > break; > case F_SETSIG: > - /* arg == 0 restores default behaviour. */ > - if (!valid_signal(argi)) { > - break; > - } > - err = 0; > - filp->f_owner.signum = argi; > + err = f_owner_sig(filp, argi, true); > break; > case F_GETLEASE: > err = fcntl_getlease(filp); > @@ -844,14 +921,19 @@ static void send_sigurg_to_task(struct task_struct *p, > do_send_sig_info(SIGURG, SEND_SIG_PRIV, p, type); > } > > -int send_sigurg(struct fown_struct *fown) > +int send_sigurg(struct file *file) > { > + struct fown_struct *fown; > struct task_struct *p; > enum pid_type type; > struct pid *pid; > unsigned long flags; > int ret = 0; > > + fown = file_f_owner(file); > + if (fown) > + return 0; This needs to be fixed (as Mateusz pointed out). > + > read_lock_irqsave(&fown->lock, flags); > > type = fown->pid_type; > @@ -1027,13 +1109,16 @@ static void kill_fasync_rcu(struct fasync_struct *fa, int sig, int band) > } > read_lock_irqsave(&fa->fa_lock, flags); > if (fa->fa_file) { > - fown = &fa->fa_file->f_owner; > + fown = file_f_owner(fa->fa_file); > + if (!fown) > + goto next; > /* Don't send SIGURG to processes which have not set a > queued signum: SIGURG has its own default signalling > mechanism. */ > if (!(sig == SIGURG && fown->signum == 0)) > send_sigio(fown, fa->fa_fd, band); > } > +next: > read_unlock_irqrestore(&fa->fa_lock, flags); > fa = rcu_dereference(fa->fa_next); > } > diff --git a/fs/file_table.c b/fs/file_table.c > index ca7843dde56d..41ff037a8dc9 100644 > --- a/fs/file_table.c > +++ b/fs/file_table.c > @@ -155,7 +155,6 @@ static int init_file(struct file *f, int flags, const struct cred *cred) > return error; > } > > - rwlock_init(&f->f_owner.lock); > spin_lock_init(&f->f_lock); > mutex_init(&f->f_pos_lock); > f->f_flags = flags; > @@ -425,7 +424,10 @@ static void __fput(struct file *file) > cdev_put(inode->i_cdev); > } > fops_put(file->f_op); > - put_pid(file->f_owner.pid); > + if (file->f_owner) { > + put_pid(file->f_owner->pid); > + kfree(file->f_owner); > + } > put_file_access(file); > dput(dentry); > if (unlikely(mode & FMODE_NEED_UNMOUNT)) > diff --git a/fs/locks.c b/fs/locks.c > index 9afb16e0683f..c0d312481b97 100644 > --- a/fs/locks.c > +++ b/fs/locks.c > @@ -1451,7 +1451,7 @@ int lease_modify(struct file_lease *fl, int arg, struct list_head *dispose) > struct file *filp = fl->c.flc_file; > > f_delown(filp); > - filp->f_owner.signum = 0; > + file_f_owner(filp)->signum = 0; > fasync_helper(0, fl->c.flc_file, 0, &fl->fl_fasync); > if (fl->fl_fasync != NULL) { > printk(KERN_ERR "locks_delete_lock: fasync == %p\n", fl->fl_fasync); > @@ -1783,6 +1783,10 @@ generic_add_lease(struct file *filp, int arg, struct file_lease **flp, void **pr > lease = *flp; > trace_generic_add_lease(inode, lease); > > + error = file_f_owner_allocate(filp); > + if (error) > + return error; > + > /* Note that arg is never F_UNLCK here */ > ctx = locks_get_lock_context(inode, arg); > if (!ctx) > diff --git a/fs/notify/dnotify/dnotify.c b/fs/notify/dnotify/dnotify.c > index f3669403fabf..46440fbb8662 100644 > --- a/fs/notify/dnotify/dnotify.c > +++ b/fs/notify/dnotify/dnotify.c > @@ -110,7 +110,7 @@ static int dnotify_handle_event(struct fsnotify_mark *inode_mark, u32 mask, > prev = &dn->dn_next; > continue; > } > - fown = &dn->dn_filp->f_owner; > + fown = file_f_owner(dn->dn_filp); > send_sigio(fown, dn->dn_fd, POLL_MSG); > if (dn->dn_mask & FS_DN_MULTISHOT) > prev = &dn->dn_next; > @@ -316,6 +316,10 @@ int fcntl_dirnotify(int fd, struct file *filp, unsigned int arg) > goto out_err; > } > > + error = file_f_owner_allocate(filp); > + if (error) > + goto out_err; > + > /* set up the new_fsn_mark and new_dn_mark */ > new_fsn_mark = &new_dn_mark->fsn_mark; > fsnotify_init_mark(new_fsn_mark, dnotify_group); > diff --git a/include/linux/fs.h b/include/linux/fs.h > index fd34b5755c0b..319c566a9e98 100644 > --- a/include/linux/fs.h > +++ b/include/linux/fs.h > @@ -947,6 +947,7 @@ static inline unsigned imajor(const struct inode *inode) > } > > struct fown_struct { > + struct file *file; /* backpointer for security modules */ This struct was 32 bytes before (on x86_64). Now it'll be 40. That's fine, but it may be worthwhile to create a dedicated slabcache for this now, since it's no longer a power-of-2 size. > rwlock_t lock; /* protects pid, uid, euid fields */ > struct pid *pid; /* pid or -pgrp where SIGIO should be sent */ > enum pid_type pid_type; /* Kind of process group SIGIO should be sent to */ > @@ -1011,7 +1012,7 @@ struct file { > struct mutex f_pos_lock; > loff_t f_pos; > unsigned int f_flags; > - struct fown_struct f_owner; > + struct fown_struct *f_owner; > const struct cred *f_cred; > struct file_ra_state f_ra; > struct path f_path; > @@ -1076,6 +1077,12 @@ struct file_lease; > #define OFFT_OFFSET_MAX type_max(off_t) > #endif > > +int file_f_owner_allocate(struct file *file); > +static inline struct fown_struct *file_f_owner(const struct file *file) > +{ > + return READ_ONCE(file->f_owner); > +} > + > extern void send_sigio(struct fown_struct *fown, int fd, int band); > > static inline struct inode *file_inode(const struct file *f) > @@ -1124,7 +1131,7 @@ extern void __f_setown(struct file *filp, struct pid *, enum pid_type, int force > extern int f_setown(struct file *filp, int who, int force); > extern void f_delown(struct file *filp); > extern pid_t f_getown(struct file *filp); > -extern int send_sigurg(struct fown_struct *fown); > +extern int send_sigurg(struct file *file); > > /* > * sb->s_flags. Note that these mirror the equivalent MS_* flags where > diff --git a/net/core/sock.c b/net/core/sock.c > index 9abc4fe25953..bbe4c58470c3 100644 > --- a/net/core/sock.c > +++ b/net/core/sock.c > @@ -3429,7 +3429,7 @@ static void sock_def_destruct(struct sock *sk) > void sk_send_sigurg(struct sock *sk) > { > if (sk->sk_socket && sk->sk_socket->file) > - if (send_sigurg(&sk->sk_socket->file->f_owner)) > + if (send_sigurg(sk->sk_socket->file)) > sk_wake_async(sk, SOCK_WAKE_URG, POLL_PRI); > } > EXPORT_SYMBOL(sk_send_sigurg); > diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c > index 55c78c318ccd..ed252cfba4e9 100644 > --- a/security/selinux/hooks.c > +++ b/security/selinux/hooks.c > @@ -3940,7 +3940,7 @@ static int selinux_file_send_sigiotask(struct task_struct *tsk, > struct file_security_struct *fsec; > > /* struct fown_struct is never outside the context of a struct file */ > - file = container_of(fown, struct file, f_owner); > + file = fown->file; > > fsec = selinux_file(file); > > diff --git a/security/smack/smack_lsm.c b/security/smack/smack_lsm.c > index 4164699cd4f6..cb33920ab67c 100644 > --- a/security/smack/smack_lsm.c > +++ b/security/smack/smack_lsm.c > @@ -1950,7 +1950,7 @@ static int smack_file_send_sigiotask(struct task_struct *tsk, > /* > * struct fown_struct is never outside the context of a struct file > */ > - file = container_of(fown, struct file, f_owner); > + file = fown->file; > > /* we don't log here as rc can be overriden */ > blob = smack_file(file); > > --- > base-commit: 8400291e289ee6b2bf9779ff1c83a291501f017b > change-id: 20240813-work-f_owner-0fbbc50f9671 > Aside from the bug that Mateusz pointed out, this looks fine to me. Assuming you fix that bug: Reviewed-by: Jeff Layton <jlayton@xxxxxxxxxx>