On Mon, 2024-06-17 at 10:55 -0700, Tim Chen wrote:
> On Sat, 2024-06-15 at 07:07 +0200, Mateusz Guzik wrote:
> > On Sat, Jun 15, 2024 at 06:41:45AM +0200, Mateusz Guzik wrote:
> > > On Fri, Jun 14, 2024 at 12:34:16PM -0400, Yu Ma wrote:
> > > > alloc_fd() has a sanity check inside to make sure the FILE object
> > > > mapping to the allocated fd is NULL. [...]
> >
> > Now that I wrote it I noticed the fd < end check has to be performed
> > regardless of max_fds -- someone could have changed rlimit to a lower
> > value after using a higher fd. But the main point stands: the call to
> > expand_files and associated error handling don't have to be there.
>
> To really prevent someone from mucking with rlimit, we should probably
> take the task_lock to prevent do_prlimit() racing with this function:
>
> 	task_lock(current->group_leader);
>
> Tim

And we also need to move the task_lock() in do_prlimit() up, before the
RLIMIT_NOFILE check:

diff --git a/kernel/sys.c b/kernel/sys.c
index 3a2df1bd9f64..b4e523728c3e 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -1471,6 +1471,7 @@ static int do_prlimit(struct task_struct *tsk, unsigned int resource,
 		return -EINVAL;
 	resource = array_index_nospec(resource, RLIM_NLIMITS);
 
+	task_lock(tsk->group_leader);
 	if (new_rlim) {
 		if (new_rlim->rlim_cur > new_rlim->rlim_max)
 			return -EINVAL;
@@ -1481,7 +1482,6 @@ static int do_prlimit(struct task_struct *tsk, unsigned int resource,
 
 	/* Holding a refcount on tsk protects tsk->signal from disappearing. */
 	rlim = tsk->signal->rlim + resource;
-	task_lock(tsk->group_leader);
 	if (new_rlim) {
 		/*
 		 * Keep the capable check against init_user_ns until cgroups can

Tim

> >
> > > This elides 2 branches and a func call in the common case. Completely
> > > untested, maybe has some brainfarts, feel free to take without credit
> > > and further massage the routine.
> > >
> > > Moreover my disasm shows that even looking for a bit results in
> > > a func call(!) to _find_next_zero_bit -- someone(tm) should probably
> > > massage it into another inline.
> > >
> > > After the above massaging is done, and if it turns out the check has
> > > to stay, you can plausibly damage-control it with a prefetch -- issue
> > > it immediately after finding the fd number, before any other work.
> > >
> > > All that said, by the above I'm confident there is still *some*
> > > performance left on the table despite the lock.
> > >
> > > >  out:
> > > >  	spin_unlock(&files->file_lock);
> > > > @@ -572,7 +565,7 @@ int get_unused_fd_flags(unsigned flags)
> > > >  }
> > > >  EXPORT_SYMBOL(get_unused_fd_flags);
> > > >  
> > > > -static void __put_unused_fd(struct files_struct *files, unsigned int fd)
> > > > +static inline void __put_unused_fd(struct files_struct *files, unsigned int fd)
> > > >  {
> > > >  	struct fdtable *fdt = files_fdtable(files);
> > > >  	__clear_open_fd(fd, fdt);
> > > > @@ -583,7 +576,12 @@ static void __put_unused_fd(struct files_struct *files, unsigned int fd)
> > > >  void put_unused_fd(unsigned int fd)
> > > >  {
> > > >  	struct files_struct *files = current->files;
> > > > +	struct fdtable *fdt = files_fdtable(files);
> > > >  	spin_lock(&files->file_lock);
> > > > +	if (unlikely(rcu_access_pointer(fdt->fd[fd]))) {
> > > > +		printk(KERN_WARNING "put_unused_fd: slot %u not NULL!\n", fd);
> > > > +		rcu_assign_pointer(fdt->fd[fd], NULL);
> > > > +	}
> > > >  	__put_unused_fd(files, fd);
> > > >  	spin_unlock(&files->file_lock);
> > > >  }
> > >
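To make the locking idea concrete, below is a rough, completely untested
sketch of what alloc_fd() could look like with the task_lock taken. This
is illustration only, not the real fs/file.c code (the function name is
invented): it leaves out expand_files(), the next_fd bookkeeping and the
close-on-exec handling, and it assumes task_lock() may nest outside
files->file_lock, which would have to be checked against the actual lock
ordering.

/*
 * Sketch only: serialize the RLIMIT_NOFILE read against do_prlimit()
 * by taking the same task_lock it uses for rlimit updates.
 */
static int alloc_fd_sketch(unsigned start)
{
	struct files_struct *files = current->files;
	struct fdtable *fdt;
	unsigned int fd;
	int error = -EMFILE;

	/* do_prlimit() takes task_lock(group_leader) around updates. */
	task_lock(current->group_leader);
	spin_lock(&files->file_lock);

	fdt = files_fdtable(files);
	fd = find_next_zero_bit(fdt->open_fds, fdt->max_fds, start);

	/* The limit cannot change under us while the lock is held. */
	if (fd >= rlimit(RLIMIT_NOFILE) || fd >= fdt->max_fds)
		goto out;

	__set_open_fd(fd, fdt);
	error = fd;
out:
	spin_unlock(&files->file_lock);
	task_unlock(current->group_leader);
	return error;
}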
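On the _find_next_zero_bit() call: the generic find_next_zero_bit()
wrapper only inlines the single-word case when the bitmap size is a
compile-time constant, and fdt->max_fds is not. A possible shape for the
inline Mateusz is alluding to (again a sketch with an invented name, not
tested and not benchmarked) would be to handle the one-word bitmap
without the out-of-line call:

static __always_inline unsigned long
find_next_zero_bit_word(const unsigned long *addr, unsigned long size,
			unsigned long offset)
{
	/*
	 * Fast path when the whole bitmap fits in a single word even
	 * though "size" is not a compile-time constant (the default
	 * fdtable starts out at BITS_PER_LONG fds).
	 */
	if (size <= BITS_PER_LONG && offset < size) {
		/* Mask off everything outside [offset, size - 1]. */
		unsigned long val = *addr | ~GENMASK(size - 1, offset);

		return val == ~0UL ? size : ffz(val);
	}
	return find_next_zero_bit(addr, size, offset);
}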
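And the prefetch idea would amount to something like this fragment in
alloc_fd(), right after the bit search (sketch only; whether it actually
wins anything would have to be measured):

	fd = find_next_zero_bit(fdt->open_fds, fdt->max_fds, start);
	/*
	 * Start pulling in the cache line for the fd slot right away,
	 * so that if the fdt->fd[fd] sanity check has to stay, it does
	 * not eat a full cache miss after all the bitmap work.
	 */
	prefetch(&fdt->fd[fd]);

(prefetch() is from <linux/prefetch.h>.)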