Re: [PATCH v2 3/3] fs/file.c: remove sanity_check from alloc_fd()

Mateusz Guzik <mjguzik@xxxxxxxxx> · Tue, 25 Jun 2024 15:11:23 +0200

On Tue, Jun 25, 2024 at 3:09 PM Mateusz Guzik <mjguzik@xxxxxxxxx> wrote:
>
> On Tue, Jun 25, 2024 at 2:08 PM Jan Kara <jack@xxxxxxx> wrote:
> >
> > On Sat 22-06-24 11:49:04, Yu Ma wrote:
> > > alloc_fd() has a sanity check inside to make sure the struct file mapping to the
> > > allocated fd is NULL. Remove this sanity check since it can be assured by
> > > exisitng zero initilization and NULL set when recycling fd.
> >   ^^^ existing  ^^^ initialization
> >
> > Well, since this is a sanity check, it is expected it never hits. Yet
> > searching the web shows it has hit a few times in the past :). So would
> > wrapping this with unlikely() give a similar performance gain while keeping
> > debugability? If unlikely() does not help, I agree we can remove this since
> > fd_install() actually has the same check:
> >
> > BUG_ON(fdt->fd[fd] != NULL);
> >
> > and there we need the cacheline anyway so performance impact is minimal.
> > Now, this condition in alloc_fd() is nice that it does not take the kernel
> > down so perhaps we could change the BUG_ON to WARN() dumping similar kind
> > of info as alloc_fd()?
> >
>
> Christian suggested just removing it.
>
> To my understanding the problem is not the branch per se, but the the
> cacheline bounce of the fd array induced by reading the status.
>
> Note the thing also nullifies the pointer, kind of defeating the
> BUG_ON in fd_install.
>
> I'm guessing it's not going to hurt to branch on it after releasing
> the lock and forego nullifying, more or less:
> diff --git a/fs/file.c b/fs/file.c
> index a3b72aa64f11..d22b867db246 100644
> --- a/fs/file.c
> +++ b/fs/file.c
> @@ -524,11 +524,11 @@ static int alloc_fd(unsigned start, unsigned
> end, unsigned flags)
>          */
>         error = -EMFILE;
>         if (fd >= end)
> -               goto out;
> +               goto out_locked;
>
>         error = expand_files(files, fd);
>         if (error < 0)
> -               goto out;
> +               goto out_locked;
>
>         /*
>          * If we needed to expand the fs array we
> @@ -546,15 +546,15 @@ static int alloc_fd(unsigned start, unsigned
> end, unsigned flags)
>         else
>                 __clear_close_on_exec(fd, fdt);
>         error = fd;
> -#if 1
> -       /* Sanity check */
> -       if (rcu_access_pointer(fdt->fd[fd]) != NULL) {
> +       spin_unlock(&files->file_lock);
> +
> +       if (unlikely(rcu_access_pointer(fdt->fd[fd]) != NULL)) {
>                 printk(KERN_WARNING "alloc_fd: slot %d not NULL!\n", fd);
> -               rcu_assign_pointer(fdt->fd[fd], NULL);
>         }
> -#endif

Now that I sent it it is of course not safe to deref without
protection from either rcu or the lock, so this would have to be
wrapped with rcu_read_lock, which makes it even less appealing.

Whacking the thing as in the submitted patch seems like the best way
forward here. :)

>
> -out:
> +       return error;
> +
> +out_locked:
>         spin_unlock(&files->file_lock);
>         return error;
>  }
>
>
>
> >                                                                 Honza
> >
> > > Combined with patch 1 and 2 in series, pts/blogbench-1.1.0 read improved by
> > > 32%, write improved by 17% on Intel ICX 160 cores configuration with v6.10-rc4.
> > >
> > > Reviewed-by: Tim Chen <tim.c.chen@xxxxxxxxxxxxxxx>
> > > Signed-off-by: Yu Ma <yu.ma@xxxxxxxxx>
> > > ---
> > >  fs/file.c | 7 -------
> > >  1 file changed, 7 deletions(-)
> > >
> > > diff --git a/fs/file.c b/fs/file.c
> > > index b4d25f6d4c19..1153b0b7ba3d 100644
> > > --- a/fs/file.c
> > > +++ b/fs/file.c
> > > @@ -555,13 +555,6 @@ static int alloc_fd(unsigned start, unsigned end, unsigned flags)
> > >       else
> > >               __clear_close_on_exec(fd, fdt);
> > >       error = fd;
> > > -#if 1
> > > -     /* Sanity check */
> > > -     if (rcu_access_pointer(fdt->fd[fd]) != NULL) {
> > > -             printk(KERN_WARNING "alloc_fd: slot %d not NULL!\n", fd);
> > > -             rcu_assign_pointer(fdt->fd[fd], NULL);
> > > -     }
> > > -#endif
> > >
> > >  out:
> > >       spin_unlock(&files->file_lock);
> > > --
> > > 2.43.0
> > >
> > --
> > Jan Kara <jack@xxxxxxxx>
> > SUSE Labs, CR
>
>
>
> --
> Mateusz Guzik <mjguzik gmail.com>

-- 
Mateusz Guzik <mjguzik gmail.com>