On Sat, Jun 15, 2024 at 06:41:45AM +0200, Mateusz Guzik wrote:
> On Fri, Jun 14, 2024 at 12:34:16PM -0400, Yu Ma wrote:
> > alloc_fd() has a sanity check inside to make sure the FILE object mapping to the
>
> Total nitpick: FILE is the libc thing, I would refer to it as 'struct
> file'. See below for the actual point.
>
> > Combined with patch 1 and 2 in series, pts/blogbench-1.1.0 read improved by
> > 32%, write improved by 15% on Intel ICX 160 cores configuration with v6.8-rc6.
> >
> > Reviewed-by: Tim Chen <tim.c.chen@xxxxxxxxxxxxxxx>
> > Signed-off-by: Yu Ma <yu.ma@xxxxxxxxx>
> > ---
> >  fs/file.c | 14 ++++++--------
> >  1 file changed, 6 insertions(+), 8 deletions(-)
> >
> > diff --git a/fs/file.c b/fs/file.c
> > index a0e94a178c0b..59d62909e2e3 100644
> > --- a/fs/file.c
> > +++ b/fs/file.c
> > @@ -548,13 +548,6 @@ static int alloc_fd(unsigned start, unsigned end, unsigned flags)
> >  	else
> >  		__clear_close_on_exec(fd, fdt);
> >  	error = fd;
> > -#if 1
> > -	/* Sanity check */
> > -	if (rcu_access_pointer(fdt->fd[fd]) != NULL) {
> > -		printk(KERN_WARNING "alloc_fd: slot %d not NULL!\n", fd);
> > -		rcu_assign_pointer(fdt->fd[fd], NULL);
> > -	}
> > -#endif
> >
>
> I was going to ask when was the last time anyone seen this fire and
> suggest getting rid of it if enough time(tm) passed. Turns out it does
> show up sometimes, latest result I found is 2017 vintage:
> https://groups.google.com/g/syzkaller-bugs/c/jfQ7upCDf9s/m/RQjhDrZ7AQAJ
>
> So you are moving this to another locked area, but one which does not
> execute in the benchmark?
>
> Patch 2/3 states 28% read and 14% write increase, this commit message
> claims it goes up to 32% and 15% respectively -- pretty big. I presume
> this has to do with bouncing a line containing the fd.
>
> I would argue moving this check elsewhere is about as good as removing
> it altogether, but that's for the vfs overlords to decide.
>
> All that aside, looking at disasm of alloc_fd it is pretty clear there
> is time to save, for example:
>
> 	if (unlikely(nr >= fdt->max_fds)) {
> 		if (fd >= end) {
> 			error = -EMFILE;
> 			goto out;
> 		}
> 		error = expand_files(fd, fd);
> 		if (error < 0)
> 			goto out;
> 		if (error)
> 			goto repeat;
> 	}
>

Now that I wrote it I noticed the fd < end check has to be performed
regardless of max_fds -- someone could have changed rlimit to a lower
value after using a higher fd. But the main point stands: the call to
expand_files and associated error handling don't have to be there.

> This elides 2 branches and a func call in the common case. Completely
> untested, maybe has some brainfarts, feel free to take without credit
> and further massage the routine.
>
> Moreover my disasm shows that even looking for a bit results in
> a func call(!) to _find_next_zero_bit -- someone(tm) should probably
> massage it into another inline.
>
> After the above massaging is done and if it turns out the check has to
> stay, you can plausibly damage-control it with prefetch -- issue it
> immediately after finding the fd number, before any other work.
>
> All that said, by the above I'm confident there is still *some*
> performance left on the table despite the lock.
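
To spell out the reordering with that correction folded in, a rough
sketch only (not even compile-tested; note expand_files() takes files
as its first argument in the current tree, unlike my snippet above):

	/* rlimit may have been lowered below an fd already handed out */
	error = -EMFILE;
	if (unlikely(fd >= end))
		goto out;

	if (unlikely(fd >= fdt->max_fds)) {
		error = expand_files(files, fd);
		if (error < 0)
			goto out;
		/* expansion may have dropped the lock, start over */
		if (error)
			goto repeat;
	}

That keeps expand_files() and its error handling off the common path
while still performing the rlimit check unconditionally. And should the
sanity check have to stay in alloc_fd(), the damage control I mean is
something along the lines of a prefetch(&fdt->fd[fd]) issued right
after the bitmap search hands back fd, before the rest of the
bookkeeping -- again untested, just to illustrate the placement.
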
> >  out:
> >  	spin_unlock(&files->file_lock);
> > @@ -572,7 +565,7 @@ int get_unused_fd_flags(unsigned flags)
> >  }
> >  EXPORT_SYMBOL(get_unused_fd_flags);
> >
> > -static void __put_unused_fd(struct files_struct *files, unsigned int fd)
> > +static inline void __put_unused_fd(struct files_struct *files, unsigned int fd)
> >  {
> >  	struct fdtable *fdt = files_fdtable(files);
> >  	__clear_open_fd(fd, fdt);
> > @@ -583,7 +576,12 @@ static void __put_unused_fd(struct files_struct *files, unsigned int fd)
> >  void put_unused_fd(unsigned int fd)
> >  {
> >  	struct files_struct *files = current->files;
> > +	struct fdtable *fdt = files_fdtable(files);
> >  	spin_lock(&files->file_lock);
> > +	if (unlikely(rcu_access_pointer(fdt->fd[fd]))) {
> > +		printk(KERN_WARNING "put_unused_fd: slot %d not NULL!\n", fd);
> > +		rcu_assign_pointer(fdt->fd[fd], NULL);
> > +	}
> >  	__put_unused_fd(files, fd);
> >  	spin_unlock(&files->file_lock);
> >  }
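
For reference, if I'm reading the diff right, put_unused_fd() ends up
looking roughly like this with the series applied:

void put_unused_fd(unsigned int fd)
{
	struct files_struct *files = current->files;
	struct fdtable *fdt = files_fdtable(files);

	spin_lock(&files->file_lock);
	/* former alloc_fd() sanity check, now only on the put side */
	if (unlikely(rcu_access_pointer(fdt->fd[fd]))) {
		printk(KERN_WARNING "put_unused_fd: slot %d not NULL!\n", fd);
		rcu_assign_pointer(fdt->fd[fd], NULL);
	}
	__put_unused_fd(files, fd);
	spin_unlock(&files->file_lock);
}

which is what I was getting at above with the check moving to a locked
area the benchmark presumably does not exercise -- put_unused_fd() is
mostly hit on unwind paths (a failed open and the like) rather than the
hot open/read/close cycle.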