On Sun, 19 Nov 2023 at 23:11, kernel test robot <oliver.sang@xxxxxxxxx> wrote:
>
> kernel test robot noticed a -2.9% regression of will-it-scale.per_thread_ops on:
>
> commit: 0ede61d8589cc2d93aa78230d74ac58b5b8d0244 ("file: convert to SLAB_TYPESAFE_BY_RCU")

Ok, so __fget_light() is one of our more performance-critical things,
and that commit ends up making it call a rather more expensive version
in __get_file_rcu(), so we have:

>      30.90 ± 4%      -20.6       10.35 ± 2%   perf-profile.self.cycles-pp.__fget_light
>       0.00           +26.5       26.48        perf-profile.self.cycles-pp.__get_file_rcu

and that "20% decrease balanced by 26% increase elsewhere" then
directly causes the ~3% regression.

I took a look at the code generation, and honestly, I think we're
better off just making __fget_files_rcu() have special logic for this
all, and not use __get_file_rcu().

The 'fd' case really is special because we need to do that
non-speculative pointer access.

Because it turns out that we already have to use array_index_nospec()
to safely generate that pointer to the fd entry, and once you have to
do that "use non-speculative accesses to generate a safe pointer", you
might as well just go whole hog.

So this takes a different approach, and uses the
array_index_mask_nospec() thing that we have exactly to generate that
non-speculative mask for these things:

	/* Mask is a 0 for invalid fd's, ~0 for valid ones */
	mask = array_index_mask_nospec(fd, fdt->max_fds);

and then it does something you can consider either horribly clever, or
horribly ugly (this first part is basically just array_index_nospec()):

	/* fdentry points to the 'fd' offset, or fdt->fd[0] */
	fdentry = fdt->fd + (fd&mask);

and then we can just *unconditionally* do the load - but since we might
be loading fd[0] for an invalid case, we need to mask the result too:

	/* Do the load, then mask any invalid result */
	file = rcu_dereference_raw(*fdentry);
	file = (void *)(mask & (unsigned long)file);

but now we have done everything without any conditionals, and the only
conditional is now "did we load NULL" - which includes that "we masked
the bad value".

Then we just do that atomic_long_inc_not_zero() on the file, and
re-check the pointer chain we used.

I made files_lookup_fd_raw() do the same thing.

The end result is much nicer code generation at least from my quick
check. And I assume the regression will be gone, and hopefully even
turned into an improvement since this is so hot.

Comments?

I also looked at that odd OPTIMIZER_HIDE_VAR() that __get_file_rcu()
does, and I don't get it. Both things come from volatile accesses, I
don't see the point of those games, but I also didn't care, since it's
no longer in a critical code path. Christian?

NOTE! This patch is not well tested. I verified an earlier version of
this, but have been playing with it since, so caveat emptor. IOW, I
might have messed up some "trivial cleanup" when prepping for sending
it out...

                Linus
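[ Aside, purely illustrative: the same masking trick as a stand-alone
  user-space sketch, using the generic C fallback of
  array_index_mask_nospec() from include/linux/nospec.h. The 'table'
  array and the lookup() helper are invented for the example and are
  not part of the patch: ]

	#include <stdio.h>

	#define BITS_PER_LONG (8 * sizeof(long))

	/* 0 for index >= size, ~0UL for index < size (generic nospec fallback) */
	static unsigned long array_index_mask_nospec(unsigned long index,
						     unsigned long size)
	{
		return ~(long)(index | (size - 1UL - index)) >> (BITS_PER_LONG - 1);
	}

	static const char *table[4] = { "zero", "one", "two", "three" };

	/* Branchless bounds check: always do the load, then mask off a bad result */
	static const char *lookup(unsigned long idx)
	{
		unsigned long mask = array_index_mask_nospec(idx, 4);
		/* loads table[0] for an out-of-range idx, then masks it away */
		const char *entry = table[idx & mask];

		return (const char *)(mask & (unsigned long)entry);
	}

	int main(void)
	{
		printf("%s\n", lookup(2));		/* "two" */
		printf("%d\n", lookup(7) == NULL);	/* 1: masked to NULL */
		return 0;
	}

[ The point is that there is no branch anywhere on the index: even a
  mispredicted out-of-range access can only ever speculatively load
  slot 0, and the masking then turns that into NULL. ]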
 fs/file.c               | 48 ++++++++++++++++++++++++++++++------------------
 include/linux/fdtable.h | 15 ++++++++++-----
 2 files changed, 40 insertions(+), 23 deletions(-)

diff --git a/fs/file.c b/fs/file.c
index 5fb0b146e79e..c74a6e8687d9 100644
--- a/fs/file.c
+++ b/fs/file.c
@@ -959,31 +959,42 @@ static inline struct file *__fget_files_rcu(struct files_struct *files,
 		struct file *file;
 		struct fdtable *fdt = rcu_dereference_raw(files->fdt);
 		struct file __rcu **fdentry;
+		unsigned long mask;
 
-		if (unlikely(fd >= fdt->max_fds))
+		/* Mask is a 0 for invalid fd's, ~0 for valid ones */
+		mask = array_index_mask_nospec(fd, fdt->max_fds);
+
+		/* fdentry points to the 'fd' offset, or fdt->fd[0] */
+		fdentry = fdt->fd + (fd&mask);
+
+		/* Do the load, then mask any invalid result */
+		file = rcu_dereference_raw(*fdentry);
+		file = (void *)(mask & (unsigned long)file);
+
+		if (unlikely(!file))
 			return NULL;
 
-		fdentry = fdt->fd + array_index_nospec(fd, fdt->max_fds);
+		/*
+		 * Ok, we have a file pointer that was valid at
+		 * some point, but it might have become stale since.
+		 *
+		 * We need to confirm it by incrementing the refcount
+		 * and then check the lookup again.
+		 *
+		 * atomic_long_inc_not_zero() gives us a full memory
+		 * barrier. We only really need an 'acquire' one to
+		 * protect the loads below, but we don't have that.
+		 */
+		if (unlikely(!atomic_long_inc_not_zero(&file->f_count)))
+			continue;
 
 		/*
-		 * Ok, we have a file pointer. However, because we do
-		 * this all locklessly under RCU, we may be racing with
-		 * that file being closed.
-		 *
 		 * Such a race can take two forms:
 		 *
 		 *  (a) the file ref already went down to zero and the
 		 *      file hasn't been reused yet or the file count
 		 *      isn't zero but the file has already been reused.
-		 */
-		file = __get_file_rcu(fdentry);
-		if (unlikely(!file))
-			return NULL;
-
-		if (unlikely(IS_ERR(file)))
-			continue;
-
-		/*
+		 *
 		 *  (b) the file table entry has changed under us.
 		 *       Note that we don't need to re-check the 'fdt->fd'
 		 *       pointer having changed, because it always goes
@@ -991,7 +1002,8 @@ static inline struct file *__fget_files_rcu(struct files_struct *files,
 		 *
 		 * If so, we need to put our ref and try again.
 		 */
-		if (unlikely(rcu_dereference_raw(files->fdt) != fdt)) {
+		if (unlikely(file != rcu_dereference_raw(*fdentry)) ||
+		    unlikely(rcu_dereference_raw(files->fdt) != fdt)) {
 			fput(file);
 			continue;
 		}
@@ -1128,13 +1140,13 @@ static unsigned long __fget_light(unsigned int fd, fmode_t mask)
 	 * atomic_read_acquire() pairs with atomic_dec_and_test() in
 	 * put_files_struct().
 	 */
-	if (atomic_read_acquire(&files->count) == 1) {
+	if (likely(atomic_read_acquire(&files->count) == 1)) {
 		file = files_lookup_fd_raw(files, fd);
 		if (!file || unlikely(file->f_mode & mask))
 			return 0;
 		return (unsigned long)file;
 	} else {
-		file = __fget(fd, mask);
+		file = __fget_files(files, fd, mask);
 		if (!file)
 			return 0;
 		return FDPUT_FPUT | (unsigned long)file;
diff --git a/include/linux/fdtable.h b/include/linux/fdtable.h
index bc4c3287a65e..a8a8b4d24619 100644
--- a/include/linux/fdtable.h
+++ b/include/linux/fdtable.h
@@ -83,12 +83,17 @@ struct dentry;
 static inline struct file *files_lookup_fd_raw(struct files_struct *files, unsigned int fd)
 {
 	struct fdtable *fdt = rcu_dereference_raw(files->fdt);
+	unsigned long mask = array_index_mask_nospec(fd, fdt->max_fds);
+	struct file *needs_masking;
 
-	if (fd < fdt->max_fds) {
-		fd = array_index_nospec(fd, fdt->max_fds);
-		return rcu_dereference_raw(fdt->fd[fd]);
-	}
-	return NULL;
+	/*
+	 * 'mask' is zero for an out-of-bounds fd, all ones for ok.
+	 * 'fd&mask' is 'fd' for ok, or 0 for out of bounds.
+	 *
+	 * Accessing fdt->fd[0] is ok, but needs masking of the result.
+	 */
+	needs_masking = rcu_dereference_raw(fdt->fd[fd&mask]);
+	return (struct file *)(mask & (unsigned long)needs_masking);
 }
 
 static inline struct file *files_lookup_fd_locked(struct files_struct *files, unsigned int fd)
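[ Aside, equally illustrative: the "take a speculative ref, then
  revalidate" part of the __fget_files_rcu() loop above, reduced to a
  self-contained user-space analogue in C11 atomics. obj_get() mimics
  atomic_long_inc_not_zero(); all the names here are invented for the
  sketch: ]

	#include <stdatomic.h>
	#include <stdbool.h>
	#include <stddef.h>

	struct obj { atomic_long refcount; };

	/* atomic_long_inc_not_zero() analogue: take a ref only if one still exists */
	static bool obj_get(struct obj *o)
	{
		long c = atomic_load(&o->refcount);

		while (c != 0)
			if (atomic_compare_exchange_weak(&o->refcount, &c, c + 1))
				return true;
		return false;
	}

	static void obj_put(struct obj *o)
	{
		atomic_fetch_sub(&o->refcount, 1);	/* real code would free at zero */
	}

	/* The lookup pattern: load the slot, take a speculative ref, then
	 * re-check that the slot still points at the same object. */
	static struct obj *lookup(_Atomic(struct obj *) *slot)
	{
		for (;;) {
			struct obj *o = atomic_load(slot);

			if (!o)
				return NULL;
			if (!obj_get(o))
				continue;	/* died under us, reload the slot */
			if (o != atomic_load(slot)) {
				obj_put(o);	/* slot was reused: drop ref, retry */
				continue;
			}
			return o;
		}
	}

[ The kernel version additionally re-checks files->fdt itself, since
  the whole table can be reallocated when it is resized. ]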