> you guys sort it out ;) You're all driving me nuts. ;) Fixed patch appended and put on vfs.misc for testing... @Linus, you ok with the appended thing?
>From 9b0919d12c74b3de6c23c52dc5b3d6ffd24fecee Mon Sep 17 00:00:00 2001 From: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> Date: Tue, 8 Aug 2023 19:26:35 +0200 Subject: [PATCH] fs: use __fput_sync in close(2) close(2) is a special case which guarantees a shallow kernel stack, making delegation to task_work machinery unnecessary. Said delegation is problematic as it involves atomic ops and interrupt masking trips, none of which are cheap on x86-64. Forcing close(2) to do it looks like an oversight in the original work. Moreover presence of CONFIG_RSEQ adds an additional overhead as fput() -> task_work_add(..., TWA_RESUME) -> set_notify_resume() makes the thread returning to userspace land in resume_user_mode_work(), where rseq_handle_notify_resume takes a SMAP round-trip if rseq is enabled for the thread (and it is by default with contemporary glibc). Sample result when benchmarking open1_processes -t 1 from will-it-scale (that's an open + close loop) + tmpfs on /tmp, running on the Sapphire Rapid CPU (ops/s): stock+RSEQ: 1329857 stock-RSEQ: 1421667 (+7%) patched: 1523521 (+14.5% / +7%) (with / without rseq) Patched result is the same regardless of rseq as the codepath is avoided. Signed-off-by: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> Signed-off-by: Christian Brauner <brauner@xxxxxxxxxx> --- fs/file_table.c | 5 +---- fs/open.c | 27 ++++++++++++++++++++++++--- 2 files changed, 25 insertions(+), 7 deletions(-) diff --git a/fs/file_table.c b/fs/file_table.c index fc7d677ff5ad..ee21b3da9d08 100644 --- a/fs/file_table.c +++ b/fs/file_table.c @@ -461,11 +461,8 @@ void fput(struct file *file) */ void __fput_sync(struct file *file) { - if (atomic_long_dec_and_test(&file->f_count)) { - struct task_struct *task = current; - BUG_ON(!(task->flags & PF_KTHREAD)); + if (atomic_long_dec_and_test(&file->f_count)) __fput(file); - } } EXPORT_SYMBOL(fput); diff --git a/fs/open.c b/fs/open.c index e6ead0f19964..1f5b3a5f7f18 100644 --- a/fs/open.c +++ b/fs/open.c @@ -1503,7 +1503,7 @@ SYSCALL_DEFINE2(creat, const char __user *, pathname, umode_t, mode) * "id" is the POSIX thread ID. We use the * files pointer for this.. */ -int filp_close(struct file *filp, fl_owner_t id) +static int filp_flush(struct file *filp, fl_owner_t id) { int retval = 0; @@ -1520,10 +1520,18 @@ int filp_close(struct file *filp, fl_owner_t id) dnotify_flush(filp, id); locks_remove_posix(filp, id); } - fput(filp); return retval; } +int filp_close(struct file *filp, fl_owner_t id) +{ + int retval; + + retval = filp_flush(filp, id); + fput(filp); + + return retval; +} EXPORT_SYMBOL(filp_close); /* @@ -1533,7 +1541,20 @@ EXPORT_SYMBOL(filp_close); */ SYSCALL_DEFINE1(close, unsigned int, fd) { - int retval = close_fd(fd); + int retval; + struct file *file; + + file = close_fd_get_file(fd); + if (!file) + return -EBADF; + + retval = filp_flush(file, current->files); + + /* + * We're returning to user space. Don't bother + * with any delayed fput() cases. + */ + __fput_sync(file); /* can't restart close syscall because file table entry was cleared */ if (unlikely(retval == -ERESTARTSYS || -- 2.34.1