On 8/8/23, Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
> That said, I detest Mateusz' patch. I hate these kinds of "do
> different things based on flags" interfaces. Particularly when it
> spreads out like this.
>

I agree it is kind of crap, I needed something to test the impact.

> But even if we want to do this - and I have absolutely no objections
> to it conceptually as per above - we need to be a lot more surgical
> about it, and not pass stupid flags around.
>
> Here's a TOTALLY UNTESTED(!) patch that I think effectively does what
> Mateusz wants done, but does it all within just fs/open.c and only for
> the obvious context of the close() system call itself.
>
> All it needs is to just split out the "flush" part from filp_close(),
> and we already had all the other infrastructure for this operation.
>

A few hours ago I sent another version which very closely resembles
what you did :) Two main differences:
- I somehow missed close_fd_get_file, so I hacked my own based on close_fd
- you need to whack the kthread assert in __fput_sync

(a rough sketch of the resulting shape is at the bottom of this mail)

> Mateusz, two questions:
>
>  (a) does this patch work for you?
>

Well, I'm not *testing* the patch right now, but it does work for me in
the sense of taking care of the problem (modulo whatever fixups,
definitely the assert thing).

>  (b) do you have numbers for this all?
>

I'm offended you ask, it's all in my opening e-mail. Copy-pasting again
from the commit message:

[orig]
fs: use __fput_sync in close(2)

close(2) is a special close which guarantees a shallow kernel stack,
making delegation to the task_work machinery unnecessary. Said
delegation is problematic as it involves atomic ops and interrupt
masking trips, none of which are cheap on x86-64. Forcing close(2) to
do it looks like an oversight in the original work.

Moreover, the presence of CONFIG_RSEQ adds additional overhead:
fput() -> task_work_add(..., TWA_RESUME) -> set_notify_resume() makes
the thread returning to userspace land in resume_user_mode_work(),
where rseq_handle_notify_resume takes a SMAP round-trip if rseq is
enabled for the thread (and it is by default with contemporary glibc).

Sample result when benchmarking open1_processes -t 1 from will-it-scale
(that's an open + close loop) + tmpfs on /tmp, running on a Sapphire
Rapids CPU (ops/s):

stock+RSEQ: 1329857
stock-RSEQ: 1421667 (+7%)
patched:    1523521 (+14.5% / +7%) (with / without rseq)

The patched result is the same with or without rseq because it dodges
rseq entirely.
[/orig]

Since, modulo fixups, your patch clearly elides all of this, I don't
think there is a need to re-bench (but I can do it if you ship me a
version you consider committable, just to be on the safe side).

-- 
Mateusz Guzik <mjguzik gmail.com>
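
Below, for reference, is a rough sketch of the shape being discussed.
It is not a copy of either patch, just an illustration resting on a few
assumptions: that a close_fd_get_file()-style helper detaches the fd
from the table and hands back the struct file, that the flush half of
filp_close() is split into its own helper (called filp_flush() here
purely for illustration), and that __fput_sync() no longer asserts it
runs on a kthread.

#include <linux/errno.h>
#include <linux/fdtable.h>
#include <linux/file.h>
#include <linux/fs.h>
#include <linux/sched.h>
#include <linux/syscalls.h>

SYSCALL_DEFINE1(close, unsigned int, fd)
{
	struct file *file;
	int retval;

	/* Detach the fd from the table and take over its reference. */
	file = close_fd_get_file(fd);
	if (!file)
		return -EBADF;

	/* The "flush" half of what filp_close() does today. */
	retval = filp_flush(file, current->files);

	/*
	 * Drop the reference synchronously. close(2) runs with a shallow
	 * kernel stack, so there is no need to bounce the final fput()
	 * through task_work, which costs atomics, interrupt masking and,
	 * with rseq enabled, an extra SMAP round-trip in
	 * resume_user_mode_work().
	 */
	__fput_sync(file);

	/* can't restart close syscall because file table entry was cleared */
	if (unlikely(retval == -ERESTARTSYS ||
		     retval == -ERESTARTNOINTR ||
		     retval == -ERESTARTNOHAND ||
		     retval == -ERESTART_RESTARTBLOCK))
		retval = -EINTR;

	return retval;
}

The -ERESTART* handling mirrors what the current close(2) path already
does; the only interesting change is that the final reference drop
happens inline instead of being deferred via task_work_add().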