On 5/30/24 18:44, Shachar Sharon wrote:
> On Wed, May 29, 2024 at 10:36 PM Bernd Schubert <bschubert@xxxxxxx> wrote:
>>
>> Most of the performance improvement with fuse-over-io-uring for
>> synchronous requests comes from the possibility to run processing on
>> the submitting cpu core and to also wake the submitting process on the
>> same core - avoiding switching between cpu cores.
>>
>> Signed-off-by: Bernd Schubert <bschubert@xxxxxxx>
>> ---
>>  fs/fuse/dev.c | 5 ++++-
>>  1 file changed, 4 insertions(+), 1 deletion(-)
>>
>> diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
>> index c7fd3849a105..851c5fa99946 100644
>> --- a/fs/fuse/dev.c
>> +++ b/fs/fuse/dev.c
>> @@ -333,7 +333,10 @@ void fuse_request_end(struct fuse_req *req)
>>  		spin_unlock(&fc->bg_lock);
>>  	} else {
>>  		/* Wake up waiter sleeping in request_wait_answer() */
>> -		wake_up(&req->waitq);
>> +		if (fuse_per_core_queue(fc))
>> +			__wake_up_on_current_cpu(&req->waitq, TASK_NORMAL, NULL);
>> +		else
>> +			wake_up(&req->waitq);
>
> Would it be possible to apply this idea for regular FUSE connection?

I probably should have written it in the commit message: without
io-uring, performance is the same or slightly worse. With direct-IO
reads:

  jobs   /dev/fuse       /dev/fuse
         (migrate off)   (migrate on)
   1      2023            1652
   2      3375            2805
   4      3823            4193
   8      7796            8161
  16      8520            8518
  24      8361            8084
  32      8717            8342

(in MB/s)

I think there is no improvement because the daemon threads process
requests on random cores, i.e. request processing doesn't happen on the
same core the request was submitted from.

> What would happen if some (buggy or malicious) userspace FUSE server uses
> sched_setaffinity(2) to run only on a subset of active CPUs?

The request goes to the ring; which cpu eventually handles it should not
matter for correctness, but performance will not be optimal then.
That being said, the introduction mail points out an issue with xfstest
generic/650, which disables/enables CPUs in a loop - I need to
investigate what happens there.

Thanks,
Bernd
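
[Editorial note: the following is an illustrative userspace sketch, not
part of the patch or this thread. It shows one way a FUSE server could
pin one worker thread per CPU via pthread_setaffinity_np(), which is the
kind of affinity control discussed above; the function names
(start_pinned_workers, worker_fn) are made up for this example.]

/*
 * Illustrative only: pin one hypothetical worker thread per online CPU
 * so that request processing stays on a fixed core.
 * Build with: cc -pthread example.c
 */
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <unistd.h>

static void *worker_fn(void *arg)	/* stand-in for the request-processing loop */
{
	(void)arg;
	return NULL;
}

int start_pinned_workers(void)
{
	long ncpus = sysconf(_SC_NPROCESSORS_ONLN);

	for (long cpu = 0; cpu < ncpus; cpu++) {
		pthread_t t;
		cpu_set_t set;

		CPU_ZERO(&set);
		CPU_SET((int)cpu, &set);

		if (pthread_create(&t, NULL, worker_fn, NULL))
			return -1;

		/* Restrict this worker to a single CPU. */
		if (pthread_setaffinity_np(t, sizeof(set), &set))
			return -1;
	}
	return 0;
}

Whether such pinning helps in practice still depends on the kernel waking
the waiting process on the same core, which is what the quoted hunk in
fuse_request_end() changes.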