On 27 Apr 2023 13:35:31 +0000 Bernd Schubert <bschubert@xxxxxxx>
> Btw, a very hackish way to 'solve' the issue is this
> 
> diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
> index cd7aa679c3ee..dd32effb5010 100644
> --- a/fs/fuse/dev.c
> +++ b/fs/fuse/dev.c
> @@ -373,6 +373,26 @@ static void request_wait_answer(struct fuse_req *req)
>  	int err;
>  	int prev_cpu = task_cpu(current);
> 
> +	/* When running over uring with core-affined userspace threads, we
> +	 * do not want to let the request-submitting process be migrated away.
> +	 * The issue is that even after waking up on the right core, processes
> +	 * that have submitted requests might get migrated away, because the
> +	 * ring thread is still doing a bit of work or is in the process of
> +	 * going to sleep. The assumption here is that processes are started
> +	 * on the right core (i.e. idle cores) and can then stay on that core
> +	 * when they come and do file system requests.
> +	 * An alternative is to set SCHED_IDLE for ring threads, but that
> +	 * would have an issue if there are other processes keeping the cpu
> +	 * busy. SCHED_IDLE or this hack gives about a factor of 3.5 in max
> +	 * meta request performance.
> +	 *
> +	 * Ideally we would tell the scheduler that ring threads are not
> +	 * disturbing, so that migration away from them should very rarely
> +	 * happen.
> +	 */
> +	if (fc->ring.ready)
> +		migrate_disable();
> +
>  	if (!fc->no_interrupt) {
>  		/* Any signal may interrupt this */
>  		err = wait_event_interruptible(req->waitq,

If I understand it correctly, the seesaw workload hint to the scheduler looks
like the diff below, which leaves the scheduler free to pull the two players
apart across CPUs and to migrate either of them.

--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -421,6 +421,7 @@ static void __fuse_request_send(struct f
 	/* acquire extra reference, since request is still needed after
 	   fuse_request_end() */
 	__fuse_get_request(req);
+	current->seesaw = 1;
 	queue_request_and_unlock(fiq, req);
 
 	request_wait_answer(req);
@@ -1229,6 +1230,7 @@ static ssize_t fuse_dev_do_read(struct f
 			   fc->max_write))
 		return -EINVAL;
 
+	current->seesaw = 1;
 restart:
 	for (;;) {
 		spin_lock(&fiq->lock);
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -953,6 +953,7 @@ struct task_struct {
 	/* delay due to memory thrashing */
 	unsigned			in_thrashing:1;
 #endif
+	unsigned			seesaw:1;
 
 	unsigned long			atomic_flags; /* Flags requiring atomic access. */
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7424,6 +7424,8 @@ select_task_rq_fair(struct task_struct *
 	if (wake_flags & WF_TTWU) {
 		record_wakee(p);
 
+		if (p->seesaw && current->seesaw)
+			return cpu;
 		if (sched_energy_enabled()) {
 			new_cpu = find_energy_efficient_cpu(p, prev_cpu);
 			if (new_cpu >= 0)
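
As an aside, here is a minimal userspace sketch of the "core affined
userspace threads" assumption in the quoted comment: the fuse server pins
one ring thread per CPU with pthread_setaffinity_np(). This is only an
illustration on my side, not part of either patch; ring_thread() is a
placeholder name for the per-core ring loop.

#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdlib.h>
#include <unistd.h>

/* Placeholder for the per-core ring loop of the fuse server. */
static void *ring_thread(void *arg)
{
	(void)arg;
	for (;;)
		pause();
	return NULL;
}

int main(void)
{
	long ncpus = sysconf(_SC_NPROCESSORS_ONLN);

	for (long cpu = 0; cpu < ncpus; cpu++) {
		pthread_t tid;
		cpu_set_t set;

		CPU_ZERO(&set);
		CPU_SET((int)cpu, &set);

		if (pthread_create(&tid, NULL, ring_thread, NULL))
			exit(1);
		/* Bind this ring thread to exactly one core. */
		if (pthread_setaffinity_np(tid, sizeof(set), &set))
			exit(1);
	}
	pause();
	return 0;
}

If I read the quoted comment right, with the ring threads already pinned
like this, the seesaw hint (or migrate_disable()) only has to keep the
request-submitting process on the core of its ring thread.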