Re: fuse uring / wake_up on the same core

On 27 Apr 2023 13:35:31 +0000 Bernd Schubert <bschubert@xxxxxxx> wrote:
> Btw, a very hackish way to 'solve' the issue is this
> 
> diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
> index cd7aa679c3ee..dd32effb5010 100644
> --- a/fs/fuse/dev.c
> +++ b/fs/fuse/dev.c
> @@ -373,6 +373,26 @@ static void request_wait_answer(struct fuse_req *req)
>          int err;
>          int prev_cpu = task_cpu(current);
>   
> +       /* When running over uring with core-affined userspace threads, we
> +        * do not want the request-submitting process to be migrated away.
> +        * The issue is that even after waking up on the right core, processes
> +        * that have submitted requests might get migrated away, because
> +        * the ring thread is still doing a bit of work or is in the process
> +        * of going to sleep. The assumption here is that processes are
> +        * started on the right (i.e. idle) cores and can then stay on that
> +        * core when they come in and do file system requests.
> +        * An alternative would be to set SCHED_IDLE for ring threads,
> +        * but that has an issue if other processes keep the cpu busy.
> +        * SCHED_IDLE or this hack here improve max meta request
> +        * performance by about a factor of 3.5.
> +        *
> +        * Ideal would be to tell the scheduler that ring threads are not
> +        * disturbing, so that migration away from them should only very
> +        * rarely happen.
> +        */
> +       if (fc->ring.ready)
> +               migrate_disable();
> +
>          if (!fc->no_interrupt) {
>                  /* Any signal may interrupt this */
>                  err = wait_event_interruptible(req->waitq,
> 
If I understand it correctly, the seesaw workload hint to the scheduler
looks like the diff below, leaving the scheduler free to pull the two
players apart across CPUs and to migrate either of them.

--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -421,6 +421,7 @@ static void __fuse_request_send(struct f
 		/* acquire extra reference, since request is still needed
 		   after fuse_request_end() */
 		__fuse_get_request(req);
+		current->seesaw = 1;
 		queue_request_and_unlock(fiq, req);
 
 		request_wait_answer(req);
@@ -1229,6 +1230,7 @@ static ssize_t fuse_dev_do_read(struct f
 			   fc->max_write))
 		return -EINVAL;
 
+	current->seesaw = 1;
  restart:
 	for (;;) {
 		spin_lock(&fiq->lock);
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -953,6 +953,7 @@ struct task_struct {
 	/* delay due to memory thrashing */
 	unsigned                        in_thrashing:1;
 #endif
+	unsigned 			seesaw:1;
 
 	unsigned long			atomic_flags; /* Flags requiring atomic access. */
 
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7424,6 +7424,8 @@ select_task_rq_fair(struct task_struct *
 	if (wake_flags & WF_TTWU) {
 		record_wakee(p);
 
+		if (p->seesaw && current->seesaw)
+			return cpu;
 		if (sched_energy_enabled()) {
 			new_cpu = find_energy_efficient_cpu(p, prev_cpu);
 			if (new_cpu >= 0)



