----- On Feb 2, 2022, at 6:23 AM, Peter Zijlstra peterz@xxxxxxxxxxxxx wrote:

> On Tue, Feb 01, 2022 at 02:25:38PM -0500, Mathieu Desnoyers wrote:
>
>> +static inline void tg_vcpu_get(struct task_struct *t)
>> +{
>> +	struct cpumask *cpumask = &t->signal->vcpu_mask;
>> +	unsigned int vcpu;
>> +
>> +	if (t->flags & PF_KTHREAD)
>> +		return;
>> +	/* Atomically reserve lowest available vcpu number. */
>> +	do {
>> +		vcpu = cpumask_first_zero(cpumask);
>> +		WARN_ON_ONCE(vcpu >= nr_cpu_ids);
>> +	} while (cpumask_test_and_set_cpu(vcpu, cpumask));
>> +	t->tg_vcpu = vcpu;
>> +}
>> +
>> +static inline void tg_vcpu_put(struct task_struct *t)
>> +{
>> +	if (t->flags & PF_KTHREAD)
>> +		return;
>> +	cpumask_clear_cpu(t->tg_vcpu, &t->signal->vcpu_mask);
>> +	t->tg_vcpu = 0;
>> +}
>
>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>> index 2e4ae00e52d1..2690e80977b1 100644
>> --- a/kernel/sched/core.c
>> +++ b/kernel/sched/core.c
>> @@ -4795,6 +4795,8 @@ prepare_task_switch(struct rq *rq, struct task_struct *prev,
>>  	sched_info_switch(rq, prev, next);
>>  	perf_event_task_sched_out(prev, next);
>>  	rseq_preempt(prev);
>> +	tg_vcpu_put(prev);
>> +	tg_vcpu_get(next);
>
> URGGHHH!!! that's *2* atomics extra on the context switch path. Worse,
> that's on a line that's trivially contended with a few threads.

There is one obvious optimization that just begs to be done here: when
switching between threads belonging to the same process, we can simply
take the vcpu_id tag of the prev thread and use it for next, without
requiring any atomic operation.

This only leaves the overhead of the added atomics when scheduling
between threads that belong to different processes. Does that still
matter as much? If it does, then we should really scratch our heads a
little more to come up with improvements.

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
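
For illustration, here is a minimal, untested sketch of the same-process
handover Mathieu describes, reusing the tg_vcpu_get()/tg_vcpu_put()
helpers and the t->signal / t->tg_vcpu fields from the quoted patch. The
tg_vcpu_switch() helper name and its use as a combined call site are
assumptions for the sketch, not code from the thread:

/*
 * Hypothetical helper (not from the patch): when prev and next belong to
 * the same thread group, hand prev's vcpu id over to next directly, so
 * the vcpu_mask atomics are only paid on cross-process switches.
 */
static inline void tg_vcpu_switch(struct task_struct *prev,
				  struct task_struct *next)
{
	if (!((prev->flags | next->flags) & PF_KTHREAD) &&
	    prev->signal == next->signal) {
		/*
		 * Same process: next inherits prev's vcpu id; the bit in
		 * vcpu_mask stays set and is now owned by next.
		 */
		next->tg_vcpu = prev->tg_vcpu;
		return;
	}
	/* Different processes (or a kthread involved): use the atomics. */
	tg_vcpu_put(prev);
	tg_vcpu_get(next);
}

With something like this, the two calls added to prepare_task_switch()
would collapse into a single tg_vcpu_switch(prev, next), and the cpumask
atomics would only be taken in the cross-process case Mathieu asks about.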