In some cases, rcuwait_wake_up can be called even if the actual
likelihood of a wakeup is very low.  If CONFIG_PREEMPT_RCU is active,
the resulting rcu_read_lock/rcu_read_unlock pair can be relatively
expensive, and in fact it is unnecessary when there is no w->task to
keep alive: the memory barrier before the read is what matters in order
to avoid missed wakeups.  Therefore, do an early check of w->task right
after the barrier, and skip rcu_read_lock/rcu_read_unlock unless there
is someone waiting for a wakeup.

Running kvm-unit-tests/vmexit.flat with APICv disabled, most interrupt
injection tests (tscdeadline*, self_ipi*, x2apic_self_ipi*) improve by
around 600 CPU cycles.

Cc: Davidlohr Bueso <dave@xxxxxxxxxxxx>
Cc: Oleg Nesterov <oleg@xxxxxxxxxx>
Cc: Ingo Molnar <mingo@xxxxxxxxxx>
Cc: Paul E. McKenney <paulmck@xxxxxxxxxxxxxxxxxx>
Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
Reported-by: Wanpeng Li <wanpengli@xxxxxxxxxxx>
Signed-off-by: Paolo Bonzini <pbonzini@xxxxxxxxxx>
---
 kernel/exit.c | 16 +++++++++++++---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/kernel/exit.c b/kernel/exit.c
index 91a43e57a32e..a38a08dbf85e 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -234,8 +234,6 @@ int rcuwait_wake_up(struct rcuwait *w)
 	int ret = 0;
 	struct task_struct *task;
 
-	rcu_read_lock();
-
 	/*
	 * Order condition vs @task, such that everything prior to the load
	 * of @task is visible. This is the condition as to why the user called
@@ -245,10 +243,22 @@ int rcuwait_wake_up(struct rcuwait *w)
 	 * WAIT                   WAKE
	 * [S] tsk = current      [S] cond = true
	 *     MB (A)                 MB (B)
-	 * [L] cond               [L] tsk
+	 * [L] cond               [L] rcuwait_active(w)
+	 *                            task = rcu_dereference(w->task)
 	 */
 	smp_mb(); /* (B) */
 
+#ifdef CONFIG_PREEMPT_RCU
+	/*
+	 * The cost of rcu_read_lock() dominates for preemptible RCU,
+	 * avoid it if possible.
+	 */
+	if (!rcuwait_active(w))
+		return ret;
+#endif
+
+	rcu_read_lock();
+
 	task = rcu_dereference(w->task);
 	if (task)
 		ret = wake_up_process(task);
-- 
2.27.0
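
For reference, here is the waiter side that barrier (B) pairs with,
condensed from the rcuwait_wait_event() macro in include/linux/rcuwait.h.
This is only an illustrative sketch, not part of the patch: the
example_wait() wrapper is made up, and the real macro takes the
condition as an arbitrary expression rather than a bool pointer.

	#include <linux/rcuwait.h>
	#include <linux/sched.h>

	static void example_wait(struct rcuwait *w, bool *cond)
	{
		/* [S] tsk = current: publish the waiter for wakers. */
		rcu_assign_pointer(w->task, current);

		for (;;) {
			/*
			 * MB (A): set_current_state() implies a full
			 * barrier, ordering the store of w->task above
			 * against the load of the condition below.
			 */
			set_current_state(TASK_UNINTERRUPTIBLE);
			if (READ_ONCE(*cond))		/* [L] cond */
				break;
			schedule();
		}

		/* Done waiting; wakers may now take the early return. */
		WRITE_ONCE(w->task, NULL);
		__set_current_state(TASK_RUNNING);
	}

Given this ordering, a waker that observes !rcuwait_active(w) after
barrier (B) knows that either the waiter has not published itself yet,
in which case its subsequent load of the condition must see cond == true
and it will not sleep, or the waiter has already cleared w->task and
left; in neither case is wake_up_process() needed, so returning before
rcu_read_lock() cannot cause a missed wakeup.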