From: Neeraj Upadhyay <neeraj.upadhyay@xxxxxxxxxx>

Currently, idle tasks are ignored by RCU-tasks. Change this to start
paying attention to idle tasks except in deep-idle functions where
RCU is not watching. With this, for architectures where kernel
entry/exit and deep-idle functions have been properly tagged noinstr,
Tasks Rude RCU can be disabled.

[ neeraj.upadhyay: Frederic Weisbecker and Paul E. McKenney feedback. ]

Signed-off-by: Neeraj Upadhyay <neeraj.upadhyay@xxxxxxxxxx>
---
 .../RCU/Design/Requirements/Requirements.rst  | 12 +++---
 kernel/rcu/tasks.h                            | 41 ++++++++-----------
 2 files changed, 24 insertions(+), 29 deletions(-)

diff --git a/Documentation/RCU/Design/Requirements/Requirements.rst b/Documentation/RCU/Design/Requirements/Requirements.rst
index 6125e7068d2c..5016b85d53d7 100644
--- a/Documentation/RCU/Design/Requirements/Requirements.rst
+++ b/Documentation/RCU/Design/Requirements/Requirements.rst
@@ -2611,8 +2611,8 @@ critical sections that are delimited by voluntary context switches,
 that is, calls to schedule(), cond_resched(), and
 synchronize_rcu_tasks(). In addition, transitions to and from
 userspace execution also delimit tasks-RCU read-side critical sections.
-Idle tasks are ignored by Tasks RCU, and Tasks Rude RCU may be used to
-interact with them.
+Idle tasks which are idle from RCU's perspective are ignored by Tasks RCU,
+and Tasks Rude RCU may be used to interact with them.
 
 Note well that involuntary context switches are *not* Tasks-RCU quiescent
 states. After all, in preemptible kernels, a task executing code in a
@@ -2643,10 +2643,10 @@ moniker. And this operation is considered to be quite rude by real-time
 workloads that don't want their ``nohz_full`` CPUs receiving IPIs and
 by battery-powered systems that don't want their idle CPUs to be awakened.
 
-Once kernel entry/exit and deep-idle functions have been properly tagged
-``noinstr``, Tasks RCU can start paying attention to idle tasks (except
-those that are idle from RCU's perspective) and then Tasks Rude RCU can
-be removed from the kernel.
+As Tasks RCU now pays attention to idle tasks (except those that are idle
+from RCU's perspective), once kernel entry/exit and deep-idle functions have
+been properly tagged ``noinstr``, Tasks Rude RCU can be removed from the
+kernel.
 
 The tasks-rude-RCU API is also reader-marking-free and thus quite compact,
 consisting solely of synchronize_rcu_tasks_rude().
diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h
index 1947f9b6346d..72dc0d0a4a8f 100644
--- a/kernel/rcu/tasks.h
+++ b/kernel/rcu/tasks.h
@@ -912,14 +912,15 @@ static void rcu_tasks_wait_gp(struct rcu_tasks *rtp)
 ////////////////////////////////////////////////////////////////////////
 //
 // Simple variant of RCU whose quiescent states are voluntary context
-// switch, cond_resched_tasks_rcu_qs(), user-space execution, and idle.
-// As such, grace periods can take one good long time. There are no
-// read-side primitives similar to rcu_read_lock() and rcu_read_unlock()
-// because this implementation is intended to get the system into a safe
-// state for some of the manipulations involved in tracing and the like.
-// Finally, this implementation does not support high call_rcu_tasks()
-// rates from multiple CPUs. If this is required, per-CPU callback lists
-// will be needed.
+// switch, cond_resched_tasks_rcu_qs(), user-space execution, and idle
+// tasks which are in RCU-idle context. As such, grace periods can take
+// one good long time. There are no read-side primitives similar to
+// rcu_read_lock() and rcu_read_unlock() because this implementation is
+// intended to get the system into a safe state for some of the
+// manipulations involved in tracing and the like. Finally, this
+// implementation does not support high call_rcu_tasks() rates from
+// multiple CPUs. If this is required, per-CPU callback lists will be
+// needed.
 //
 // The implementation uses rcu_tasks_wait_gp(), which relies on function
 // pointers in the rcu_tasks structure. The rcu_spawn_tasks_kthread()
@@ -1079,14 +1080,6 @@ static bool rcu_tasks_is_holdout(struct task_struct *t)
 	if (!READ_ONCE(t->on_rq))
 		return false;
 
-	/*
-	 * Idle tasks (or idle injection) within the idle loop are RCU-tasks
-	 * quiescent states. But CPU boot code performed by the idle task
-	 * isn't a quiescent state.
-	 */
-	if (is_idle_task(t))
-		return false;
-
 	cpu = task_cpu(t);
 
 	if (t == idle_task(cpu))
@@ -1265,11 +1258,12 @@ static void tasks_rcu_exit_srcu_stall(struct timer_list *unused)
  * period elapses, in other words after all currently executing RCU
  * read-side critical sections have completed. call_rcu_tasks() assumes
  * that the read-side critical sections end at a voluntary context
- * switch (not a preemption!), cond_resched_tasks_rcu_qs(), entry into idle,
- * or transition to usermode execution. As such, there are no read-side
- * primitives analogous to rcu_read_lock() and rcu_read_unlock() because
- * this primitive is intended to determine that all tasks have passed
- * through a safe state, not so much for data-structure synchronization.
+ * switch (not a preemption!), cond_resched_tasks_rcu_qs(), entry into
+ * RCU-idle context or transition to usermode execution. As such, there
+ * are no read-side primitives analogous to rcu_read_lock() and
+ * rcu_read_unlock() because this primitive is intended to determine
+ * that all tasks have passed through a safe state, not so much for
+ * data-structure synchronization.
  *
  * See the description of call_rcu() for more detailed information on
  * memory ordering guarantees.
@@ -1287,8 +1281,9 @@ EXPORT_SYMBOL_GPL(call_rcu_tasks);
  * grace period has elapsed, in other words after all currently
  * executing rcu-tasks read-side critical sections have elapsed. These
  * read-side critical sections are delimited by calls to schedule(),
- * cond_resched_tasks_rcu_qs(), idle execution, userspace execution, calls
- * to synchronize_rcu_tasks(), and (in theory, anyway) cond_resched().
+ * cond_resched_tasks_rcu_qs(), idle execution within RCU-idle context,
+ * userspace execution, calls to synchronize_rcu_tasks(), and (in theory,
+ * anyway) cond_resched().
  *
  * This is a very specialized primitive, intended only for a few uses in
  * tracing and other situations requiring manipulation of function
-- 
2.40.1