On Fri, Aug 27, 2021 at 01:43:35PM +0530, Neeraj Upadhyay wrote: > The trc_wait_for_one_reader() function is called at multiple stages > of trace rcu-tasks GP function, rcu_tasks_wait_gp(): > > - First, it is called as part of per task function - > rcu_tasks_trace_pertask(), for all non-idle tasks. As part of per task > processing, this function add the task in the holdout list and if the > task is currently running on a CPU, it sends IPI to the task's CPU. > The IPI handler takes action depending on whether task is in trace > rcu-tasks read side critical section or not: > > - a. If the task is in trace rcu-tasks read side critical section > (t->trc_reader_nesting != 0), the IPI handler sets the task's > ->trc_reader_special.b.need_qs, so that this task notifies exit > from its outermost read side critical section (by decrementing > trc_n_readers_need_end) to the GP handling function. > trc_wait_for_one_reader() also increments trc_n_readers_need_end, > so that the trace rcu-tasks GP handler function waits for this > task's read side exit notification. The IPI handler also sets > t->trc_reader_checked to true, and no further IPIs are sent for > this task, for this trace rcu-tasks grace period and this > task can be removed from holdout list. > > - b. If the task is in the process of exiting its trace rcu-tasks > read side critical section, (t->trc_reader_nesting < 0), defer > this task's processing to future calls to trc_wait_for_one_reader(). > > - c. If task is not in rcu-task read side critical section, > t->trc_reader_nesting == 0, ->trc_reader_checked is set for this > task, so that, this task is removed from holdout list. > > - Second, trc_wait_for_one_reader() is called as part of post scan, in > function rcu_tasks_trace_postscan(), for all idle tasks. > > - Third, in function check_all_holdout_tasks_trace(), for each task in > the holdout list, this function is called, if there isn't a pending IPI > for the task (->trc_ipi_to_cpu == -1). This function removed the task > from holdout list, if IPI handler has completed the required work, > to ensure that the current trace rcu-tasks grace period either waits > for this task, or this task is not in a trace rcu-tasks read side > critical section. > > Now, considering the scenario where smp_call_function_single() fails in > first case, inside rcu_tasks_trace_pertask(). In this case, > ->trc_ipi_to_cpu is set to the current CPU for that task. This will > result in trc_wait_for_one_reader() getting skipped in third case, > inside check_all_holdout_tasks_trace(), for this task. This further > results in ->trc_reader_checked never getting set for this task, > and the task not getting removed from holdout list. This can cause > the current trace rcu-tasks grace period to stall. > > Fix the above problem, by resetting ->trc_ipi_to_cpu to -1, on > smp_call_function_single() failure, so that future IPI calls can > be send for this task. As all three callers (rcu_tasks_trace_pertask(), > rcu_tasks_trace_postscan(), check_all_holdout_tasks_trace()) > of trc_wait_for_one_reader() run with CPU read lock held, > smp_call_function_single() cannot race with CPU hotplug. So, > also add a warning, to report and identify any such failure. > > Signed-off-by: Neeraj Upadhyay <neeraju@xxxxxxxxxxxxxx> > --- > > Ran rcutorture TRACE01 tree for 10 hrs with below commandline, didn't > observe warning: > > Command line: debug_boot_weak_hash panic=-1 selinux=0 initcall_debug debug > console=ttyS0 rcutree.rcu_fanout_leaf=4 nohz_full=1-N rcutorture.onoff_interval=1000 > rcutorture.onoff_holdoff=30 rcutorture.n_barrier_cbs=4 rcutorture.stat_interval=15 > rcutorture.shutdown_secs=36000 rcutorture.test_no_idle_hz=1 rcutorture.verbose=1 > rcutorture.torture_type=tasks-tracing Very good, thank you! I queued this for v5.16 (not the upcoming merge window, but the one after that) and for review and further testing. I did a little wordsmithing, so could you please check to make sure that I didn't mess something up? Thanx, Paul ------------------------------------------------------------------------ commit 2717b14fb5fc57e5f131cc0bac26c85fc03a0cd2 Author: Neeraj Upadhyay <neeraju@xxxxxxxxxxxxxx> Date: Fri Aug 27 13:43:35 2021 +0530 rcu-tasks: Fix IPI failure handling in trc_wait_for_one_reader The trc_wait_for_one_reader() function is called at multiple stages of trace rcu-tasks GP function, rcu_tasks_wait_gp(): - First, it is called as part of per task function - rcu_tasks_trace_pertask(), for all non-idle tasks. As part of per task processing, this function add the task in the holdout list and if the task is currently running on a CPU, it sends IPI to the task's CPU. The IPI handler takes action depending on whether task is in trace rcu-tasks read side critical section or not: - a. If the task is in trace rcu-tasks read side critical section (t->trc_reader_nesting != 0), the IPI handler sets the task's ->trc_reader_special.b.need_qs, so that this task notifies exit from its outermost read side critical section (by decrementing trc_n_readers_need_end) to the GP handling function. trc_wait_for_one_reader() also increments trc_n_readers_need_end, so that the trace rcu-tasks GP handler function waits for this task's read side exit notification. The IPI handler also sets t->trc_reader_checked to true, and no further IPIs are sent for this task, for this trace rcu-tasks grace period and this task can be removed from holdout list. - b. If the task is in the process of exiting its trace rcu-tasks read side critical section, (t->trc_reader_nesting < 0), defer this task's processing to future calls to trc_wait_for_one_reader(). - c. If task is not in rcu-task read side critical section, t->trc_reader_nesting == 0, ->trc_reader_checked is set for this task, so that this task is removed from holdout list. - Second, trc_wait_for_one_reader() is called as part of post scan, in function rcu_tasks_trace_postscan(), for all idle tasks. - Third, in function check_all_holdout_tasks_trace(), this function is called for each task in the holdout list, but only if there isn't a pending IPI for the task (->trc_ipi_to_cpu == -1). This function removed the task from holdout list, if IPI handler has completed the required work, to ensure that the current trace rcu-tasks grace period either waits for this task, or this task is not in a trace rcu-tasks read side critical section. Now, considering the scenario where smp_call_function_single() fails in first case, inside rcu_tasks_trace_pertask(). In this case, ->trc_ipi_to_cpu is set to the current CPU for that task. This will result in trc_wait_for_one_reader() getting skipped in third case, inside check_all_holdout_tasks_trace(), for this task. This further results in ->trc_reader_checked never getting set for this task, and the task not getting removed from holdout list. This can cause the current trace rcu-tasks grace period to stall. Fix the above problem, by resetting ->trc_ipi_to_cpu to -1, on smp_call_function_single() failure, so that future IPI calls can be send for this task. Note that all three of the trc_wait_for_one_reader() function's callers (rcu_tasks_trace_pertask(), rcu_tasks_trace_postscan(), check_all_holdout_tasks_trace()) hold cpu_read_lock(). This means that smp_call_function_single() cannot race with CPU hotplug, and thus should never fail. Therefore, also add a warning in order to report any such failure in case smp_call_function_single() grows some other reason for failure. Signed-off-by: Neeraj Upadhyay <neeraju@xxxxxxxxxxxxxx> Signed-off-by: Paul E. McKenney <paulmck@xxxxxxxxxx> diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h index e6f0def1cb60..c110fd080c5b 100644 --- a/kernel/rcu/tasks.h +++ b/kernel/rcu/tasks.h @@ -1014,9 +1014,11 @@ static void trc_wait_for_one_reader(struct task_struct *t, if (smp_call_function_single(cpu, trc_read_check_handler, t, 0)) { // Just in case there is some other reason for // failure than the target CPU being offline. + WARN_ONCE(1, "%s(): smp_call_function_single() failed for CPU: %d\n", + __func__, cpu); rcu_tasks_trace.n_ipis_fails++; per_cpu(trc_ipi_to_cpu, cpu) = false; - t->trc_ipi_to_cpu = cpu; + t->trc_ipi_to_cpu = -1; } } }