On Wed, Apr 26, 2023 at 10:26:38AM -0700, Paul E. McKenney wrote:
> The rcu_tasks_invoke_cbs() relies on queue_work_on() to silently fall
> back to WORK_CPU_UNBOUND when the specified CPU is offline.  However,
> the queue_work_on() function's silent fallback mechanism relies on that
> CPU having been online at some time in the past.  When queue_work_on()
> is passed a CPU that has never been online, workqueue lockups ensue,
> which can be bad for your kernel's general health and well-being.
>
> This commit therefore checks whether a given CPU is currently online,
> and, if not substitutes WORK_CPU_UNBOUND in the subsequent call to
> queue_work_on().  Why not simply omit the queue_work_on() call entirely?
> Because this function is flooding callback-invocation notifications
> to all CPUs, and must deal with possibilities that include a sparse
> cpu_possible_mask.
>
> Fixes: d363f833c6d88 rcu-tasks: Use workqueues for multiple rcu_tasks_invoke_cbs() invocations
> Reported-by: Tejun Heo <tj@xxxxxxxxxx>
> Signed-off-by: Paul E. McKenney <paulmck@xxxxxxxxxx>
...
> +	// If a CPU has never been online, queue_work_on()
> +	// objects to queueing work on that CPU.  Approximate a
> +	// check for this by checking if the CPU is currently online.
> +
> +	cpus_read_lock();
> +	cpuwq1 = cpu_online(cpunext) ? cpunext : WORK_CPU_UNBOUND;
> +	cpuwq2 = cpu_online(cpunext + 1) ? cpunext + 1 : WORK_CPU_UNBOUND;
> +	cpus_read_unlock();
> +
> +	// Yes, either CPU could go offline here.  But that is
> +	// OK because queue_work_on() will (in effect) silently
> +	// fall back to WORK_CPU_UNBOUND for any CPU that has ever
> +	// been online.

Looks like cpus_read_lock() isn't protecting anything really.

> +	queue_work_on(cpuwq1, system_wq, &rtpcp_next->rtp_work);
>  	cpunext++;
>  	if (cpunext < smp_load_acquire(&rtp->percpu_dequeue_lim)) {
>  		rtpcp_next = per_cpu_ptr(rtp->rtpcpu, cpunext);
> -		queue_work_on(cpunext, system_wq, &rtpcp_next->rtp_work);
> +		queue_work_on(cpuwq2, system_wq, &rtpcp_next->rtp_work);

As discussed in the thread, I kinda wonder whether just using an unbound
workqueue would be sufficient but as a fix this looks good to me.

Acked-by: Tejun Heo <tj@xxxxxxxxxx>

Thanks.

-- 
tejun
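
For anyone following along, the CPU-selection step in the quoted hunk can be
modelled in plain userspace C.  This is only a rough sketch and not kernel
code: every model_* name, MODEL_NR_CPUS and MODEL_WORK_CPU_UNBOUND are
hypothetical stand-ins invented for illustration, and the online state is a
simple array instead of the real cpu_online() and hotplug machinery.  Unlike
the patch, the model also bounds-checks the CPU number.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for illustration; not the kernel definitions. */
#define MODEL_NR_CPUS		8
#define MODEL_WORK_CPU_UNBOUND	MODEL_NR_CPUS	/* sentinel: "any CPU" */

/* Modelled online state; all CPUs start out offline. */
static bool model_online[MODEL_NR_CPUS];

/* Stand-in for cpu_online(): out-of-range CPUs count as offline. */
static bool model_cpu_online(int cpu)
{
	return cpu >= 0 && cpu < MODEL_NR_CPUS && model_online[cpu];
}

/*
 * Mirror the ternary in the quoted patch: keep the requested CPU if
 * it is online right now, otherwise substitute the unbound sentinel.
 */
static int model_pick_wq_cpu(int cpu)
{
	return model_cpu_online(cpu) ? cpu : MODEL_WORK_CPU_UNBOUND;
}
```

With only CPUs 0 and 2 marked online in the model, a request for CPU 1 comes
back as the unbound sentinel while requests for CPUs 0 and 2 are left alone.
The model takes no lock, so it says nothing either way about the
cpus_read_lock() question raised above.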