----- On Mar 18, 2020, at 8:10 PM, paulmck paulmck@xxxxxxxxxx wrote:

> Hello!

Hi Paul,

Thanks for pulling this together! Some comments below (based only on the cover message),

[...]

> There are of course downsides. The grace-period code can send IPIs to
> CPUs, even when those CPUs are in the idle loop or in nohz_full userspace.
> However, this version enlists the aid of the context-switch hooks,
> which eliminates the need for IPIs in context-switch-heavy workloads.
> It also prohibits sending of IPIs early in the grace period, which
> provides additional opportunity for the hooks to do their job. Additional
> IPI-reduction mechanisms are under development.

I suspect that on nohz_full cpus, at least some use-cases which really care about not receiving IPIs will not be doing that many context switches. What are the possible approaches to have IPI-*elimination* for nohz cpus ?

>
> The RCU tasks trace mechanism is based off of RCU tasks rather than
> SRCU because the latter is more complex and also because the latter
> uses a CPU-by-CPU approach to tracking quiescent states instead of the
> task-by-task approach that is needed. It is in theory possible to
> mash RCU tasks trace into the Tree SRCU implementation, but there
> will need to be extremely good reasons for doing so.

I have a hard time buying the "less complexity" argument to justify the introduction of yet another flavor of RCU when a close match already exists (SRCU).

The other argument for this task-based RCU (rather than CPU-by-CPU as done by SRCU) is that "a task-by-task approach is needed". What I do not get from this explanation is why is such an approach needed ?

Also, another aspect worth discussing here is the use-cases which need to be covered by tracing-rcu. Is this specific flavor targeting specifically preempt-off use-cases, or is the goal here to target use-cases which may trigger major page faults within the read-side critical section as well ?

Note that doing task-by-task tracking of tracing-rcu rather than cpu-by-cpu is not free: AFAIU it bloats the task struct (always) for a use-case which is not always active. My experience with tracepoints and asm gotos is that we need to be careful not to slow down the common case (kernel running without any tracing active, but tracing configured in) if we want to keep distributions and end users building kernels with introspection facilities in place.

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com