On Fri, Apr 08, 2022 at 02:23:34PM -0400, Joel Fernandes wrote:
> On Fri, Apr 8, 2022 at 2:22 PM Joel Fernandes <joel@xxxxxxxxxxxxxxxxx> wrote:
> >
> > On Fri, Apr 8, 2022 at 1:49 PM Paul E. McKenney <paulmck@xxxxxxxxxx> wrote:
> > >
> > > On Fri, Apr 08, 2022 at 01:20:02PM -0400, Joel Fernandes wrote:
> > > >
> > > > On Fri, Apr 8, 2022 at 11:50 AM Paul E. McKenney <paulmck@xxxxxxxxxx> wrote:
> > > > >
> > > > > On Fri, Apr 08, 2022 at 10:52:21AM -0400, Joel Fernandes wrote:
> > > > > >
> > > > > > On Fri, Apr 8, 2022 at 10:22 AM Paul E. McKenney <paulmck@xxxxxxxxxx> wrote:
> > > > > > >
> > > > > > > On Thu, Apr 07, 2022 at 09:07:33PM +0000, Joel Fernandes wrote:
> > > > > > > > On systems with CONFIG_RCU_NOCB_CPU=y, there is no default mask provided
> > > > > > > > which ends up not offloading any CPU. This patch removes yet another
> > > > > > > > dependency from the bootloader having to know about RCU, about how many
> > > > > > > > CPUs the system has, and about how to provide the mask. Basically, I
> > > > > > > > think we should stop pretending that the user knows what they are doing :).
> > > > > > > > In other words, if CONFIG_RCU_NOCB_CPU is enabled, let's make use of it.
> > > > > > > >
> > > > > > > > My goal is to make RCU as zero-config as possible with sane defaults. If
> > > > > > > > the user wants to provide rcu_nocbs= or nohz_full= options, then those will
> > > > > > > > take precedence and this patch will have no effect.
> > > > > > > >
> > > > > > > > I tested providing rcu_nocbs= option, ensuring that is preferred over this.
> > > > > > >
> > > > > > > Unless something has changed, this would change behavior relied upon
> > > > > > > by the enterprise distros. Last I checked, they want to supply a single
> > > > > > > binary, as evidenced by the recent CONFIG_PREEMPT_DYNAMIC Kconfig option,
> > > > > > > and they also want the default to be non-offloaded.
> > > > > > > That is, given a kernel built with CONFIG_RCU_NOCB_CPU=y and
> > > > > > > without either a nohz_full or an rcu_nocbs boot parameter, all of
> > > > > > > the CPUs must be non-offloaded.
> > > > > >
> > > > > > Just curious, do you have information (like data, experiment results)
> > > > > > on why they want default non-offloaded? Or maybe they haven't tried
> > > > > > the recent work done in NOCB code?
> > > > >
> > > > > I most definitely do. When I first introduced callback offloading, I
> > > > > made it completely replace softirq callback invocation. There were some
> > > > > important throughput-oriented workloads that got hit with significant
> > > > > performance degradation due to this change. Enterprise Java workloads
> > > > > were the worst hit.
> > > > >
> > > > > Android does not run these workloads, and I am not aware of ChromeOS
> > > > > running them, either.
> > > >
> > > > Thanks a lot for mentioning this, I was not aware and will make note
> > > > of it :-). I wonder if the scheduler had something to do with the
> > > > degradation.
> > >
> > > It is all too easy to blame the scheduler and all too easy to forget
> > > that the scheduler has a hard job. ;-)
> > >
> > > And in this case, the scheduler was just doing what it was told.
> >
> > No, I was just saying the scheduler has to do more work with NOCB because
> > of the extra threads, so that likely degrades the workloads (context
> > switch, wake ups, etc).
> >
> > >
> > > > > > > And is it really all -that- hard to specify an additional boot parameter
> > > > > > > across ChromeOS devices? Android seems to manage it. ;-)
> > > > > >
> > > > > > That's not the hard part I think. The hard part is to make sure a
> > > > > > future Linux user who is not an RCU expert does not forget to turn it
> > > > > > on. ChromeOS is not the only OS that I've seen someone forget to do it
> > > > > > ;-D. AFAIR, there were Android devices too in the past where I saw
> > > > > > this forgotten.
> > > > > > I don't think we should rely on the users doing the right
> > > > > > thing (as much as possible).
> > > > > >
> > > > > > The single kernel binary point makes sense but in this case, I think
> > > > > > the bigger question that I'd have is what is the default behavior and
> > > > > > what do *most* users of RCU want. So we can keep sane defaults for the
> > > > > > majority and reduce human errors related to configuration.
> > > > >
> > > > > If both the ChromeOS and Android guys need it, I could reinstate the
> > > > > old RCU_NOCB_CPU_ALL Kconfig option. This was removed due to complaints
> > > > > about RCU Kconfig complexity, but I believe that Reviewed-by from ChromeOS
> > > > > and Android movers and shakers would overcome lingering objections.
> > > > >
> > > > > Would that help?
> > > >
> > > > Yes, I think I would love such a change. I am planning to add a
> > > > test to ChromeOS to check whether config options were correctly set
> > > > up. So I can test for both the RCU_NOCB_CPU options.
> > >
> > > Very good!
> > >
> > > Do you love such a change enough to create the patch and to collect
> > > convincing Reviewed-by tags?
> >
> > Yes sure, just so I understand - basically I have to make the code in
> > my patch run when RCU_NOCB_CPU_ALL option is passed (and keep the
> > option default disabled), but otherwise default to the current
> > behavior, right?
>
> Sorry rephrasing, "make the code in my patch run when the new
> CONFIG_RCU_NOCB_CPU_ALL is enabled".

Here is what I believe you are proposing:

                              ---     rcu_nocbs     rcu_nocbs=???

CONFIG_RCU_NOCB_CPU_ALL=n     [1]     [2]           [3]

CONFIG_RCU_NOCB_CPU_ALL=y     [4]     [4]           [3]

[1]     No CPUs are offloaded at boot. CPUs cannot be offloaded at
        runtime.

[2]     No CPUs are offloaded at boot, but any CPU can be offloaded
        (and later de-offloaded) at runtime.

[3]     The set of CPUs that are offloaded at boot is specified by the
        mask, represented above with "???". The CPUs that are offloaded
        at boot can be de-offloaded and offloaded at runtime. The CPUs
        not offloaded at boot cannot be offloaded at runtime.

[4]     All CPUs are offloaded at boot, and any CPU can be de-offloaded
        and offloaded at runtime. This is the same behavior that
        you would currently get with CONFIG_RCU_NOCB_CPU_ALL=n and
        rcu_nocbs=0-N.

I am adding Frederic on CC, who will not be shy about correcting any
confusion I might be suffering from with respect to the current code.

Either way, if this is not what you had in mind, what are you suggesting
instead?

I believe that Steve Rostedt's review would carry weight for ChromeOS.
However, I am suffering a senior moment on the right person for Android.

                                                        Thanx, Paul
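
One way to read the table above is as a small decision function. The
following is only a sketch in C under stated assumptions: the enum, the
function names, and the cpu_in_boot_mask parameter are all hypothetical
and do not correspond to actual kernel code. It simply encodes cases [1]
through [4] as they are described above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for the three boot-parameter columns of the table. */
enum nocb_bootarg {
	NOCB_ARG_ABSENT,	/* "---": no rcu_nocbs parameter at all */
	NOCB_ARG_NO_MASK,	/* bare "rcu_nocbs" with no CPU list */
	NOCB_ARG_MASK,		/* "rcu_nocbs=???" with a CPU list */
};

/* Map a (Kconfig setting, boot parameter) pair to table case [1]..[4]. */
int nocb_table_case(bool config_all, enum nocb_bootarg arg)
{
	assert(arg == NOCB_ARG_ABSENT || arg == NOCB_ARG_NO_MASK ||
	       arg == NOCB_ARG_MASK);

	if (arg == NOCB_ARG_MASK)
		return 3;	/* an explicit mask applies under either Kconfig setting */
	if (config_all)
		return 4;	/* =y and no mask given: all CPUs offloaded */
	return arg == NOCB_ARG_NO_MASK ? 2 : 1;
}

/* Is a given CPU offloaded once boot completes? */
bool nocb_offloaded_at_boot(int table_case, bool cpu_in_boot_mask)
{
	switch (table_case) {
	case 3:
		return cpu_in_boot_mask;
	case 4:
		return true;
	default:
		return false;	/* cases [1] and [2] */
	}
}

/* May a given CPU be offloaded or de-offloaded at runtime? */
bool nocb_runtime_toggle_ok(int table_case, bool cpu_in_boot_mask)
{
	switch (table_case) {
	case 1:
		return false;
	case 3:
		return cpu_in_boot_mask;
	default:
		return true;	/* cases [2] and [4] */
	}
}
```

For example, nocb_table_case(true, NOCB_ARG_ABSENT) returns 4, matching
the CONFIG_RCU_NOCB_CPU_ALL=y row, while an explicit mask yields case 3
under either Kconfig setting.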