On Fri, Apr 8, 2022 at 11:34 AM Paul E. McKenney <paulmck@xxxxxxxxxx> wrote:
>
> On Fri, Apr 08, 2022 at 10:41:26AM -0400, Joel Fernandes wrote:
> > On Fri, Apr 8, 2022 at 10:34 AM Paul E. McKenney <paulmck@xxxxxxxxxx> wrote:
> > >
> > > On Fri, Apr 08, 2022 at 06:42:42AM -0400, Joel Fernandes wrote:
> > > > On Fri, Apr 8, 2022 at 12:57 AM Kalesh Singh <kaleshsingh@xxxxxxxxxx> wrote:
> > > > [...]
> > > > > @@ -334,15 +334,13 @@ static bool exp_funnel_lock(unsigned long s)
> > > > >   * Select the CPUs within the specified rcu_node that the upcoming
> > > > >   * expedited grace period needs to wait for.
> > > > >   */
> > > > > -static void sync_rcu_exp_select_node_cpus(struct work_struct *wp)
> > > > > +static void __sync_rcu_exp_select_node_cpus(struct rcu_exp_work *rewp)
> > > > >  {
> > > > >         int cpu;
> > > > >         unsigned long flags;
> > > > >         unsigned long mask_ofl_test;
> > > > >         unsigned long mask_ofl_ipi;
> > > > >         int ret;
> > > > > -       struct rcu_exp_work *rewp =
> > > > > -               container_of(wp, struct rcu_exp_work, rew_work);
> > > > >         struct rcu_node *rnp = container_of(rewp, struct rcu_node, rew);
> > > > >
> > > > >         raw_spin_lock_irqsave_rcu_node(rnp, flags);
> > > > > @@ -417,13 +415,119 @@ static void sync_rcu_exp_select_node_cpus(struct work_struct *wp)
> > > > >         rcu_report_exp_cpu_mult(rnp, mask_ofl_test, false);
> > > > >  }
> > > > >
> > > > > +static void rcu_exp_sel_wait_wake(unsigned long s);
> > > > > +
> > > > > +#ifdef CONFIG_RCU_EXP_KTHREAD
> > > >
> > > > Just my 2c:
> > > >
> > > > Honestly, I am not sure if the benefits of duplicating the code to use
> > > > normal workqueues outweigh the drawbacks (namely code complexity and
> > > > code duplication, which can in turn cause more bugs and maintenance
> > > > headaches down the line). The code is harder to read, and adding more
> > > > 30-character function names does not help.
> > > >
> > > > For something as important as expedited GPs, I can't imagine a
> > > > scenario where an RT kthread worker would cause "issues". If it does
> > > > cause issues, that's what the -rc cycles and the stable releases are
> > > > for. I prefer to trust the process rather than take a
> > > > one-foot-in-the-door approach.
> > > >
> > > > So please, can we just keep it simple?
> > >
> > > Yes and no.
> > >
> > > This is a bug fix, but only for those systems that are expecting real-time
> > > response from synchronize_rcu_expedited(). As far as I know, this is only
> > > Android. The rest of the systems are just fine with the current behavior.
> >
> > As far as you know, but are you sure?
>
> None of us are sure. We are balancing risks and potential benefits.

Right.

> > > In addition, this bug fix introduces significant risks, especially in
> > > terms of performance for throughput-oriented workloads.
> >
> > Could you explain what the risk is? That's the part I did not follow.
> > How can giving synchronize_rcu_expedited() work higher priority
> > introduce throughput issues?
>
> Status quo has synchronize_rcu_expedited() workqueues running as
> SCHED_OTHER.

Yeah, so if we go by this, you are saying RCU_BOOST likely does not work
correctly on the status quo, right? I do not see what in the commit message
indicates that this is an Android-only issue; let me know what I am missing.

> The users affected by this will instead have these running
> as SCHED_FIFO. That changes scheduling. For users not explicitly
> needing low-latency synchronize_rcu_expedited(), this change is very
> unlikely to be for the better. And historically, unnecessarily running
> portions of RCU at real-time priorities has been a change for the worse.
> As in greatly increased context-switch rates and consequently degraded
> performance. Please note that this is not a theoretical statement: real
> users have really been burned by too much SCHED_FIFO in RCU kthreads in
> the past.
Android also has suffered from too much SCHED_FIFO in the past. I remember
the display thread, 'surfaceflinger', had to be dropped from RT privilege
at one point.

> > > So yes, let's do this bug fix (with appropriate adjustment), but let's
> > > also avoid exposing the non-Android workloads to risks from the
> > > inevitable unintended consequences. ;-)
> >
> > I would argue the risk is also adding code complexity and more bugs
> > without a clear rationale for why it is being done. There's always risk
> > with any change, but that's what the -rc cycles and stable kernels
> > help catch. I think we should not add more code complexity if it is a
> > theoretical concern.
> >
> > There's also another possible risk - there is a possible hidden
> > problem here that probably the non-Android folks haven't noticed or
> > been able to debug. I would rather just do the right thing.
> >
> > Just my 2c,
>
> Sorry, but my answer is still "no".
>
> Your suggested change risks visiting unacceptable performance
> degradation on a very large number of innocent users for whom the current
> synchronize_rcu_expedited() latency is plenty good enough.

I believe the process will catch any such risk, but it is your call! ;-)

Thanks,

 - Joel