On Sat, May 7, 2022 at 6:32 PM Paul E. McKenney <paulmck@xxxxxxxxxx> wrote: > > On Sat, May 07, 2022 at 11:11:58AM +0200, Uladzislau Rezki wrote: > > > On Fri, May 06, 2022 at 06:22:26PM +0200, Uladzislau Rezki wrote: > > > > > On Thu, May 05, 2022 at 12:16:41PM +0200, Uladzislau Rezki (Sony) wrote: > > > > > > Introduce a RCU_NOCB_CPU_CB_BOOST kernel option. So a user can > > > > > > decide if an offloading has to be done in a high-prio context or > > > > > > not. Please note an option depends on RCU_NOCB_CPU and RCU_BOOST > > > > > > parameters and by default it is off. > > > > > > > > > > > > This patch splits the boosting preempted RCU readers and those > > > > > > kthreads which directly responsible for driving expedited grace > > > > > > periods forward with enabling/disabling the offloading from/to > > > > > > SCHED_FIFO/SCHED_OTHER contexts. > > > > > > > > > > > > The main reason of such split is, for example on Android there > > > > > > are some workloads which require fast expedited grace period to > > > > > > be done whereas offloading in RT context can lead to starvation > > > > > > and hogging a CPU for a long time what is not acceptable for > > > > > > latency sensitive environment. For instance: > > > > > > > > > > > > <snip> > > > > > > <...>-60 [006] d..1 2979.028717: rcu_batch_start: rcu_preempt CBs=34619 bl=270 > > > > > > <snip> > > > > > > > > > > > > invoking 34 619 callbacks will take time thus making other CFS > > > > > > tasks waiting in run-queue to be starved due to such behaviour. > > > > > > > > > > > > Signed-off-by: Uladzislau Rezki (Sony) <urezki@xxxxxxxxx> > > > > > > > > > > All good points! > > > > > > > > > > Some questions and comments below. > > > > > > > > > > Adding Sebastian on CC for his perspective. > > > > > > > > > > Thanx, Paul > > > > > > > > > > > --- > > > > > > kernel/rcu/Kconfig | 14 ++++++++++++++ > > > > > > kernel/rcu/tree.c | 5 ++++- > > > > > > kernel/rcu/tree_nocb.h | 3 ++- > > > > > > 3 files changed, 20 insertions(+), 2 deletions(-) > > > > > > > > > > > > diff --git a/kernel/rcu/Kconfig b/kernel/rcu/Kconfig > > > > > > index 27aab870ae4c..074630b94902 100644 > > > > > > --- a/kernel/rcu/Kconfig > > > > > > +++ b/kernel/rcu/Kconfig > > > > > > @@ -275,6 +275,20 @@ config RCU_NOCB_CPU_DEFAULT_ALL > > > > > > Say Y here if you want offload all CPUs by default on boot. > > > > > > Say N here if you are unsure. > > > > > > > > > > > > +config RCU_NOCB_CPU_CB_BOOST > > > > > > + bool "Perform offloading from real-time kthread" > > > > > > + depends on RCU_NOCB_CPU && RCU_BOOST > > > > > > + default n > > > > > > > > > > I understand that you need this to default to "n" on your systems. > > > > > However, other groups already using callback offloading should not see > > > > > a sudden change. I don't see an Android-specific defconfig file, but > > > > > perhaps something in drivers/android/Kconfig? > > > > > > > We saw a sudden change when the priority was lifted up for rcuop kthreads. > > I would like to know the reason. As for Android, i would like to avoid > > it to be Android specific. It is better just to enable boosting by > > default for nocb kthreads. > > No, because that breaks an existing use case, which uses RCU_BOOST > to avoid OOM on busy systems. > > > > > > One easy way to make this work would be to invert the sense of this > > > > > Kconfig option ("RCU_NOCB_CB_NO_BOOST"?), continue having it default to > > > > > "n", but then select it somewhere in drivers/android/Kconfig. But I > > > > > would not be surprised if there is a better way. > > > > In that situation probably we should just enable it by default. > > You are within your rights to cause it to be enabled by default -within- > -Android-. You are -not- within your rights to break other workloads. > > If ChromeOS needs it too, they too can enable it -within- -ChromeOS-. > > It is not -that- hard, guys! ;-) I think on the topic of RT, +Steven Rostedt should chime in as well considering he wrote a good chunk of the RT scheduler ;-). Personally, I feel the issue of "rcu callback offload" threads running as RT or not should not be a matter of CONFIG option or the system in concern. Instead it should be a function of how many callbacks there are to run. The reason I say this is, RT threads should not be doing a lot of work anyway, lest they cause RT throttling and starvation of other threads. Also, I think it is wrong to assume that a certain kind of system will always have a certain number of callbacks to process at a time. That seems prone to poor design due to assumptions which may not always be true. Can we not have 2 sets of RCU offload threads, one which operate at RT and only process few callbacks at a time, while another which is the lower priority CFS offload thread - executes whenever there is a lot of CBs pending? Just a thought. Otherwise, I feel like we might be again proliferating CONFIG options and increasing burden on the user to get it the CONFIG right. thanks, - Joel