On Sat, Jan 18, 2020 at 03:12:08PM -0500, Joel Fernandes wrote: > On Fri, Jan 17, 2020 at 08:28:26PM -0800, Paul E. McKenney wrote: > > On Fri, Jan 17, 2020 at 09:10:49PM -0500, Joel Fernandes wrote: > > > On Fri, Jan 17, 2020 at 03:17:56PM -0800, Paul E. McKenney wrote: > > > > On Fri, Jan 17, 2020 at 05:18:04PM -0500, Joel Fernandes wrote: > > > > > On Fri, Jan 17, 2020 at 04:58:14PM -0500, Joel Fernandes wrote: > > > > > > Hi, > > > > > > Me and Daniel were poking around with RCU_BOOST. I wrote a kernel module to > > > > > > test it a bit and I don't see the boost happening (thanks to Daniel for idea > > > > > > of writing a module). Haven't debugged it more yet. Will look more tomorrow. > > > > > > But below is the kernel module code and it prints a FAIL message to kernel > > > > > > logs in a few seconds. > > > > > > > > > > > > I see the reader thread not getting CPU for several seconds. RCU_BOOST_DELAY > > > > > > is set to 500. > > > > > > > > > > > > Thoughts? > > > > > > > > > > So this could be because I did not start a grace period which is quite silly. > > > > > I am sorry about that. I will add another thread to start grace periods as > > > > > well and let you know if I still see a problem. > > > > > > > > In addition, the RCU_READER_DELAY isn't long enough to trigger RCU priority > > > > boosting. And RCU_BLOCKER_DELAY would be, except that I am not seeing > > > > an RCU read-side critical section surrounding it. > > > > > > I was assuming that only the thread being preempted needs an RCU read-side > > > critical section. That preempted section would inherit the priority of the > > > thread preempting it (the blocking thread). So the blocking thread would not > > > need a read-side critical section, it just would need to be higher priority > > > than the thread it is preempting and preempt it long enough to trigger the > > > boosting. > > > > > > Did I miss something? > > > > Yes. That is not how RCU priority boosting works. > > > > What happens instead is that a set of rcub kthreads (one per leaf > > rcu_node structure) run at the SCHED_FIFO priority specified by the > > rcutree.kthread_prio kernel-boot parameter (as does the grace-period > > kthread when RCU priority boosting is enabled). When the grace-period > > kthread decides that boosting is needed, it awakens the relevant rcub > > kthread. The rcub kthread initializes an rt_mutex into a state where > > it appears to be held by the task that has been too long in an RCU > > read-side critical section, then acquires the lock. The lock-based > > priority-boosting mechanism kicks in at that point. (I heard this > > approach from tglx.) > > Thanks for the details. It makes sense to me. Glad it helped! > > And I missed something as well, namely that everything is bound to the > > same CPU in your test. But did you remember to set rcutree.kthread_prio > > to greater than 60? It defaults to 1 when RCU priority boosting is > > enabled, which isn't going to do much to a prio-50 SCHED_FIFO task, > > let alone a prio-60 SCHED_FIFO task. > > Yes indeed this was the issue. I set rcutree.kthread_prio to 90 and see the > test passing now. Below is the updated test code for archival. Very good! > But wouldn't this be an issue unless rcu.kthread_prio is set the MAX_RT_PRIO > or some high number? Because otherwise there could always be threads that > don't get boosted. It depends on the details of the real-time application and system. Some systems might need rcu.kthread_prio to be set to (say) 57 so that it didn't interfere with critical userspace tasks running at 58 and above. So why default to 1 for CONFIG_RCU_BOOST in mainline? Because the main usecase there is a system with a massive overload of SCHED_NORMAL tasks. In this case, SCHED_FIFO priority 1 suffices and minimally interferes with any realtime workload that might be running. > > > > But rcutorture already has tests for RCU priority boosting. Or are those > > > > failing in some way? > > > > > > > > Either way, it is good to see interest in RCU priority boosting. ;-) > > > > > > The interest is purely academic and curiousity-driven at this point ;-). > > > > Fair enough! > > > > > Speaking of which why is the config not default-enabled and would it not be a > > > good thing to enable everywhere or there some who dislike it? > > > > It is enabled by default in the -rt kernel. It has been some time since > > I proposed enabling it by default in mainline, but the last time that > > proposal was not well received. My approach is to wait until it becomes > > a problem in mainline, and if it does, propose making it the default in > > mainline. > > It could also be some problem that no one has root caused to a lack of > boosting? ;-) Quite possibly. But please understand that setting rcutree.kthread_prio even to 1 significantly increasees context-switch overhead, which is why it defaults to zero for CONFIG_RCU_BOOST=n and is also another reason why CONFIG_RCU_BOOST defaults to n. > > Except that I haven't heard of any such problems, so I have made no > > such proposal. > > Makes sense. > > > > Another thought is for RCU_BOOST systems to reduce the threshold of > > > triggering boost dynamically if the system is at the risk of running out of > > > memory. > > > > Agreed, and that is in fact the purpose of the check of rcu_state.cbovld > > in rcu_initiate_boost(). And why we need to get the number of > > outstanding kfree_rcu() blocks fed into the computation leading up to > > rcu_state.cbovld. ;-) > > Oh, I see this new code. I will study it and look into it more. Agreed on the > feeding of kfree_rcu() blocks and/or memory pressure information into the > computations leading up to cbovld. Very good! Thanx, Paul > thanks, > > - Joel > > ---8<----------------------- > > From: "Joel Fernandes (Google)" <joel@xxxxxxxxxxxxxxxxx> > Subject: [PATCH] Kernel module to test RCU_BOOST > > Signed-off-by: Joel Fernandes (Google) <joel@xxxxxxxxxxxxxxxxx> > --- > drivers/misc/Makefile | 2 +- > drivers/misc/ptest.c | 141 ++++++++++++++++++++++++++++++++++++++++++ > 2 files changed, 142 insertions(+), 1 deletion(-) > create mode 100644 drivers/misc/ptest.c > > diff --git a/drivers/misc/Makefile b/drivers/misc/Makefile > index c1860d35dc7e..ba34957dff26 100644 > --- a/drivers/misc/Makefile > +++ b/drivers/misc/Makefile > @@ -2,7 +2,7 @@ > # > # Makefile for misc devices that really don't fit anywhere else. > # > - > +obj-m += ptest.o > obj-$(CONFIG_IBM_ASM) += ibmasm/ > obj-$(CONFIG_IBMVMC) += ibmvmc.o > obj-$(CONFIG_AD525X_DPOT) += ad525x_dpot.o > diff --git a/drivers/misc/ptest.c b/drivers/misc/ptest.c > new file mode 100644 > index 000000000000..e5ece58e45ea > --- /dev/null > +++ b/drivers/misc/ptest.c > @@ -0,0 +1,141 @@ > +#include <linux/init.h> > +#include <linux/module.h> > +#include <linux/kernel.h> > +#include <linux/vmalloc.h> > +#include <linux/kthread.h> > +#include <linux/delay.h> > + > +#define RCU_READER_DELAY 100 //ms > +#define RCU_BLOCKER_DELAY 800 //ms > + > +/* Max delta in rcu_reader beyond which we can > + * say boosting failed. CONFIG_RCU_BOOST=200 in my setup plus booster wakes up > + * after 50ms and writer wakes up 150 ms after that So the GP starts only 150ms > + * later. To be safe, give a max of 400ms for the reader-section. > + */ > +#define RCU_READER_MAX_DELTA 400 // ms > + > +MODULE_LICENSE("GPL"); > + > +struct sched_param { > + int sched_priority; > +}; > + > +int stop_test = 0; > +int test_pass = 1; > +int reader_exit = 0; > +s64 delta_fail; > + > +#define ns_to_ms(delta) (delta / 1000000ULL) > + > +static int rcu_writer(void *a) > +{ > + while (!kthread_should_stop() && !stop_test) { > + trace_printk("starting gp\n"); > + synchronize_rcu(); > + trace_printk("ending gp\n"); > + msleep(10); > + } > + > + return 0; > +} > + > +static int rcu_reader(void *a) > +{ > + ktime_t start, end, reader_begin; > + s64 delta; > + > + reader_begin = ktime_get(); > + > + while (!kthread_should_stop() && !stop_test) { > + rcu_read_lock(); > + trace_printk("rcu_reader entering RSCS\n"); > + start = ktime_get(); > + mdelay(RCU_READER_DELAY); > + end = ktime_get(); > + trace_printk("rcu_reader exiting RSCS\n"); > + rcu_read_lock(); > + delta = ktime_to_ns(ktime_sub(end, start)); > + > + if (delta < 0 || (ns_to_ms(delta) > RCU_READER_MAX_DELTA)) { > + delta_fail = delta; > + test_pass = 0; > + break; > + } > + > + // Don't let the rcu_reader() run more than 3s inorder to > + // not starve the blocker incase reader prio > blocker prio. > + delta = ktime_to_ns(ktime_sub(end, reader_begin)); > + if (ns_to_ms(delta) > 3000) > + break; > + } > + > + stop_test = 1; > + reader_exit = 1; > + pr_err("Exiting reader\n"); > + return 0; > +} > + > +static int rcu_blocker(void *a) > +{ > + int loops = 5; > + > + while (!kthread_should_stop() && loops-- && !stop_test) { > + trace_printk("rcu_blocker entering\n"); > + mdelay(RCU_BLOCKER_DELAY); > + trace_printk("rcu_blocker exiting\n"); > + } > + > + pr_err("Exiting blocker\n"); > + stop_test = 1; > + > + // Wait for reader to finish > + while (!reader_exit) > + schedule_timeout_uninterruptible(1); > + > + if (test_pass) > + pr_err("TEST PASSED\n"); > + else > + pr_err("TEST FAILED, failing delta=%lldms\n", ns_to_ms(delta_fail)); > + > + return 0; > +} > + > +static int __init ptest_init(void){ > + struct sched_param params; > + struct task_struct *reader, *blocker, *writer; > + > + reader = kthread_create(rcu_reader, NULL, "reader"); > + params.sched_priority = 50; > + sched_setscheduler(reader, SCHED_FIFO, ¶ms); > + kthread_bind(reader, smp_processor_id()); > + > + blocker = kthread_create(rcu_blocker, NULL, "blocker"); > + params.sched_priority = 60; > + sched_setscheduler(blocker, SCHED_FIFO, ¶ms); > + kthread_bind(blocker, smp_processor_id()); > + > + writer = kthread_create(rcu_writer, NULL, "writer"); > + params.sched_priority = 70; > + sched_setscheduler(writer, SCHED_FIFO, ¶ms); > + kthread_bind(writer, smp_processor_id()); > + > + wake_up_process(reader); > + > + // Let reader run a little > + mdelay(50); > + > + wake_up_process(blocker); > + > + // Let blocker run a little > + mdelay(100); > + > + wake_up_process(writer); > + return 0; > +} > + > +static void __exit ptest_exit(void){ > +} > + > +module_init(ptest_init); > +module_exit(ptest_exit); > -- > 2.25.0.341.g760bfbb309-goog >