Re: RCU_BOOST not working for me

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sat, Jan 18, 2020 at 03:12:08PM -0500, Joel Fernandes wrote:
> On Fri, Jan 17, 2020 at 08:28:26PM -0800, Paul E. McKenney wrote:
> > On Fri, Jan 17, 2020 at 09:10:49PM -0500, Joel Fernandes wrote:
> > > On Fri, Jan 17, 2020 at 03:17:56PM -0800, Paul E. McKenney wrote:
> > > > On Fri, Jan 17, 2020 at 05:18:04PM -0500, Joel Fernandes wrote:
> > > > > On Fri, Jan 17, 2020 at 04:58:14PM -0500, Joel Fernandes wrote:
> > > > > > Hi,
> > > > > > Me and Daniel were poking around with RCU_BOOST. I wrote a kernel module to
> > > > > > test it a bit and I don't see the boost happening (thanks to Daniel for idea
> > > > > > of writing a module). Haven't debugged it more yet. Will look more tomorrow.
> > > > > > But below is the kernel module code and it prints a FAIL message to kernel
> > > > > > logs in a few seconds.
> > > > > > 
> > > > > > I see the reader thread not getting CPU for several seconds. RCU_BOOST_DELAY
> > > > > > is set to 500.
> > > > > > 
> > > > > > Thoughts?
> > > > > 
> > > > > So this could be because I did not start a grace period which is quite silly.
> > > > > I am sorry about that. I will add another thread to start grace periods as
> > > > > well and let you know if I still see a problem.
> > > > 
> > > > In addition, the RCU_READER_DELAY isn't long enough to trigger RCU priority
> > > > boosting.  And RCU_BLOCKER_DELAY would be, except that I am not seeing
> > > > an RCU read-side critical section surrounding it.
> > > 
> > > I was assuming that only the thread being preempted needs an RCU read-side
> > > critical section. That preempted section would inherit the priority of the
> > > thread preempting it (the blocking thread). So the blocking thread would not
> > > need a read-side critical section, it just would need to be higher priority
> > > than the thread it is preempting and preempt it long enough to trigger the
> > > boosting.
> > > 
> > > Did I miss something?
> > 
> > Yes.  That is not how RCU priority boosting works.
> > 
> > What happens instead is that a set of rcub kthreads (one per leaf
> > rcu_node structure) run at the SCHED_FIFO priority specified by the
> > rcutree.kthread_prio kernel-boot parameter (as does the grace-period
> > kthread when RCU priority boosting is enabled).  When the grace-period
> > kthread decides that boosting is needed, it awakens the relevant rcub
> > kthread.  The rcub kthread initializes an rt_mutex into a state where
> > it appears to be held by the task that has been too long in an RCU
> > read-side critical section, then acquires the lock.  The lock-based
> > priority-boosting mechanism kicks in at that point.  (I heard this
> > approach from tglx.)
> 
> Thanks for the details. It makes sense to me.

Glad it helped!

> > And I missed something as well, namely that everything is bound to the
> > same CPU in your test.  But did you remember to set rcutree.kthread_prio
> > to greater than 60?  It defaults to 1 when RCU priority boosting is
> > enabled, which isn't going to do much to a prio-50 SCHED_FIFO task,
> > let alone a prio-60 SCHED_FIFO task.
> 
> Yes indeed this was the issue. I set rcutree.kthread_prio to 90 and see the
> test passing now. Below is the updated test code for archival.

Very good!

> But wouldn't this be an issue unless rcu.kthread_prio is set the MAX_RT_PRIO
> or some high number? Because otherwise there could always be threads that
> don't get boosted.

It depends on the details of the real-time application and system.

Some systems might need rcu.kthread_prio to be set to (say) 57 so that it
didn't interfere with critical userspace tasks running at 58 and above.
So why default to 1 for CONFIG_RCU_BOOST in mainline?  Because the main
usecase there is a system with a massive overload of SCHED_NORMAL tasks.
In this case, SCHED_FIFO priority 1 suffices and minimally interferes
with any realtime workload that might be running.

> > > > But rcutorture already has tests for RCU priority boosting.  Or are those
> > > > failing in some way?
> > > > 
> > > > Either way, it is good to see interest in RCU priority boosting.  ;-)
> > > 
> > > The interest is purely academic and curiousity-driven at this point ;-).
> > 
> > Fair enough!
> > 
> > > Speaking of which why is the config not default-enabled and would it not be a
> > > good thing to enable everywhere or there some who dislike it?
> > 
> > It is enabled by default in the -rt kernel.  It has been some time since
> > I proposed enabling it by default in mainline, but the last time that
> > proposal was not well received.  My approach is to wait until it becomes
> > a problem in mainline, and if it does, propose making it the default in
> > mainline.
> 
> It could also be some problem that no one has root caused to a lack of
> boosting?  ;-)

Quite possibly.  But please understand that setting rcutree.kthread_prio
even to 1 significantly increasees context-switch overhead, which is why
it defaults to zero for CONFIG_RCU_BOOST=n and is also another reason
why CONFIG_RCU_BOOST defaults to n.

> > Except that I haven't heard of any such problems, so I have made no
> > such proposal.
> 
> Makes sense.
> 
> > > Another thought is for RCU_BOOST systems to reduce the threshold of
> > > triggering boost dynamically if the system is at the risk of running out of
> > > memory.
> > 
> > Agreed, and that is in fact the purpose of the check of rcu_state.cbovld
> > in rcu_initiate_boost().  And why we need to get the number of
> > outstanding kfree_rcu() blocks fed into the computation leading up to
> > rcu_state.cbovld.  ;-)
> 
> Oh, I see this new code. I will study it and look into it more. Agreed on the
> feeding of kfree_rcu() blocks and/or memory pressure information into the
> computations leading up to cbovld.

Very good!

							Thanx, Paul

> thanks,
> 
>  - Joel
> 
> ---8<-----------------------
> 
> From: "Joel Fernandes (Google)" <joel@xxxxxxxxxxxxxxxxx>
> Subject: [PATCH] Kernel module to test RCU_BOOST
> 
> Signed-off-by: Joel Fernandes (Google) <joel@xxxxxxxxxxxxxxxxx>
> ---
>  drivers/misc/Makefile |   2 +-
>  drivers/misc/ptest.c  | 141 ++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 142 insertions(+), 1 deletion(-)
>  create mode 100644 drivers/misc/ptest.c
> 
> diff --git a/drivers/misc/Makefile b/drivers/misc/Makefile
> index c1860d35dc7e..ba34957dff26 100644
> --- a/drivers/misc/Makefile
> +++ b/drivers/misc/Makefile
> @@ -2,7 +2,7 @@
>  #
>  # Makefile for misc devices that really don't fit anywhere else.
>  #
> -
> +obj-m += ptest.o
>  obj-$(CONFIG_IBM_ASM)		+= ibmasm/
>  obj-$(CONFIG_IBMVMC)		+= ibmvmc.o
>  obj-$(CONFIG_AD525X_DPOT)	+= ad525x_dpot.o
> diff --git a/drivers/misc/ptest.c b/drivers/misc/ptest.c
> new file mode 100644
> index 000000000000..e5ece58e45ea
> --- /dev/null
> +++ b/drivers/misc/ptest.c
> @@ -0,0 +1,141 @@
> +#include <linux/init.h>
> +#include <linux/module.h>
> +#include <linux/kernel.h>
> +#include <linux/vmalloc.h>
> +#include <linux/kthread.h>
> +#include <linux/delay.h>
> +
> +#define RCU_READER_DELAY 100 //ms
> +#define RCU_BLOCKER_DELAY 800 //ms
> +
> +/* Max delta in rcu_reader beyond which we can
> + * say boosting failed.  CONFIG_RCU_BOOST=200 in my setup plus booster wakes up
> + * after 50ms and writer wakes up 150 ms after that So the GP starts only 150ms
> + * later. To be safe, give a max of 400ms for the reader-section.
> + */
> +#define RCU_READER_MAX_DELTA 400 // ms
> +
> +MODULE_LICENSE("GPL");
> +
> +struct sched_param {
> +	int sched_priority;
> +};
> +
> +int stop_test = 0;
> +int test_pass = 1;
> +int reader_exit = 0;
> +s64 delta_fail;
> +
> +#define ns_to_ms(delta) (delta / 1000000ULL)
> +
> +static int rcu_writer(void *a)
> +{
> +	while (!kthread_should_stop() && !stop_test) {
> +		trace_printk("starting gp\n");
> +		synchronize_rcu();
> +		trace_printk("ending gp\n");
> +		msleep(10);
> +	}
> +
> +	return 0;
> +}
> +
> +static int rcu_reader(void *a)
> +{
> +	ktime_t start, end, reader_begin;
> +	s64 delta;
> +
> +	reader_begin = ktime_get();
> +
> +	while (!kthread_should_stop() && !stop_test) {
> +		rcu_read_lock();
> +		trace_printk("rcu_reader entering RSCS\n");
> +		start = ktime_get();
> +		mdelay(RCU_READER_DELAY);
> +		end = ktime_get();
> +		trace_printk("rcu_reader exiting RSCS\n");
> +		rcu_read_lock();
> +		delta = ktime_to_ns(ktime_sub(end, start));
> +
> +		if (delta < 0 || (ns_to_ms(delta) > RCU_READER_MAX_DELTA)) {
> +			delta_fail = delta;
> +			test_pass = 0;
> +			break;
> +		}
> +
> +		// Don't let the rcu_reader() run more than 3s inorder to
> +		// not starve the blocker incase reader prio > blocker prio.
> +		delta = ktime_to_ns(ktime_sub(end, reader_begin));
> +		if (ns_to_ms(delta) > 3000)
> +			break;
> +	}
> +
> +	stop_test = 1;
> +	reader_exit = 1;
> +	pr_err("Exiting reader\n");
> +	return 0;
> +}
> +
> +static int rcu_blocker(void *a)
> +{
> +	int loops = 5;
> +
> +	while (!kthread_should_stop() && loops-- && !stop_test) {
> +		trace_printk("rcu_blocker entering\n");
> +		mdelay(RCU_BLOCKER_DELAY);
> +		trace_printk("rcu_blocker exiting\n");
> +	}
> +
> +	pr_err("Exiting blocker\n");
> +	stop_test = 1;
> +
> +	// Wait for reader to finish
> +	while (!reader_exit)
> +		schedule_timeout_uninterruptible(1);
> +
> +	if (test_pass)
> +		pr_err("TEST PASSED\n");
> +	else
> +		pr_err("TEST FAILED, failing delta=%lldms\n", ns_to_ms(delta_fail));
> +
> +	return 0;
> +}
> +
> +static int __init ptest_init(void){
> +	struct sched_param params;
> +	struct task_struct *reader, *blocker, *writer;
> +
> +	reader = kthread_create(rcu_reader, NULL, "reader");
> +	params.sched_priority = 50;
> +	sched_setscheduler(reader, SCHED_FIFO, &params);
> +	kthread_bind(reader, smp_processor_id());
> +
> +	blocker = kthread_create(rcu_blocker, NULL, "blocker");
> +	params.sched_priority = 60;
> +	sched_setscheduler(blocker, SCHED_FIFO, &params);
> +	kthread_bind(blocker, smp_processor_id());
> +
> +	writer = kthread_create(rcu_writer, NULL, "writer");
> +	params.sched_priority = 70;
> +	sched_setscheduler(writer, SCHED_FIFO, &params);
> +	kthread_bind(writer, smp_processor_id());
> +
> +	wake_up_process(reader);
> +
> +	// Let reader run a little
> +	mdelay(50);
> +
> +	wake_up_process(blocker);
> +
> +	// Let blocker run a little
> +	mdelay(100);
> +
> +	wake_up_process(writer);
> +	return 0;
> +}
> +
> +static void __exit ptest_exit(void){
> +}
> +
> +module_init(ptest_init);
> +module_exit(ptest_exit);
> -- 
> 2.25.0.341.g760bfbb309-goog
> 



[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux