Re: kernel-rt rcuc lock contention problem

On Mon, Jan 26, 2015 at 02:14:03PM -0500, Luiz Capitulino wrote:
> Paul,
> 
> We're running some measurements with cyclictest running inside a
> KVM guest where we could observe spinlock contention among rcuc
> threads.
> 
> Basically, we have a 16-CPU NUMA machine very well set up for RT.
> This machine and the guest run the RT kernel. As our test-case
> requires an application in the guest taking 100% of the CPU, the
> RT priority configuration that gives the best latency is this one:
> 
>  263  FF   3  [rcuc/15]
>   13  FF   3  [rcub/1]
>   12  FF   3  [rcub/0]
>  265  FF   2  [ksoftirqd/15]
> 3181  FF   1  qemu-kvm
> 
> In this configuration, the rcuc can preempt the guest's vcpu
> thread. This shouldn't be a problem, except that in some cases
> we see the rcuc/15 thread spend 10us or more spinning on this
> spinlock (note that IRQs are disabled
> during this period):
> 
> __rcu_process_callbacks()
> {
> ...
> 	local_irq_save(flags);
> 	if (cpu_needs_another_gp(rsp, rdp)) {
> 		raw_spin_lock(&rcu_get_root(rsp)->lock); /* irqs disabled. */
> 		rcu_start_gp(rsp);
> 		raw_spin_unlock_irqrestore(&rcu_get_root(rsp)->lock, flags);
> ...

Life can be hard when irq-disabled spinlocks can be preempted!  But how
often does this happen?  Also, does this happen on smaller systems, for
example, with four or eight CPUs?  And I confess to being a bit surprised
that you expect real-time response from a guest that is subject to
preemption -- as I understand it, the usual approach is to give RT guests
their own CPUs.

Or am I missing something?
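
For reference, a sketch of that usual approach (the CPU numbers, thread
PIDs, and exact parameter mix below are illustrative assumptions, not
taken from this report) combines host-side CPU isolation with explicit
VCPU pinning:

	# host kernel command line: keep general work off CPUs 8-15
	isolcpus=8-15 nohz_full=8-15 rcu_nocbs=8-15

	# pin each qemu-kvm VCPU thread to its own isolated host CPU
	taskset -pc 8 <vcpu0-thread-pid>
	taskset -pc 9 <vcpu1-thread-pid>

With each VCPU owning a host CPU, the guest is no longer competing with
ordinary host work for those CPUs.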

> We've tried playing with the rcu_nocbs= option. However, it
> did not help because, for reasons we don't understand, the rcuc
> threads have to handle grace period start even when callback
> offloading is used. Handling this case requires this code path
> to be executed.

Yep.  The rcu_nocbs= option offloads invocation of RCU callbacks, but not
the per-CPU work required to inform RCU of quiescent states.
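
Purely as an illustration (the CPU list is made up), the offload is
requested on the boot line like this:

	rcu_nocbs=1-15

That hands callback invocation for CPUs 1-15 to the rcuo kthreads, but
each CPU still has to report quiescent states and start needed grace
periods, which is exactly the code path quoted at the top of this
message.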

> We've cooked the following extremely dirty patch, just to see
> what would happen:
> 
> diff --git a/kernel/rcutree.c b/kernel/rcutree.c
> index eaed1ef..c0771cc 100644
> --- a/kernel/rcutree.c
> +++ b/kernel/rcutree.c
> @@ -2298,9 +2298,19 @@ __rcu_process_callbacks(struct rcu_state *rsp)
>  	/* Does this CPU require a not-yet-started grace period? */
>  	local_irq_save(flags);
>  	if (cpu_needs_another_gp(rsp, rdp)) {
> -		raw_spin_lock(&rcu_get_root(rsp)->lock); /* irqs disabled. */
> -		rcu_start_gp(rsp);
> -		raw_spin_unlock_irqrestore(&rcu_get_root(rsp)->lock, flags);
> +		for (;;) {
> +			if (!raw_spin_trylock(&rcu_get_root(rsp)->lock)) {
> +				local_irq_restore(flags);
> +				local_bh_enable();
> +				schedule_timeout_interruptible(2);

Yes, the above will get you a splat in mainline kernels, which do not
necessarily push softirq processing to the ksoftirqd kthreads.  ;-)

> +				local_bh_disable();
> +				local_irq_save(flags);
> +				continue;
> +			}
> +			rcu_start_gp(rsp);
> +			raw_spin_unlock_irqrestore(&rcu_get_root(rsp)->lock, flags);
> +			break;
> +		}
>  	} else {
>  		local_irq_restore(flags);
>  	}
> 
> With this patch rcuc is gone from our traces and the scheduling
> latency is reduced by 3us in our CPU-bound test-case.
> 
> Could you please advise on how to solve this contention problem?

The usual advice would be to configure the system such that the guest's
VCPUs do not get preempted.  Or is the contention on the root rcu_node
structure's ->lock field high for some other reason?
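
If it would help to pin down where the contention is coming from, one
option (assuming CONFIG_LOCK_STAT can be enabled on this kernel, and
noting that the lock class name in the grep below is a guess that
varies between versions) is the kernel's lock statistics:

	echo 1 > /proc/sys/kernel/lock_stat
	# ... run the cyclictest workload ...
	grep -A 6 rcu_node /proc/lock_stat

The contention counts and wait times reported there would show whether
the root rcu_node structure's ->lock really is the hot spot.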

> Can we test whether the local CPU is a nocb CPU and, in that
> case, skip rcu_start_gp() entirely, for example?

If you do that, you can see system hangs due to needed grace periods never
getting started.

Are you using the default value of 16 for CONFIG_RCU_FANOUT_LEAF?
If you are using a smaller value, it would be possible to rework the
code to reduce contention on ->lock, though if a VCPU does get preempted
while holding the root rcu_node structure's ->lock, life will be hard.
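
To make the fanout question concrete (the values below are the usual
defaults, given as an illustration rather than a known fact about this
particular config):

	CONFIG_RCU_FANOUT=64
	CONFIG_RCU_FANOUT_LEAF=16

With 16 CPUs and a leaf fanout of 16, the rcu_node tree collapses to a
single node that is both leaf and root, so every CPU taking the path
quoted above serializes on the same ->lock.  A smaller leaf fanout,
say CONFIG_RCU_FANOUT_LEAF=4, would spread the CPUs across four leaf
nodes, which is the sort of layout the reworked code could exploit --
subject to the caveat above about a preempted VCPU holding the root
rcu_node structure's ->lock.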

							Thanx, Paul
