Re: Improving RCU wait in mini_qdisc_pair_swap()

Seth Forshee <seth@xxxxxxxxxx> · Fri, 22 Oct 2021 08:17:59 -0500

On Thu, Oct 21, 2021 at 03:24:35PM -0700, Paul E. McKenney wrote:
> > > > > So you can let the heavy real-time workloads
> > > > > have slowish Qdisc switching and IPI the IPI-insensitive workloads:
> > > > > 
> > > > > 	oldstate = start_poll_synchronize_rcu();
> > > > > 
> > > > > 	// do something.
> > > > > 
> > > > > 	if (IS_ENABLED(CONFIG_PREEMPT_RT))
> > > > > 		cond_synchronize_rcu(oldstate);
> > > > > 	else if (!poll_state_synchronize_rcu(oldstate))
> > > > > 		synchronize_rcu_expedited();
> > > > 
> > > > I was concerned about the system-wide impact synchronize_rcu_expedited()
> > > > has, but this at least avoids doing it if it's unnecessary. If it's bad
> > > > form to leave mini_Qdisc.rcu_state initialized to 0, this option is
> > > > preferable from my perspective.
> > > 
> > > This might well be the best approach if multiple swaps are at all common.
> > > 
> > > But if needed, we really could do both:  Make initial swaps really fast,
> > > and improve the performance of later swaps.
> > 
> > Using this the time for mini_qdisc_pair_swap() to complete was similar
> > to my earlier patch with mini_Qdisc.rcu_state initialized to 0. So this
> > seems acceptible for my purposes.
> 
> Not sure which approach "this" refers to.  For the moment, I am assuming
> the last code fragment shown above.

Yes, that is correct. Apologies that I wasn't clear.

> > > > > Yet another approach is this:
> > > > > 
> > > > > 	oldstate = start_poll_synchronize_rcu();
> > > > > 
> > > > > 	// do something.
> > > > > 
> > > > > 	while (!poll_state_synchronize_rcu(oldstate))
> > > > > 		schedule_timeout_idle(1);
> > > > > 
> > > > > The advantage of this one is that you get credit for partial progress
> > > > > through grace periods, but it won't be anywhere near as fast as
> > > > > synchronize_rcu_expedited().
> > > > 
> > > > If we need to initialize mini_Qdisc.rcu_state up front this approach
> > > > probably won't improve things much for my problem.
> > > 
> > > It all depends on how long between swaps.  ;-)
> > 
> > This approach helps, but I still see about 50% of the
> > mini_qdisc_pair_swap() calls taking tens of milliseconds. This is
> > similar to what I saw using cond_synchronize_rcu() with rcu_state
> > initialized from get_state_synchronize_rcu().
> 
> OK, which means that the swaps are too closely spaced to get much
> benefit about 50% of the time.

Yes, or in my case it's the time between mini_qdisc_pair_init() and the
first swap, which is why initializing rcu_state to 0 helped before.

> > Wouldn't this also behave very poorly in the wrapping case you mentioned
> > above?
> 
> I confess, I have no idea what case is the "wrapping case".

This question was based on a misunderstanding on my part, you can ignore
it.

> But any implementation that needs to wait for a full normal
> (non-expedited) grace period will suffer at least 50% of the time
> when running your workload.  And of the above code fragments, only the
> synchronize_rcu_expedited() variants avoid waiting for a full normal
> grace period.
> 
> > In the end, I'm happy to move forward using synchronize_rcu_expedited().
> > I'm going to test a bit more, and I should have a patch out sometime
> > tomorrow.
> 
> Works for me!  At least given the CONFIG_PREEMPT_RT check.

Yes, I included that check.

Thanks,
Seth