Re: CPU trying to start a GP when no CBs were assigned new GP numbers

On Wed, Jun 17, 2020 at 02:39:54AM -0400, Joel Fernandes wrote:
> On Tue, Jun 16, 2020 at 09:06:14PM -0700, Paul E. McKenney wrote:
> > On Tue, Jun 16, 2020 at 08:24:37PM -0400, Joel Fernandes wrote:
> > > Hi,
> > > I am seeing something a bit strange with RCU where it is trying to
> > > start a GP twice from a CPU even though no new CB was queued on that
> > > CPU. It is quite possible that I'm missing something. Anyway, I wrote
> > > a patch to add some tracing when CBs are queued into the segcb. I am
> > > planning to post this trace patch later.
> > > 
> > > The trace in the link below shows CPU2 queuing around 5 CBs, which
> > > then get accelerated at 5.192123. The GP thread running on CPU3
> > > starts a new GP. Now the CPU2 softirq runs again (roughly 1ms after
> > > the previous acceleration). The softirq runs probably because the GP
> > > thread is expecting a QS report from CPU 2.  When CPU2's softirq
> > > runs, though, it does another acceleration, which triggers a second
> > > new GP start. This seems a bit unnecessary AFAICS, because the need
> > > for GP *832 was already recorded, which is all CPU2 should really be
> > > caring about, right?
> > > 
> > > Here is the trace: https://pastebin.com/raw/AYGzu1g4
> > 
> > Assuming that the WAIT= and NEXT_READY= numbers are grace-period numbers,
> 
> In the trace there are two numbers each for WAIT and NEXT_READY: the number
> of callbacks and the GP number. Sorry, I should have clarified that.

Ah, I see.  What are you doing to compute the number of callbacks?

> > this trace is expected behavior for two sets of callbacks, one that
> > arrived at CPU 2 by time 5.192121 and another that arrived between then
> > and time 5.193131.
> 
> There is just 1 set of callbacks.
> > 
> > So I have to ask...  What tells you that no callbacks arrived at CPU 2
> > during this interval?
> 
> Because there is no rcu_callback tracepoint fired in the interim. The number
> of callbacks that I'm tracing also confirms this.
> 
> > 
> > On the other hand, if CPU 2 is offloaded, what you might be seeing is
> > the delayed drain of callbacks from the bypass.
> 
> Sorry, I should have clarified that it was not offloaded.
> 
> I dug deeper and noticed that during acceleration, it is possible that
> the gp_seq numbers of empty segments are updated. In this case,
> rcu_segcblist_accelerate() still returns true, resulting in the start of a
> new future GP. The patch below cures it, but I'm not sure whether it
> introduces other issues. In light testing, it appears to work. WDYT?

You are quite correct that empty segments should not be assigned
grace-period numbers.  Or at least that any such grace-period numbers
should not be taken very seriously.
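
In case a standalone illustration helps, here is a quick userspace toy
model of that flow: per-segment counts instead of tail pointers, a plain
"<" instead of ULONG_CMP_LT(), and made-up names with numbers loosely
matching your trace.  It is only a sketch of the pre-patch behavior, not
the real rcu_segcblist code.

#include <stdbool.h>
#include <stdio.h>

/* Toy segments mirroring rcu_segcblist's DONE/WAIT/NEXT_READY/NEXT. */
enum { DONE, WAIT, NEXT_READY, NEXT, NSEGS };

struct toy_segcb {
    int len[NSEGS];                /* callbacks per segment */
    unsigned long gp_seq[NSEGS];   /* GP each segment waits for */
};

/* Rough toy equivalent of the pre-patch rcu_segcblist_accelerate(). */
static bool toy_accelerate(struct toy_segcb *s, unsigned long seq)
{
    int i, first;

    if (!s->len[WAIT] && !s->len[NEXT_READY] && !s->len[NEXT])
        return false;    /* nothing beyond the DONE segment */

    /* Newest non-empty segment already waiting on a GP before seq. */
    for (i = NEXT_READY; i > DONE; i--)
        if (s->len[i] && s->gp_seq[i] < seq)
            break;
    if (++i >= NEXT)
        return false;

    /*
     * Merge everything later into segment i and tag it with seq, even
     * when all of those later segments (including NEXT) are empty,
     * which is the case your trace hit.
     */
    first = i;
    for (; i < NEXT; i++)
        s->gp_seq[i] = seq;
    for (i = first + 1; i <= NEXT; i++) {
        s->len[first] += s->len[i];
        s->len[i] = 0;
    }
    return true;    /* always requests a future GP, needed or not */
}

int main(void)
{
    /*
     * Five CBs already accelerated into WAIT for (toy) GP 832,
     * and nothing new queued since then.
     */
    struct toy_segcb s = { .len = { 0, 5, 0, 0 },
                           .gp_seq = { 0, 832, 0, 0 } };

    /* Softirq runs again; the next candidate GP is (toy) 836. */
    printf("second accelerate returned %d\n", toy_accelerate(&s, 836));
    return 0;
}

The second call reports true even though only empty segments were
touched, which is the spurious GP request.  With a check along the lines
of the one in your patch, that call would instead report false.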

> ---8<-----------------------
> 
> diff --git a/kernel/rcu/rcu_segcblist.c b/kernel/rcu/rcu_segcblist.c
> index 5f4fd3b8777ca..ebdba1d95f629 100644
> --- a/kernel/rcu/rcu_segcblist.c
> +++ b/kernel/rcu/rcu_segcblist.c
> @@ -446,7 +478,7 @@ void rcu_segcblist_advance(struct rcu_segcblist *rsclp, unsigned long seq)
>   */
>  bool rcu_segcblist_accelerate(struct rcu_segcblist *rsclp, unsigned long seq)
>  {
> -	int i;
> +	int i, oldest_seg;
>  
>  	WARN_ON_ONCE(!rcu_segcblist_is_enabled(rsclp));
>  	if (rcu_segcblist_restempty(rsclp, RCU_DONE_TAIL))
> @@ -465,6 +497,9 @@ bool rcu_segcblist_accelerate(struct rcu_segcblist *rsclp, unsigned long seq)
>  		    ULONG_CMP_LT(rsclp->gp_seq[i], seq))
>  			break;
>  
> +	/* The oldest segment after which everything later is merged. */
> +	oldest_seg = i;
> +
>  	/*
>  	 * If all the segments contain callbacks that correspond to
>  	 * earlier grace-period sequence numbers than "seq", leave.
> @@ -488,10 +523,19 @@ bool rcu_segcblist_accelerate(struct rcu_segcblist *rsclp, unsigned long seq)
>  	 * where there were no pending callbacks in the rcu_segcblist
>  	 * structure other than in the RCU_NEXT_TAIL segment.
>  	 */
>  	for (; i < RCU_NEXT_TAIL; i++) {
>  		WRITE_ONCE(rsclp->tails[i], rsclp->tails[RCU_NEXT_TAIL]);
>  		rsclp->gp_seq[i] = seq;
>  	}
> +
> +	/*
> +	 * If all segments after oldest_seg were empty, then new GP numbers
> +	 * were assigned to empty segments. In this case, no need to start
> +	 * those future GPs.
> +	 */
> +	if (rcu_segcblist_restempty(rsclp, oldest_seg))
> +		return false;
> +
>  	return true;
>  }

Looks like this needs a focused test program.  ;-)

Please see attached for an outdated userspace test program.  It probably
needs help to check for this additional case.  And probably also other
changes to account for three years of change.

This test program works by copying files from the designated Linux
source tree, then building them into a userspace torture test.
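
For example (sketched here against the toy model above rather than
against the attached harness, so treat it as pseudocode for whatever
helpers the real test uses), the additional case might look something
like this:

#include <assert.h>

/*
 * Joel's scenario: queue once, then accelerate twice with no new
 * callbacks in between.  The second assert encodes the post-patch
 * expectation, so it trips on the pre-patch logic in the toy model.
 */
static void toy_check_no_spurious_gp(void)
{
    struct toy_segcb s = { .len = { [NEXT] = 5 } };  /* 5 new CBs queued */

    assert(toy_accelerate(&s, 832));     /* a GP really is needed */
    assert(!toy_accelerate(&s, 836));    /* no new CBs: want false */
}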

							Thanx, Paul

Attachment: rcusegcblisttest.2020.06.17a.tgz
Description: rcusegcblisttest.2020.06.17a.tgz

