Re: Normal RCU grace period can be stalled for long because need-resched flags not set?

On Thu, Jul 04, 2019 at 03:17:02PM -0700, Paul E. McKenney wrote:
> On Thu, Jul 04, 2019 at 02:50:55PM -0400, Joel Fernandes wrote:
[snip] 
> > I did have a request, could you help me understand why is the grace period
> > duration double that of my busy wait time? You mentioned this has something
> > to do with the thread not waking up before another GP is started. But I did
> > not follow this. Thanks a lot!!
> 
> Let's look at a normal grace period, and assume the ->gp_seq grace-period
> sequence counter is initially 0x1, so that a grace period is already
> in progress (keep in mind that the low-order two bits of ->gp_seq are
> the phase within the grace period and the rest of the bits are the
> grace-period number, so we are in phase 1 of the grace period following
> grace period #0).  This grace period was started via synchronize_rcu()
> by Task A.
> 
> Then we have the following sequence of events:
> 
> o	Task B does call_rcu(), but is too late to use the already-started
> 	grace period, so it needs another one.	The ->gp_seq_needed
> 	counter thus becomes 0x8.
> 
> o	The current grace period completes, so that the ->gp_seq
> 	counter becomes 0x4.  Callback invocation commences.
> 
> o	The grace-period kthread notices that ->gp_seq_needed is greater
> 	than ->gp_seq, so it starts a new grace period.  The ->gp_seq
> 	counter therefore becomes 0x5.
> 
> o	The callback for Task A's synchronize_rcu() is invoked, awakening
> 	Task A.  This happens almost immediately after the new grace
> 	period starts, but does definitely happen -after- that grace
> 	period starts.

Yes, but at this point we are still at the 1-GP mark. However, the
distribution's median below correlates strongly with two preempt-disable
durations per grace-period completion. For example, if each preempt
disable/enable iteration lasts 50 ms, the writer's median time difference
across the synchronize_rcu() is consistently 100 ms, not 60 ms or 75 ms or
anything in between.

> 
> o	Task A, being part of rcuperf, almost immediately does another
> 	synchronize_rcu().  So ->gp_seq_needed becomes 0xc.

Yes, but before issuing that next synchronize_rcu(), Task A also takes a
timestamp and computes the difference from the previous one, so the data
captured at that point should cover only about one GP's worth of time, give
or take.

> If you play out the rest of this sequence, you should see how Task A
> waits for almost two full grace periods.

I tried to play this out but still don't get it :-|. Not that it is an issue
per se, but I would still like to understand it better.


> > Performance changes in consolidated vs regular
> > ----------------------------------------------
> > I ran a thread on a reserved CPU doing preempt disable + busy wait + preempt enable
> > in a loop and measured the difference in rcuperf between consolidated and regular.
> > nreaders = nwriters = 10.
> > 
> > 		(preempt disable duration)
> > 		5ms	10ms	20ms	50ms
> > v4.19
> > median (usecs)	12000.3	12001	11000	12000
> > 
> > v5.1 (deferred)
> > median (usecs)	13000	19999   40000   100000
> > 
> > All of this is still within spec of RCU.

thanks!

 - Joel



