On Sat, Jul 06, 2019 at 08:02:30AM -0400, Joel Fernandes wrote:
> On Thu, Jul 04, 2019 at 03:17:02PM -0700, Paul E. McKenney wrote:
> > On Thu, Jul 04, 2019 at 02:50:55PM -0400, Joel Fernandes wrote:
> [snip]
> > > I did have a request: could you help me understand why the
> > > grace-period duration is double my busy-wait time?  You mentioned
> > > this has something to do with the thread not waking up before
> > > another GP is started, but I did not follow this.  Thanks a lot!!
> >
> > Let's look at a normal grace period, and assume the ->gp_seq
> > grace-period sequence counter is initially 0x1, so that a grace
> > period is already in progress (keep in mind that the low-order two
> > bits of ->gp_seq are the phase within the grace period and the rest
> > of the bits are the grace-period number, so we are in phase 1 of the
> > grace period following grace period #0).  This grace period was
> > started via synchronize_rcu() by Task A.
> >
> > Then we have the following sequence of events:
> >
> > o   Task B does call_rcu(), but is too late to use the
> >     already-started grace period, so it needs another one.
> >     The ->gp_seq_needed counter thus becomes 0x8.
> >
> > o   The current grace period completes, so that the ->gp_seq
> >     counter becomes 0x4.  Callback invocation commences.
> >
> > o   The grace-period kthread notices that ->gp_seq_needed is greater
> >     than ->gp_seq, so it starts a new grace period.  The ->gp_seq
> >     counter therefore becomes 0x5.
> >
> > o   The callback for Task A's synchronize_rcu() is invoked,
> >     awakening Task A.  This happens almost immediately after the new
> >     grace period starts, but does definitely happen -after- that
> >     grace period starts.
>
> Yes, but at this point we are still at the 1-GP mark.  However, the
> distribution's median below shows a strong correlation between
> grace-period completion and two preempt-disable durations.  For
> example, if the duration of the preempt disable/enable loop is 50ms,
> the writer's median time difference before and after the
> synchronize_rcu() comes out to 100ms, not 60ms or 75ms or anything in
> between.
>
> > o   Task A, being part of rcuperf, almost immediately does another
> >     synchronize_rcu().  So ->gp_seq_needed becomes 0xc.
>
> Yes, but before that second synchronize_rcu(), it also takes a
> timestamp and computes the difference between the old and new
> timestamps, capturing the data, so at that point the data captured
> should only be about one GP's worth, give or take.

But in this case, the "give or take" is almost a full grace period, for
a total of almost two grace periods.

> > If you play out the rest of this sequence, you should see how Task A
> > waits for almost two full grace periods.
>
> I tried to play this out and still didn't get it :-|.  Not that it is
> an issue per se, but I would still like to understand it better.

Let's try it a different way.  Let's skip the state portion of ->gp_seq
and just focus on the number of elapsed grace periods.

o   Let's start with RCU idle, having completed grace-period number
    zero and not yet having started grace-period number 1.  Let's also
    abbreviate: GP0 and GP1.

o   Task A wants a grace period.  Because RCU is idle, Task A's GP will
    complete at the completion of GP1.

o   RCU starts GP1.

o   Almost immediately thereafter, Task B wants a grace period.  But it
    cannot use GP1, because GP1 has already started, even if just
    barely, and some CPUs or tasks might already have reported a
    quiescent state for GP1.  So Task B's GP will not complete until
    the end of GP2.

And that is almost two GPs' worth of time from now.
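As an aside, the ->gp_seq encoding from the first walkthrough can be
checked mechanically.  The following is a minimal userspace sketch
mirroring the rcu_seq_ctr()/rcu_seq_state() helpers in kernel/rcu/rcu.h;
it is an illustration of the two-bits-of-phase encoding described above,
not actual kernel code:

	#include <stdio.h>

	/* Low-order two bits are the phase; the rest is the GP number. */
	#define RCU_SEQ_CTR_SHIFT	2
	#define RCU_SEQ_STATE_MASK	((1UL << RCU_SEQ_CTR_SHIFT) - 1)

	static unsigned long rcu_seq_ctr(unsigned long s)
	{
		return s >> RCU_SEQ_CTR_SHIFT;	/* grace-period number */
	}

	static unsigned long rcu_seq_state(unsigned long s)
	{
		return s & RCU_SEQ_STATE_MASK;	/* phase within the GP */
	}

	int main(void)
	{
		/* The ->gp_seq / ->gp_seq_needed values from the walkthrough. */
		unsigned long seqs[] = { 0x1, 0x8, 0x4, 0x5, 0xc };
		int i;

		for (i = 0; i < 5; i++)
			printf("%#lx -> GP #%lu, phase %lu\n", seqs[i],
			       rcu_seq_ctr(seqs[i]), rcu_seq_state(seqs[i]));
		return 0;
	}

Running it prints, for example, "0x1 -> GP #0, phase 1" and
"0x5 -> GP #1, phase 1", matching the walkthrough above.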
And on a system where RCU is busy, Task B's experience is the common
case if it is in a tight loop doing synchronize_rcu().  Because RCU is
busy, by the end of each grace period there will always be a request
for another grace period, and RCU's grace-period kthread will therefore
immediately start a new grace period upon completion of each one.  So
the sequence of events from Task B's viewpoint will be as follows:

o   Task B executes synchronize_rcu(), which requests a new grace
    period.

o   The requested grace period ends, and another immediately starts.

o   Task B's RCU callback is invoked, which does a wakeup.

o   Task B executes another synchronize_rcu(), but just after a new
    grace period has started.  Task B thus has to wait for almost two
    full grace periods.

This could be perceived as a problem, but my question back to you would
be "Why on earth are you invoking synchronize_rcu() in a tight loop???"

"Oh, because you are running rcuperf?  Well, that is why rcuperf uses
ftrace data!"

Which might be the point on which we are talking past each other.  You
might have been thinking of the latency of RCU's grace-period
computation.  I was thinking of the length of time between the call to
synchronize_rcu() and the corresponding return.

Does that help, or am I missing your point?

							Thanx, Paul

> > > Performance changes in consolidated vs regular
> > > ----------------------------------------------
> > > I ran a thread on a reserved CPU doing preempt disable + busy wait
> > > + preempt enable in a loop and measured the difference in rcuperf
> > > between consolidated and regular.  nreaders = nwriters = 10.
> > >
> > >                     (preempt-disable duration)
> > >                      5ms      10ms     20ms     50ms
> > > v4.19
> > > median (usecs)       12000.3  12001    11000    12000
> > >
> > > v5.1 (deferred)
> > > median (usecs)       13000    19999    40000    100000
> > >
> > > All of this is still within spec of RCU.
>
> thanks!
>
>  - Joel
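For concreteness, the disturber thread Joel describes in the quoted
performance comparison might look like the following minimal
kernel-module sketch.  This is a reconstruction, not the actual test
code: the module-parameter names, the choice of CPU 3 as the reserved
CPU, and the yield policy are all assumptions.

	#include <linux/module.h>
	#include <linux/kthread.h>
	#include <linux/delay.h>
	#include <linux/preempt.h>
	#include <linux/sched.h>
	#include <linux/err.h>

	/* Hypothetical knobs: busy-wait length and the reserved CPU. */
	static int busy_ms = 50;
	module_param(busy_ms, int, 0444);
	static int reserved_cpu = 3;
	module_param(reserved_cpu, int, 0444);

	static struct task_struct *disturber;

	static int disturber_fn(void *unused)
	{
		while (!kthread_should_stop()) {
			preempt_disable();
			mdelay(busy_ms);	/* busy-wait, preemption off */
			preempt_enable();
			cond_resched();		/* briefly let other work run */
		}
		return 0;
	}

	static int __init disturber_init(void)
	{
		disturber = kthread_create(disturber_fn, NULL, "rcu_disturber");
		if (IS_ERR(disturber))
			return PTR_ERR(disturber);
		kthread_bind(disturber, reserved_cpu);	/* pin before first run */
		wake_up_process(disturber);
		return 0;
	}

	static void __exit disturber_exit(void)
	{
		kthread_stop(disturber);
	}

	module_init(disturber_init);
	module_exit(disturber_exit);
	MODULE_LICENSE("GPL");

Each such preempt-disable section can hold up a grace period, and per
the two-GP reasoning above, the writer's median slowdown then lands
near 2 * busy_ms, consistent with the 100ms observed for the 50ms case.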