On Sat, Jul 06, 2019 at 07:03:31PM -0400, Joel Fernandes wrote:
> On Sat, Jul 06, 2019 at 11:21:17AM -0700, Paul E. McKenney wrote:
> > On Sat, Jul 06, 2019 at 08:02:30AM -0400, Joel Fernandes wrote:
> > > On Thu, Jul 04, 2019 at 03:17:02PM -0700, Paul E. McKenney wrote:
> > > > On Thu, Jul 04, 2019 at 02:50:55PM -0400, Joel Fernandes wrote:
> > > [snip]
> > > > > I did have a request: could you help me understand why the grace-period
> > > > > duration is double my busy-wait time? You mentioned this has something
> > > > > to do with the thread not waking up before another GP is started, but I
> > > > > did not follow this. Thanks a lot!!
> > > >
> > > > Let's look at a normal grace period, and assume the ->gp_seq grace-period
> > > > sequence counter is initially 0x1, so that a grace period is already
> > > > in progress (keep in mind that the low-order two bits of ->gp_seq are
> > > > the phase within the grace period and the rest of the bits are the
> > > > grace-period number, so we are in phase 1 of the grace period following
> > > > grace period #0). This grace period was started via synchronize_rcu()
> > > > by Task A.
> > > >
> > > > Then we have the following sequence of events:
> > > >
> > > > o   Task B does call_rcu(), but is too late to use the already-started
> > > >     grace period, so it needs another one. The ->gp_seq_needed
> > > >     counter thus becomes 0x8.
> > > >
> > > > o   The current grace period completes, so that the ->gp_seq
> > > >     counter becomes 0x4. Callback invocation commences.
> > > >
> > > > o   The grace-period kthread notices that ->gp_seq_needed is greater
> > > >     than ->gp_seq, so it starts a new grace period. The ->gp_seq
> > > >     counter therefore becomes 0x5.
> > > >
> > > > o   The callback for Task A's synchronize_rcu() is invoked, awakening
> > > >     Task A. This happens almost immediately after the new grace
> > > >     period starts, but does definitely happen -after- that grace
> > > >     period starts.
> > >
> > > Yes, but at this point, we are still at the 1 GP mark. Yet the distribution's
> > > median below shows a strong correlation with two preempt-disable durations
> > > per grace-period completion. For example, if the duration of the preempt
> > > disable/enable loop is 50 ms, the writer's median time difference before and
> > > after the synchronize_rcu() comes out strongly at 100 ms, not 60 ms or 75 ms
> > > or anything in between.
> > >
> > > > o   Task A, being part of rcuperf, almost immediately does another
> > > >     synchronize_rcu(). So ->gp_seq_needed becomes 0xc.
> > >
> > > Yes, but before that next synchronize_rcu(), it also gets the timestamp and
> > > takes the difference between the old and new timestamps, so the data captured
> > > at that point should only be for around one GP's worth, give or take.
> >
> > But in this case, the "give or take" is almost a full grace period, for
> > a total of almost two grace periods.
> >
> > > > If you play out the rest of this sequence, you should see how Task A
> > > > waits for almost two full grace periods.
> > >
> > > I tried to play this out, but still didn't get it :-|. Not that it is an
> > > issue per se, but I would still like to understand it better.
> >
> > Let's try it a different way. Let's skip the state portion of ->gp_seq
> > and just focus on the number of elapsed grace periods.
> >
> > o   Let's start with RCU idle, having completed grace-period number
> >     zero and not yet having started grace-period number 1.
> >     Let's also abbreviate: GP0 and GP1.
> >
> > o   Task A wants a grace period. Because RCU is idle, Task A's
> >     GP will complete at the completion of GP1.
> >
> > o   RCU starts GP1.
> >
> > o   Almost immediately thereafter, Task B wants a grace period.
> >     But it cannot use GP1, because GP1 has already started, even
> >     if just barely, because some CPU or task might already have
> >     reported a quiescent state for GP1.
> >
> >     So Task B's GP will not complete until the end of GP2.
> >     And that
> >     is almost two GPs' worth of time from now.
> >
> > And on a system where RCU is busy, Task B's experience is the common
> > case if it is in a tight loop doing synchronize_rcu(). Because RCU is
> > busy, by the end of each grace period there will always be a request
> > for another grace period. Therefore, RCU's grace-period kthread
> > will immediately start a new grace period upon completion of each
> > grace period. So the sequence of events from Task B's viewpoint will
> > be as follows:
> >
> > o   Task B executes synchronize_rcu(), which requests a new grace
> >     period.
> >
> > o   This requested grace period ends, and another immediately starts.
> >
> > o   Task B's RCU callback is invoked, which does a wakeup.
> >
> > o   Task B executes another synchronize_rcu(), but just after a new
> >     grace period has started. Task B thus has to wait almost two
> >     full grace periods.
> >
> > This could be perceived as a problem, but my question back to you would
> > be "Why on earth are you invoking synchronize_rcu() in a tight loop???"
> > "Oh, because you are running rcuperf? Well, that is why rcuperf uses
> > ftrace data!"
> >
> > Which might be the point on which we are talking past each other. You
> > might have been thinking of the latency of RCU's grace-period computation.
> > I was thinking of the length of time between the call to synchronize_rcu()
> > and the corresponding return.
>
> This registered perfectly with me now; I was having a hard time visualizing
> it, but it makes sense based on your description. Thanks a lot!
>
> ASCII art (basically, each synchronize_rcu() is issued just after a slight
> offset, once the next grace period has already started):
>
>     synchronize_rcu       Wait         synchronize_rcu      Wait
>    |<------------------->|            |<------------------->|
>    v                     v            v                     v
> <----------><----------><----------><---------><--------->
>      GP          GP          GP         GP         GP
>
> GP = long preempt-disable duration

You got it!

							Thanx, Paul