On Fri, Sep 21, 2007 at 07:23:09PM -0400, Steven Rostedt wrote:
> --
> On Fri, 21 Sep 2007, Paul E. McKenney wrote:
>
> > If you do a synchronize_rcu() it might well have to wait through the
> > following sequence of states:
> >
> > Stage 0: (might have to wait through part of this to get out of "next" queue)
> >         rcu_try_flip_idle_state,        /* "I" */
> >         rcu_try_flip_waitack_state,     /* "A" */
> >         rcu_try_flip_waitzero_state,    /* "Z" */
> >         rcu_try_flip_waitmb_state       /* "M" */
> > Stage 1:
> >         rcu_try_flip_idle_state,        /* "I" */
> >         rcu_try_flip_waitack_state,     /* "A" */
> >         rcu_try_flip_waitzero_state,    /* "Z" */
> >         rcu_try_flip_waitmb_state       /* "M" */
> > Stage 2:
> >         rcu_try_flip_idle_state,        /* "I" */
> >         rcu_try_flip_waitack_state,     /* "A" */
> >         rcu_try_flip_waitzero_state,    /* "Z" */
> >         rcu_try_flip_waitmb_state       /* "M" */
> > Stage 3:
> >         rcu_try_flip_idle_state,        /* "I" */
> >         rcu_try_flip_waitack_state,     /* "A" */
> >         rcu_try_flip_waitzero_state,    /* "Z" */
> >         rcu_try_flip_waitmb_state       /* "M" */
> > Stage 4:
> >         rcu_try_flip_idle_state,        /* "I" */
> >         rcu_try_flip_waitack_state,     /* "A" */
> >         rcu_try_flip_waitzero_state,    /* "Z" */
> >         rcu_try_flip_waitmb_state       /* "M" */
> >
> > So yes, grace periods do indeed have some latency.
>
> Yes they do. I'm now at the point that I'm just "trusting" you that you
> understand that each of these stages is needed. My IQ level only lets me
> understand next -> wait -> done, but not the extra 3 shifts in wait.
>
> ;-)

In the spirit of full disclosure, I am not -absolutely- certain that they
are needed, only that they are sufficient.  Just color me paranoid.

> > > True, but the "me" confused me. Since that task struct is not me ;-)
> >
> > Well, who is it, then?  ;-)
>
> It's the app I watch sitting there waiting its turn for its callback to
> run. :-)
> > > > > Isn't the GP detection done via a tasklet/softirq? So wouldn't a
> > > > > local_bh_disable be sufficient here? You already cover NMIs, which would
> > > > > also handle normal interrupts.
> > > >
> > > > This is also my understanding, but I think this disable is an
> > > > 'optimization' in that it avoids the regular IRQs from jumping through
> > > > these hoops outlined below.
> > >
> > > But isn't disabling irqs slower than doing a local_bh_disable? So the
> > > majority of times (where irqs will not happen) we have this overhead.
> >
> > The current code absolutely must exclude the scheduling-clock hardirq
> > handler.
>
> ACKed,
> The reasoning you gave in Peter's reply most certainly makes sense.
>
> > > > > > + *
> > > > > > + * If anyone is nuts enough to run this CONFIG_PREEMPT_RCU implementation
> > > > > Oh, come now! It's not "nuts" to use this ;-)
> > > > > > + * on a large SMP, they might want to use a hierarchical organization of
> > > > > > + * the per-CPU-counter pairs.
> > > > > > + */
> > > > It's the large SMP case that's nuts, and on that I have to agree with
> > > > Paul, it's not really large-SMP friendly.
> > >
> > > Hmm, that could be true. But on large SMP systems, you usually have
> > > large amounts of memory, so hopefully a really long synchronize_rcu
> > > would not be a problem.
> >
> > Somewhere in the range from 64 to a few hundred CPUs, the global lock
> > protecting the try_flip state machine would start sucking air pretty
> > badly.  But the real problem is synchronize_sched(), which loops through
> > all the CPUs -- this would likely cause problems at a few tens of
> > CPUs, perhaps as early as 10-20.
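
To make the per-CPU-counter-pair discussion above a little more concrete, here is
a drastically simplified, single-threaded user-space model of the flip-and-wait-
for-zero idea.  It leaves out the ack and memory-barrier stages ("A" and "M"),
all real concurrency and memory ordering, and the state machine itself; the names
(the toy_* functions, rcu_flipctr, NR_CPUS) are only loosely modeled on the patch
and should not be read as its actual code.  The only point it tries to show is
that the "Z" (wait-for-zero) step is a flat scan over every CPU's old counter.

/*
 * Toy model only -- NOT the kernel implementation.  One counter pair
 * per CPU; a reader increments the counter selected by the current
 * flip value and decrements that same counter at unlock time.  The
 * updater flips the counters and then waits for the sum of the *old*
 * counters to reach zero, which requires scanning every CPU.
 */
#include <stdio.h>

#define NR_CPUS 8

static int rcu_flipctr[NR_CPUS][2];     /* per-CPU counter pair */
static int completed;                   /* current flip value */

/* Reader entry on 'cpu': bump the currently selected counter. */
static int toy_rcu_read_lock(int cpu)
{
        int flip = completed & 1;

        rcu_flipctr[cpu][flip]++;
        return flip;                    /* remembered for unlock */
}

/* Reader exit: decrement the counter that was bumped at lock time. */
static void toy_rcu_read_unlock(int cpu, int flip)
{
        rcu_flipctr[cpu][flip]--;
}

/* The "Z" check: an O(NR_CPUS) walk every time it runs. */
static int toy_old_counters_drained(void)
{
        int old = (completed - 1) & 1;
        int cpu, sum = 0;

        for (cpu = 0; cpu < NR_CPUS; cpu++)
                sum += rcu_flipctr[cpu][old];
        return sum == 0;
}

int main(void)
{
        int flip = toy_rcu_read_lock(3);        /* a reader is active on CPU 3 */

        completed++;                            /* updater flips the counters */
        printf("drained? %d\n", toy_old_counters_drained());   /* prints 0 */
        toy_rcu_read_unlock(3, flip);
        printf("drained? %d\n", toy_old_counters_drained());   /* prints 1 */
        return 0;
}

The real patch adds the per-CPU acknowledgements and memory barriers that the
four-state machine above exists to sequence, but the linear walk over CPUs is
the piece being pointed at as the scalability limit; a hierarchical variant of
the same idea is sketched at the end of this message.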
>
> hehe, From someone whose largest box is 4 CPUs, to me 16 CPUs is large.
> But I can see how hundreds, let alone thousands, of CPUs would grind
> things like synchronize_sched to a halt. God, imagine if all CPUs did
> that at approximately the same time. The system would show huge jitter.

Well, the first time the SGI guys tried to boot a 1024-CPU Altix, I got
an email complaining about RCU overheads.  ;-)  Manfred Spraul fixed
things up for them, though.

                                                        Thanx, Paul
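
As for the "hierarchical organization of the per-CPU-counter pairs" that the
quoted comment alludes to, below is one sketch of the general shape such a
scheme could take; none of the names, sizes, or structures here come from the
kernel.  CPUs report quiescence to small leaf groups, and only the last CPU in
each group reports up to the root, so no single lock is ever pounded on by more
than LEAF_SIZE CPUs (a leaf lock) or NR_LEAVES groups (the root lock).

/*
 * Illustration only -- not kernel code.  A two-level reporting tree:
 * each CPU checks in with its leaf group, and the last CPU in a group
 * checks in with the root on the group's behalf.
 */
#include <stdio.h>
#include <pthread.h>

#define NR_CPUS         1024
#define LEAF_SIZE       64
#define NR_LEAVES       (NR_CPUS / LEAF_SIZE)

struct leaf {
        pthread_mutex_t lock;
        int cpus_pending;               /* CPUs in this group not yet quiescent */
};

static struct leaf leaves[NR_LEAVES];
static pthread_mutex_t root_lock = PTHREAD_MUTEX_INITIALIZER;
static int leaves_pending;              /* groups not yet fully quiescent */

static void init_tree(void)
{
        int i;

        leaves_pending = NR_LEAVES;
        for (i = 0; i < NR_LEAVES; i++) {
                pthread_mutex_init(&leaves[i].lock, NULL);
                leaves[i].cpus_pending = LEAF_SIZE;
        }
}

/* Called when 'cpu' has passed through a quiescent state. */
static void cpu_quiescent(int cpu)
{
        struct leaf *lp = &leaves[cpu / LEAF_SIZE];
        int last_in_leaf;

        pthread_mutex_lock(&lp->lock);
        last_in_leaf = (--lp->cpus_pending == 0);
        pthread_mutex_unlock(&lp->lock);

        if (last_in_leaf) {             /* only 1 in LEAF_SIZE calls touch the root */
                pthread_mutex_lock(&root_lock);
                if (--leaves_pending == 0)
                        printf("grace period complete\n");
                pthread_mutex_unlock(&root_lock);
        }
}

int main(void)
{
        int cpu;

        init_tree();
        for (cpu = 0; cpu < NR_CPUS; cpu++)     /* pretend every CPU checks in */
                cpu_quiescent(cpu);
        return 0;
}

Run as an ordinary user-space program it just walks all 1024 pretend CPUs in
one thread and prints "grace period complete"; the shape of the data structure
is the point, since the root lock sees NR_LEAVES updates per grace period
instead of NR_CPUS.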