On Sat, Sep 22, 2012 at 10:25:59PM +0000, Paul Walmsley wrote:
> On Sat, 22 Sep 2012, Paul E. McKenney wrote:
>
> > And here is a patch. I am still having trouble reproducing the problem,
> > but figured that I should avoid serializing things.
>
> Thanks, testing this now on v3.6-rc6.

Very cool, thank you!

> One question though about the patch
> description:
>
> > All this begs the question of exactly how a callback-free grace period
> > gets started in the first place. This can happen due to the fact that
> > CPUs do not necessarily agree on which grace period is in progress.
> > If a CPU still believes that the grace period that just completed is
> > still ongoing, it will believe that it has callbacks that need to wait
> > for another grace period, never mind the fact that the grace period
> > that they were waiting for just completed. This CPU can therefore
> > erroneously decide to start a new grace period.
>
> Doesn't this imply that this bug would only affect multi-CPU systems?

Surprisingly not, at least when running TREE_RCU or TREE_PREEMPT_RCU.
In order to keep lock contention down to a dull roar on larger systems,
TREE_RCU keeps three sets of books: (1) the global state in the
rcu_state structure, (2) the combining-tree per-node state in the
rcu_node structure, and (3) the per-CPU state in the rcu_data structure.
A CPU is not officially aware of the end of a grace period until that
end is reflected in its rcu_data structure. This has the
perhaps-surprising consequence that the CPU that detected the end of
the old grace period might start a new one before becoming officially
aware that the old one ended.

Why not have the CPU inform itself immediately upon noticing that the
old grace period ended? Deadlock. The rcu_node locks must be acquired
from leaf towards root, and the CPU is holding the root rcu_node lock
when it notices that the grace period has ended, so it cannot acquire
its leaf rcu_node lock to update its rcu_data without violating that
ordering.
I have made this situation a bit less problematic in the bigrt branch,
working towards a goal of getting RCU into a state where automatic
formal validation might one day be possible. And yes, I am starting to
get some formal-validation people interested in this lofty goal. See,
for example: http://sites.google.com/site/popl13grace/paper.pdf.

> The recent tests here have been on Pandaboard, which is dual-CPU, but my
> recollection is that I also observed the warnings on a single-core
> Beagleboard. Will re-test.

Anxiously awaiting the results. This has been a strange one, even by
RCU's standards. Plus I need to add a few Reported-by lines. Next
version...

							Thanx, Paul