On Fri, Sep 21, 2007 at 10:56:56PM -0400, Steven Rostedt wrote: > > [ sneaks away from the family for a bit to answer emails ] [ same here, now that you mention it... ] > -- > On Fri, 21 Sep 2007, Paul E. McKenney wrote: > > > On Fri, Sep 21, 2007 at 09:19:22PM -0400, Steven Rostedt wrote: > > > > > > -- > > > On Fri, 21 Sep 2007, Paul E. McKenney wrote: > > > > > > > > > > In any case, I will be looking at the scenarios more carefully. If > > > > > it turns out that GP_STAGES can indeed be cranked down a bit, well, > > > > > that is an easy change! I just fired off a POWER run with GP_STAGES > > > > > set to 3, will let you know how it goes. > > > > > > > > The first attempt blew up during boot badly enough that ABAT was unable > > > > to recover the machine (sorry, grahal!!!). Just for grins, I am trying > > > > it again on a machine that ABAT has had a better record of reviving... > > > > > > This still frightens the hell out of me. Going through 15 states and > > > failing. Seems the CPU is holding off writes for a long long time. That > > > means we flipped the counter 4 times, and that still wasn't good enough? > > > > Might be that the other machine has its 2.6.22 version of .config messed > > up. I will try booting it on a stock 2.6.22 kernel when it comes back > > to life -- not sure I ever did that before. Besides, the other similar > > machine seems to have gone down for the count, but without me torturing > > it... > > > > Also, keep in mind that various stages can "record" a memory misordering, > > for example, by incrementing the wrong counter. > > > > > Maybe I'll boot up my powerbook to see if it has the same issues. > > > > > > Well, I'm still finishing up on moving into my new house, so I wont be > > > available this weekend. > > > > The other machine not only booted, but has survived several minutes of > > rcutorture thus far. I am also trying POWER5 machine as well, as the > > one currently running is a POWER4, which is a bit less aggressive about > > memory reordering than is the POWER5. > > > > Even if they pass, I refuse to reduce GP_STAGES until proven safe. > > Trust me, you -don't- want to be unwittingly making use of a subtely > > busted RCU implementation!!! > > I totally agree. This is the same reason I want to understand -why- it > fails with 3 stages. To make sure that adding a 4th stage really does fix > it, and doesn't just make the chances for the bug smaller. Or if it really does break, as opposed to my having happened upon a sick or misconfigured machine. > I just have that funny feeling that we are band-aiding this for POWER with > extra stages and not really solving the bug. > > I could be totally out in left field on this. But the more people have a > good understanding of what is happening (this includes why things fail) > the more people in general can trust this code. Right now I'm thinking > you may be the only one that understands this code enough to trust it. I'm > just wanting you to help people like me to trust the code by understanding > and not just having faith in others. Agreed. Trusting me is grossly insufficient. For one thing, the Linux kernel has a reasonable chance of outliving me. > If you ever decide to give up jogging, we need to make sure that there are > people here that can still fill those running shoes (-: Well, I certainly don't jog as fast or as far as I used to! ;-) Thanx, Paul - To unsubscribe from this list: send the line "unsubscribe linux-rt-users" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html