On Wed, Dec 17, 2008 at 10:50 PM, Jon Nelson <jnelson-linux-raid@xxxxxxxxxxx> wrote:
> On Wed, Dec 17, 2008 at 10:42 PM, Neil Brown <neilb@xxxxxxx> wrote:
>> On Tuesday December 16, neilb@xxxxxxx wrote:
>>> On Monday December 15, jnelson-linux-raid@xxxxxxxxxxx wrote:
>>> > On Mon, Dec 15, 2008 at 3:33 PM, Neil Brown <neilb@xxxxxxx> wrote:
>>> > > On Monday December 15, jnelson-linux-raid@xxxxxxxxxxx wrote:
>>> > >>
>>> > >> Aha! This explains a question I raised in another email. What
>>> > >> happened there is a previously fully active member of the raid got
>>> > >> added, somehow, as a spare, via --incremental. That's when the entire
>>> > >> raid thought it needed to be rebuilt. How did that (the device being
>>> > >> treated as a spare instead of as a previously fully active member)
>>> > >> happen?
>>> > >
>>> > > It is hard to guess without details, and they might be hard to collect
>>> > > after the fact.
>>> > > Maybe if you have the kernel logs of when the server rebooted and the
>>> > > recovery started, that might contain some hints.
>>> >
>>> > I hope this helps.
>>>
>>> Yes it does, though I generally prefer to get more complete logs. If
>>> I get the surrounding log lines then I know what isn't there as well
>>> as what is - and it isn't always clear at first which bits will be
>>> important.
>>>
>>> The problem here is that --incremental doesn't provide the --re-add
>>> functionality that you are depending on. That was an oversight on my
>>> part. I'll see if I can get it fixed.
>>> In the mean time, you'll need to use --re-add (or --add, it does the
>>> same thing in your situation) to add nbd0 to the array.
>>
>> Actually, I'm wrong.
>> --incremental does do the right thing w.r.t. --re-add.
>> I couldn't reproduce your symptoms.
>
> OK.
>
>> It could be that you are hitting the bug fixed by
>> commit a0da84f35b25875870270d16b6eccda4884d61a7
>
> That sure sounds like it. I'd have to log to see what happened,
> exactly, but I've added substantial logging around the device
> discovery and addition section which manages this particular raid.
>
>> You would need 2.6.26 or later to have that fixed.
>> Can you try with a newer kernel???
>
> I hope to be giving opensuse 11.1 a try soon, which uses 2.6.27.X
> afaik. I suspect I can also backport that patch to 2.6.25 easily.

The kernel source for 2.6.25.18-0.2 (from suse) has this patch
already, so I was already using it. Perhaps this weekend or some night
this week I'll find time to try to break things again.

--
Jon