On Thursday May 12, molle.bestefich@xxxxxxxxx wrote:
> On 4/22/05, Molle Bestefich wrote:
> > Just upgraded a MD RAID 5 box to 2.6.11 from 2.4.something.
> >
> > Found out one disk was failing completely, got a replacement from Maxtor. Neat.
> > Replaced disk, rebooted..
> > Added the new disk to the array with 'raidhotadd'.
> > MD started syncing.
> >
> > A couple of minutes into the process, it started *seriously* spamming
> > the console with messages:
> >
> > ==========================
> > Apr 22 01:47:00 linux kernel: ..<6>md: syncing RAID array md1
> > Apr 22 01:47:00 linux kernel: md: minimum _guaranteed_ reconstruction
> > speed: 1000 KB/sec/disc.
> > Apr 22 01:47:00 linux kernel: md: using maximum available idle IO bandwith (but
> > not more than 200000 KB/sec) for reconstruction.
> > Apr 22 01:47:00 linux kernel: md: using 128k window, over a total of
> > 199141632 blocks.
> > Apr 22 01:47:00 linux kernel: md: md1: sync done.
> > Apr 22 01:47:00 linux kernel: ..<6>md: syncing RAID array md1
> > Apr 22 01:47:01 linux kernel: md: minimum _guaranteed_ reconstruction
> > speed: 1000 KB/sec/disc.
> > Apr 22 01:47:01 linux kernel: md: using maximum available idle IO bandwith (but
> > not more than 200000 KB/sec) for reconstruction.
> > Apr 22 01:47:01 linux kernel: md: using 128k window, over a total of
> > 199141632 blocks.
> > Apr 22 01:47:01 linux kernel: md: md1: sync done.
> > ==========================
>
> [snip]
>
> > afterwards, I can see that the above messages repeat themselves.
> > cat /var/log/messages | grep md | grep 'Apr 22 01:47:01' | grep 'sync done'
> > tells me that the messages were repeated 12 times per second. The
>
> Ping!...
> Neil, just wondering, any comments regarding this particular endless loop in MD?
> (Anything I can test or some such?)

Thanks for the ping, things sometimes get lost in the noise....

This sounds a bit like the problem that is addressed by
   md-make-raid5-and-raid6-robust-against-failure-during-recovery.patch
in the current -mm patches (look in the brokenout directory).

This would only happen if you have multiple failed devices. So maybe
while the rebuild was happening, another device failed (which seems to
happen more and more as device sizes are increasing and reliability is
going the other way).

Could this (another drive failure) be the case?

NeilBrown
-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
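
One way to answer the "another drive failure" question is to look at /proc/mdstat, where md marks failed members with (F) and shows missing slots as "_" in the status map. The following is only a rough sketch of that check: the function names, the sample text, and the device names and block counts in it are made up for illustration and are not taken from the box described above.

```python
import re

# Sample text in the usual /proc/mdstat layout. The device names and
# numbers are invented for illustration; on a live system you would
# read the real file instead, e.g. open("/proc/mdstat").read().
SAMPLE_MDSTAT = """\
Personalities : [raid5]
md1 : active raid5 hdg1[3] hde1[1](F) hdc1[0]
      398283264 blocks level 5, 64k chunk, algorithm 2 [3/1] [U__]

unused devices: <none>
"""


def failed_members(mdstat_text):
    """Return {array: [members flagged (F)]} for arrays with failed members."""
    result = {}
    for line in mdstat_text.splitlines():
        m = re.match(r"^(md\d+)\s*:\s*(.*)$", line)
        if not m:
            continue
        array, rest = m.groups()
        failed = re.findall(r"(\w+)\[\d+\]\(F\)", rest)
        if failed:
            result[array] = failed
    return result


def missing_slots(mdstat_text):
    """Return {array: number of '_' entries in its [UU_] status map}."""
    result = {}
    current = None
    for line in mdstat_text.splitlines():
        m = re.match(r"^(md\d+)\s*:", line)
        if m:
            current = m.group(1)
            continue
        status = re.search(r"\[\d+/\d+\]\s+\[([U_]+)\]", line)
        if status and current is not None:
            result[current] = status.group(1).count("_")
    return result


if __name__ == "__main__":
    print("failed members:", failed_members(SAMPLE_MDSTAT))
    print("missing slots: ", missing_slots(SAMPLE_MDSTAT))
```

During a normal single-disk rebuild you would expect one missing slot and no member flagged (F). More than one missing slot, or an (F) flag on a member other than the disk that was replaced, would point to a second failure during recovery.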