On Mon, Oct 13, 2008 at 05:30:49PM -0500, Billy Crook wrote:
> Just my two cents.... Those daily smart tests or regularly running
> badblocks are fine, but they're not 'real' load. A test can't prove
> everything is right, it can at best only prove it didn't find anything
> wrong. Distributed spare would exert 'real' load on the spare because
> the spare disks ARE the live disks.
>
> On a side note, it would be handy to have a daemon that could run in
> the background on large raid1's, or raid6's, and once a month, pull
> each disk out of the array sequentially, completely overwrite it,
> check it with badblocks several times, do the smart tests, etc...,
> then rejoin it, reinstall grub, wait an hour and move on. The point
> being, of course, to kill weak drives off early and in a controlled
> manner. It would be even nicer if there were a way to hot-transfer one
> raid component to another without setting anything faulty. I suppose
> you could make all the components of the real array be single disk
> raid1 arrays for that purpose. Then you could have one extra disk set
> aside for this sort of scrubbing, and never even be down one of your
> parities. I guess I should add that onto my todo list....

I have also been thinking a little about this. My idea is that when bit
errors develop on a disk, there is at first perhaps a single bit error in
a sector, which the drive's per-sector error-correcting code (ECC) detects
and corrects on read. If you rewrite such a sector, the bit error is
cleared, and you prevent the single-bit error from developing into a
two-bit (or larger) error that the ECC can no longer correct. Is there
some merit to this idea?

Furthermore, if bad luck has struck and the ECC fails, a RAID array with
redundancy could identify the failing block and recreate it from the
redundant information. This would be useful for raid1, raid10, raid5 and
raid6. If the block then could not be written without errors, it could
be added to a bad-blocks list and remapped.
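To make the proposed scrub concrete, here is a toy sketch in Python for a two-disk mirror. It is hypothetical throughout and not taken from any existing RAID implementation: the disks are simulated in memory, a CRC32 per block stands in for the drive's ECC (a mismatch models a sector the drive could not correct), and "remapping" just means recording the block in a bad-block list. The names SimDisk, scrub_mirror and WriteError are made up for illustration.

```python
"""Toy sketch of a scrub-and-repair pass over a two-disk mirror.

All names are hypothetical. A CRC32 per block stands in for the drive's
ECC: a mismatch models a sector the drive could not correct on read.
"""
import zlib
from dataclasses import dataclass, field


class WriteError(Exception):
    """Raised when a simulated block cannot be rewritten."""


@dataclass
class SimDisk:
    blocks: list
    sums: list
    unwritable: set = field(default_factory=set)  # blocks whose rewrite fails

    @classmethod
    def from_data(cls, data):
        return cls(list(data), [zlib.crc32(b) for b in data])

    def corrupt(self, i):
        """Flip one bit without updating the checksum (a latent media error)."""
        buf = bytearray(self.blocks[i])
        buf[0] ^= 0x01
        self.blocks[i] = bytes(buf)

    def read(self, i):
        """Return the block, or None if it fails its check (unreadable sector)."""
        data = self.blocks[i]
        return data if zlib.crc32(data) == self.sums[i] else None

    def write(self, i, data):
        if i in self.unwritable:
            raise WriteError(i)
        self.blocks[i] = data
        self.sums[i] = zlib.crc32(data)


def scrub_mirror(a, b, bad_blocks):
    """Read every block from both disks and rebuild any unreadable copy
    from its twin. A block whose rewrite fails is recorded in bad_blocks
    as (disk_name, index), standing in for remapping. Returns the number
    of blocks successfully repaired."""
    repaired = 0
    for i in range(len(a.blocks)):
        da, db = a.read(i), b.read(i)
        if da is None and db is None:
            raise RuntimeError(f"block {i}: no readable copy left")
        # If one copy is unreadable, the other one is known to be good here.
        for name, disk, mine, twin in (("a", a, da, db), ("b", b, db, da)):
            if mine is None:
                try:
                    disk.write(i, twin)
                    repaired += 1
                except WriteError:
                    bad_blocks.add((name, i))
    return repaired


if __name__ == "__main__":
    data = [bytes([n]) * 512 for n in range(4)]
    a, b = SimDisk.from_data(data), SimDisk.from_data(data)
    a.corrupt(1)            # latent error on disk a, block 1
    b.corrupt(2)            # latent error on disk b, block 2 ...
    b.unwritable.add(2)     # ... which also cannot be rewritten
    bad = set()
    print(scrub_mirror(a, b, bad))  # prints 1
    print(bad)                      # prints {('b', 2)}
```

Note that a sketch like this only catches errors the drive reports as unreadable; if both copies read cleanly but differ, a plain two-way mirror cannot tell which one is right.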
I think there is nothing novel in a scheme like this, but I would like to
know whether it is implemented somewhere. Articles say that bit errors on
disks are becoming more and more frequent, so schemes like this may help
somewhat with that scary scenario.

best regards
keld