> -----Original Message-----
> From: linux-raid-owner@xxxxxxxxxxxxxxx [mailto:linux-raid-owner@xxxxxxxxxxxxxxx] On Behalf Of Martin K. Petersen
> Sent: Tuesday, October 14, 2008 5:12 AM
> To: Keld Jørn Simonsen
> Cc: Billy Crook; Justin Piszcz; Bill Davidsen; Neil Brown; Linux RAID
> Subject: Re: Distributed spares
>
> >>>>> "Keld" == Keld Jørn Simonsen <keld@xxxxxxxx> writes:
>
> Keld> I have also been thinking a little on this. My idea is that if
> Keld> bit errors develop on disks, then there is first maybe one bit
> Keld> error, and the crc check on the disk sectors then finds and
> Keld> corrects these.
>
> Keld> If you rewrite such bit errors, then that bit error will be
> Keld> corrected, and you prevent the one-bit error from developing to
> Keld> a two-bit error that is not correctable by the CRC.
>
> I think you are assuming that disks are much simpler than they
> actually are.
>
> A modern disk drive protects a 512-byte sector with a pretty strong
> ECC that's capable of correcting errors up to ~50 bytes. Yes, that's
> bytes.
>
> Also, many drive firmwares will internally keep track of problematic
> media areas and rewrite or reallocate affected blocks. That includes
> stuff like rewriting sectors that are susceptible to bleed due to
> being adjacent to write hot spots.
>
> --
> Martin K. Petersen Oracle Linux Engineering
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at http://vger.kernel.org/majordomo-info.html

Martin is absolutely correct. Enterprise-class drives have come a long way. They will scan and fix blocks (but certainly not 100% of them) in the background. Even the $99 disk drives you get at the local computer retailer now have limited BGMS (background media scan) / repair capability.
If you run the built-in diagnostics on a disk drive, you can be presented with a list of known bad blocks, and sometimes you get a bad block display in POST when you boot with the drive attached.

How about a baby step? When you run offline or online tests, or even media scans, you get a list of known defects. How about a program that rewrites a RAID1/3/5/6 stripe, where you just pass it the physical device name and the known bad block number?

As for checking out a disk: the prior poster's idea of putting the RAID in degraded mode in order to check out a disk is, frankly, nuts. NEVER degrade anything. Just use the hot spare: do a hot clone of the disk in question to the hot spare, then make that disk the new hot spare and repeat. Think of this as a "Rotating the Tires" mode.

David @ santools com
http://www.santools.com/smart/unix/manual
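
To make the stripe-rewrite idea above a bit more concrete: such a program would first have to map the known bad block on one member disk to the stripe that contains it. Here is a minimal sketch of that arithmetic in Python. It is only an illustration, not an existing tool: the names `StripeRange` and `stripe_for_bad_block` are hypothetical, and it assumes a chunk-striped layout (RAID5/6 style) in which every member device uses the same data offset and chunk size, both given in 512-byte sectors. It reads and writes nothing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StripeRange:
    """Sector range, on each member device, of the stripe holding a bad block."""
    stripe: int        # zero-based stripe index
    first_sector: int  # first member-device sector of the stripe's chunk
    last_sector: int   # last member-device sector of the chunk (inclusive)


def stripe_for_bad_block(bad_sector: int, chunk_sectors: int,
                         data_offset: int = 0) -> StripeRange:
    """Map a bad sector on one member disk to its stripe.

    Assumptions (hypothetical, for illustration only): all members share
    the same data_offset and chunk_sectors, and sectors are numbered from
    the start of the member device.
    """
    if chunk_sectors <= 0:
        raise ValueError("chunk_sectors must be positive")
    if bad_sector < data_offset:
        raise ValueError("bad_sector lies before the array data area")

    # Stripe index is the chunk index within the data area of the member.
    stripe = (bad_sector - data_offset) // chunk_sectors
    first = data_offset + stripe * chunk_sectors
    return StripeRange(stripe, first, first + chunk_sectors - 1)


if __name__ == "__main__":
    # Example: 64 KiB chunks (128 sectors), data area starting at sector 2048.
    print(stripe_for_bad_block(bad_sector=2178, chunk_sectors=128,
                               data_offset=2048))
```

A real helper would then read that sector range from the healthy members, reconstruct the data or parity, and write it back to the affected disk so the drive can rewrite or reallocate the block. That part depends on the RAID implementation and is deliberately left out here.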