Re: Redundancy check using "echo check > sync_action": error reporting?

Theodore Tso wrote:
> On Thu, Mar 20, 2008 at 06:39:06PM +0100, Andre Noll wrote:
>> On 12:35, Theodore Tso wrote:
>>
>>> If a mismatch is detected in a RAID-6 configuration, it should be
>>> possible to figure out what should be fixed
>>
>> It can be figured out under the assumption that exactly one drive has
>> bad data and all other ones have good data. But that seems to be an
>> assumption that is hard to verify in reality.
>
> True, but it's what ECC memory does.  :-)   And most people agree that
> it's a useful thing to do with memory.
> If you do ECC syndrome checking on every read, and follow that up with
> periodic scrubbing so that you catch (and correct) errors quickly, it
> is a reasonable assumption to make.
>
> Obviously a warning should be given when you do this kind of ECC
> fixups, and if there is an increasing number of ECC fixups that are
> being done, that should set off alarms that maybe there is a hardware
> problem that needs to be addressed.
>
> Regards,
>
> 						- Ted
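
To make the "exactly one drive has bad data" case concrete, here is a rough Python sketch of how the P and Q syndromes of one RAID-6 stripe can point at a single corrupt block. It assumes the usual GF(2^8) arithmetic with polynomial 0x11d and generator 2; the function names (syndromes, locate_single_error) are made up for illustration and this is not the md code.

```python
# Build GF(2^8) log/antilog tables (polynomial 0x11d, generator 2).
EXP = [0] * 510
LOG = [0] * 256
_x = 1
for _i in range(255):
    EXP[_i] = _x
    LOG[_x] = _i
    _x <<= 1
    if _x & 0x100:
        _x ^= 0x11D
for _i in range(255, 510):
    EXP[_i] = EXP[_i - 255]


def gf_mul(a, b):
    """Multiply two bytes in GF(2^8)."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


def syndromes(data):
    """Return (P, Q) for a list of equal-length data blocks."""
    n = len(data[0])
    p = bytearray(n)
    q = bytearray(n)
    for i, block in enumerate(data):
        coeff = EXP[i]  # g**i for data disk i
        for j, byte in enumerate(block):
            p[j] ^= byte
            q[j] ^= gf_mul(coeff, byte)
    return bytes(p), bytes(q)


def locate_single_error(data, p_stored, q_stored):
    """Return None if the stripe is consistent, the index of the one bad
    data block, or "P"/"Q" if only that parity block disagrees.
    Raises ValueError when the bytes do not agree on a single culprit."""
    p_calc, q_calc = syndromes(data)
    suspects = set()
    for pc, ps, qc, qs in zip(p_calc, p_stored, q_calc, q_stored):
        dp, dq = pc ^ ps, qc ^ qs
        if dp == 0 and dq == 0:
            continue                      # this byte position is fine
        if dp == 0:
            suspects.add("Q")             # only Q disagrees
        elif dq == 0:
            suspects.add("P")             # only P disagrees
        else:
            # One bad data block z gives dq == g**z * dp.
            suspects.add((LOG[dq] - LOG[dp]) % 255)
    if not suspects:
        return None
    if len(suspects) != 1:
        raise ValueError("more than one block looks bad")
    z = suspects.pop()
    if isinstance(z, int) and z >= len(data):
        raise ValueError("computed index is not a data disk")
    return z
```

Note that if two blocks are corrupt, this can either raise or blame the wrong block, which is exactly the hard-to-verify assumption Andre points out.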

This might have been stated before in the thread, but most RAID rebuilds are triggered by easily identified drive failures (i.e., a completely dead drive or a sequence of bad sectors that generate an IO error as we read from the platter). Fortunately, these are also the most common failures in RAID boxes ;-)

The way you deal with the class of errors that don't trigger obvious failures is to do some kind of background scrubbing or add extra protection data to the disk.
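
As a rough illustration of the scrubbing side, the sketch below drives an md check through sysfs and looks at the mismatch count afterwards. It assumes the array's md directory (for example /sys/block/md0/md, an assumed path) exposes the sync_action and mismatch_cnt attributes; the helper names are hypothetical, and the "rising count" test is just one simple way to raise the kind of alarm Ted describes.

```python
from pathlib import Path


def start_check(md_dir):
    """Equivalent of: echo check > <md_dir>/sync_action"""
    Path(md_dir, "sync_action").write_text("check\n")


def scrub_is_idle(md_dir):
    """True once sync_action reads back as 'idle'."""
    return Path(md_dir, "sync_action").read_text().strip() == "idle"


def read_mismatch_cnt(md_dir):
    """Mismatch count reported by the most recent check."""
    return int(Path(md_dir, "mismatch_cnt").read_text().strip())


def rising_mismatches(counts):
    """True if the latest scrub found more mismatches than the one before."""
    return len(counts) >= 2 and counts[-1] > counts[-2]
```

A cron job could call start_check(), poll scrub_is_idle(), append read_mismatch_cnt() to a small history file, and send mail when rising_mismatches() is true.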

Martin Petersen presented the new "DIF" work at the FS/IO workshop. This might be an interesting feature to build into MD raid devices:

http://oss.oracle.com/projects/data-integrity/documentation/

You would need to reformat your drives, so this is not a generic solution for all users, but it really does address the core of the issue.
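
For anyone who has not looked at DIF yet, here is a minimal sketch of the idea as I understand it: each 512-byte sector carries 8 extra bytes (a 16-bit guard CRC, a 16-bit application tag and a 32-bit reference tag derived from the LBA), which is why the drives need reformatting. The layout and CRC polynomial below are my reading of the T10 format, and the function names are invented for illustration; check the documentation at the URL above before relying on any of it.

```python
import struct

T10_DIF_POLY = 0x8BB7  # CRC-16 polynomial assumed for the guard tag


def crc16_t10dif(data):
    """Bitwise CRC-16 (initial value 0, no reflection, no final XOR)."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ T10_DIF_POLY) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def make_protection(sector, lba, app_tag=0):
    """Build the 8 protection bytes for one 512-byte sector."""
    if len(sector) != 512:
        raise ValueError("expected a 512-byte sector")
    # Big-endian: guard (2 bytes), application tag (2), reference tag (4).
    return struct.pack(">HHI", crc16_t10dif(sector), app_tag, lba & 0xFFFFFFFF)


def verify(sector, protection, expected_lba):
    """Check the guard CRC and that the sector came from the expected LBA."""
    guard, _app_tag, ref = struct.unpack(">HHI", protection)
    return guard == crc16_t10dif(sector) and ref == (expected_lba & 0xFFFFFFFF)
```

The guard tag catches corrupted data, and the reference tag catches a sector that was written to or read from the wrong place, which plain parity scrubbing cannot attribute to a specific drive.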

ric