Re: Redundancy check using "echo check > sync_action": error reporting?

Andre Noll <maan@xxxxxxxxxxxxxxx> · Thu, 20 Mar 2008 19:57:05 +0100

On 14:02, Theodore Tso wrote:
> On Thu, Mar 20, 2008 at 06:39:06PM +0100, Andre Noll wrote:
> > On 12:35, Theodore Tso wrote:
> > 
> > > If a mismatch is detected in a RAID-6 configuration, it should be
> > > possible to figure out what should be fixed
> > 
> > It can be figured out under the assumption that exactly one drive has
> > bad data and all other ones have good data. But that seems to be an
> > assumption that is hard to verify in reality.
> 
> True, but it's what ECC memory does.  :-)   And most people agree that
> it's a useful thing to do with memory.  
> 
> If you do ECC syndrome checking on every read, and follow that up with
> periodic scrubbing so that you catch (and correct) errors quickly, it
> is a reasonable assumption to make.
> 
> Obviously a warning should be given when you do this kind of ECC
> fixups, and if there is an increasing number of ECC fixups that are
> being done, that should set off alarms that maybe there is a hardware
> problem that needs to be addressed.

I agree, but not everybody likes the idea of applying this kind of
error correction to hard disks in raid6 as well [1]. After a hard power
failure it may well happen that an arbitrary subset of the disks in
the array is up to date and all the others are not. So in practice the
situation for hard disks is different from that for memory modules.
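
To make the single-bad-drive assumption concrete, here is a minimal
sketch of how the P and Q syndromes could point at one corrupted block.
It is not the md driver's code: the helper names (gf_mul, syndromes,
locate) are made up for illustration, it handles one byte per disk, and
it assumes the usual RAID-6 field GF(2^8) with polynomial 0x11d and
generator 2.

```python
# Log/antilog tables for GF(2^8), polynomial 0x11d, generator 2
# (assumed here as the conventional RAID-6 field).
EXP = [0] * 510
LOG = [0] * 256
x = 1
for i in range(255):
    EXP[i] = x
    LOG[x] = i
    x <<= 1
    if x & 0x100:
        x ^= 0x11d
for i in range(255, 510):
    EXP[i] = EXP[i - 255]  # duplicate so gf_mul needs no modulo


def gf_mul(a, b):
    """Multiply two field elements."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


def syndromes(data):
    """Compute P (plain XOR) and Q (XOR of g^i * D_i) for one byte per data disk."""
    p = q = 0
    for i, d in enumerate(data):
        p ^= d
        q ^= gf_mul(EXP[i], d)
    return p, q


def locate(data, p_stored, q_stored):
    """Guess which block is bad, ASSUMING at most one block is wrong.

    Returns "ok", "P", "Q", the index of the suspected data disk,
    or "unknown" if the syndromes match no single-disk error.
    """
    p, q = syndromes(data)
    sp, sq = p ^ p_stored, q ^ q_stored
    if sp == 0 and sq == 0:
        return "ok"
    if sq == 0:
        return "P"   # only P disagrees
    if sp == 0:
        return "Q"   # only Q disagrees
    # A single error e on data disk z gives sp = e and sq = g^z * e,
    # so z is the difference of the discrete logs.
    z = (LOG[sq] - LOG[sp]) % 255
    return z if z < len(data) else "unknown"


data = [0x11, 0x22, 0x33, 0x44]
p, q = syndromes(data)

one_bad = list(data)
one_bad[2] ^= 0x5A            # a single corrupted data block
print(locate(one_bad, p, q))  # the assumption holds here

two_bad = list(data)
two_bad[0] ^= 0x01            # two stale blocks, e.g. after a power failure
two_bad[1] ^= 0x02
print(locate(two_bad, p, q))  # the answer can no longer be trusted
```

With one corrupted block the index comes straight out of the two
syndromes. With two stale blocks the same arithmetic still produces
some answer, which may be "unknown" or may name a disk that is
actually fine, and nothing in the syndromes tells the two cases apart.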

OTOH, it's probably the best thing one can do, so I'd vote for
implementing this feature.

Andre

[1] http://www.mail-archive.com/linux-raid@xxxxxxxxxxxxxxx/msg09863.html
-- 
The only person who always got his work done by Friday was Robinson Crusoe