Re: Redundancy check using "echo check > sync_action": error reporting?

On Thu Mar 20, 2008 at 03:19:08PM +0100, Bas van Schaik wrote:

> Robin Hill wrote:
> > On Thu Mar 20, 2008 at 02:32:37PM +0100, Bas van Schaik wrote:
> >   
> >> Anyone able to answer the last and most important question: does it
> >> produce a message during resync in case of corruption? That would be great!
> >>     
> > There's no explicit message produced by the md module, no.  You need to
> > check the /sys/block/md{X}/md/mismatch_cnt entry to find out how many
> > mismatches there are.  Similarly, following a repair this will indicate
> > how many mismatches it thinks have been fixed (by updating the parity
> > block to match the data blocks).
> >   
> Marvellous! I naively assumed that the module would warn me, but that's
> not true. Wouldn't it be appropriate to print a message to dmesg if such
> a mismatch occurs during a check? Such a mismatch clearly means that
> there is something wrong with your hardware lying beneath md, doesn't it?
> 
With RAID5, mostly yes - although there may be errors caused by
transient situations (interference, cosmic rays, etc.) which are entirely
independent of the hardware.  With other RAID levels it's not quite as
clear cut.  For example, with RAID1 it's possible for the in-memory data
to have been changed between the writes to each disk (especially with
swap disks) - this isn't necessarily an issue (and certainly not a
hardware one).

> > I've no idea whether the checkarray script you're using is checking this
> > counter - there seems little point in having a special script if it
> > isn't though.
> >   
> If I understand the meaning of this counter, it would be sufficient to
> check the value of it _before_ the check operation and compare that
> value to the counter value _after_ the check. If the counter has
> increased: the check has encountered some inconsistencies which should
> be reported.
> Please correct me if I'm wrong!
> 
Depends on what the previous operation was.  After a repair, the counter
will indicate the number of errors fixed, not the number remaining.
Theoretically, after a repair there will be no errors remaining, so any
value (> 0) in the counter after a check would indicate an issue to be
reported.
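
As a rough illustration of the above, here is a minimal Python sketch that
starts a check, waits for it to finish and returns the counter.  It assumes
the sysfs layout mentioned earlier (/sys/block/md{X}/md/mismatch_cnt, with
sync_action alongside it) and that sync_action reads "idle" once the check
has completed.  The function names, the sysfs_root parameter and the polling
interval are my own inventions for the example; this is not what the
checkarray script does.

```python
import time
from pathlib import Path


def md_attr(md_name, attr, sysfs_root="/sys/block"):
    """Build the path of an md sysfs attribute, e.g. /sys/block/md0/md/mismatch_cnt."""
    return Path(sysfs_root) / md_name / "md" / attr


def read_mismatch_cnt(md_name, sysfs_root="/sys/block"):
    """Return the current value of mismatch_cnt as an integer."""
    return int(md_attr(md_name, "mismatch_cnt", sysfs_root).read_text().strip())


def wait_until_idle(md_name, sysfs_root="/sys/block", poll_seconds=10.0):
    """Poll sync_action until it reads 'idle' (assumed to mean nothing is running)."""
    while md_attr(md_name, "sync_action", sysfs_root).read_text().strip() != "idle":
        time.sleep(poll_seconds)


def check_array(md_name, sysfs_root="/sys/block", poll_seconds=10.0):
    """Start a check, wait for it to finish, and return the mismatch count.

    Needs root, since it writes to sync_action (the equivalent of
    'echo check > sync_action').
    """
    # Don't start a check while a resync, repair or other check is running.
    wait_until_idle(md_name, sysfs_root, poll_seconds)
    md_attr(md_name, "sync_action", sysfs_root).write_text("check\n")
    wait_until_idle(md_name, sysfs_root, poll_seconds)
    return read_mismatch_cnt(md_name, sysfs_root)
```

Calling check_array("md0") as root and reporting whenever the result is
greater than 0 would cover the case described above, without needing to
compare against the value from before the check.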

Cheers,
        Robin
-- 
     ___        
    ( ' }     |       Robin Hill        <robin@xxxxxxxxxxxxxxx> |
   / / )      | Little Jim says ....                            |
  // !!       |      "He fallen in de water !!"                 |
