Re: how to proceed with possible corruption

Robin Hill <robin@xxxxxxxxxxxxxxx> · Wed, 19 Dec 2012 22:30:09 +0000

On Wed Dec 19, 2012 at 01:06:35PM -0800, Ross Boylan wrote:

> Short version: I suspect some of my array components may be corrupt, and
> wonder what the best way is to check for it and fix it.
> 
> I have a VM configured similarly to my real machine for testing.
> When I brought it up there were some complaints about the arrays and
> needing to resync them.  While the sync appeared to complete
> successfully, the VM was quite unstable afterwards, as it had not been
> before.
> 
> The VM was most likely shut down abruptly when the real machine had a
> power failure.  Also, I had added a 3rd disk to my RAID-1 arrays since I
> last booted the VM, but my mdadm.conf had -num-devices=2 (which we have
> already established is a recipe for trouble).
> 
> I installed a new kernel in the VM, and have not had problems since.  So
> I wonder if some of the kernel files got corrupted, and more generally
> if the virtual disks are trustworthy.
> 
> Does this /proc/mdstat offer any clues about which disks might be a
> problem?
> 
> Personalities : [raid1]
> md0 : active raid1 sda1[0] sdc1[2] sdb1[1]
>       96256 blocks [3/3] [UUU]
> 
> md1 : active raid1 sdb3[0] sdc3[2] sda3[1]
>       8187712 blocks [3/3] [UUU]
> 
> unused devices: <none>
> 
> It seems odd that the disks are out of order, that is, not sda, sdb, sdc.
> 
> I know I could fail some components and add them back to assure
> consistency, but this wouldn't tell me if they were inconsistent before
> that.  There's also the possibility they are consistent but corrupt.
> 
The arrays are RAID1, so the order of the disks is irrelevant - the
data should be identical on all disks.

You can check whether the disks are all in sync by doing:
    echo check > /sys/block/mdX/md/sync_action

Once the check is complete (you can see the progress via /proc/mdstat)
then /sys/block/mdX/md/mismatch_cnt will indicate whether or not there
are any mismatches. If so, use "repair" instead of "check" in the above
command to resync the drives.
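
As a rough sketch of the sequence above, something like the following
could wrap the sysfs reads and writes. The function names and the
SYSFS_ROOT override are my own illustration (not part of mdadm or the
kernel); SYSFS_ROOT is only there so the helpers can be pointed at a
scratch directory when trying them out.

```shell
#!/bin/sh
# Sketch only: md_start_check, md_state, md_verdict and SYSFS_ROOT are
# hypothetical names used for illustration.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

# Start a consistency check on one array, e.g. "md_start_check md0".
# Writing to sync_action on a real system needs root.
md_start_check() {
    echo check > "$SYSFS_ROOT/block/$1/md/sync_action"
}

# Print the current sync_action value ("idle" once the check is done).
md_state() {
    cat "$SYSFS_ROOT/block/$1/md/sync_action"
}

# After the check has finished, report mismatch_cnt and suggest a next step.
md_verdict() {
    count=$(cat "$SYSFS_ROOT/block/$1/md/mismatch_cnt")
    if [ "$count" -eq 0 ]; then
        echo "$1: no mismatches"
    else
        echo "$1: mismatch_cnt=$count, consider writing 'repair' to sync_action"
    fi
}

# Typical use (as root), waiting for the array to go idle in between:
#   md_start_check md0
#   while [ "$(md_state md0)" != "idle" ]; do sleep 10; done
#   md_verdict md0
```

Note that for RAID1 a repair can only make the copies agree with each
other; it cannot tell which copy held the correct data.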

Otherwise, the issue could be filesystem corruption. A "fsck -f" on the
array should detect that.
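
The filesystem should not be mounted while it is checked. A minimal
sketch of a pre-flight test follows; the function names and the
MOUNTS_FILE override are hypothetical, and the "-n" (report only, change
nothing) option assumes an ext2/3/4 filesystem, which the original
message does not actually state.

```shell
#!/bin/sh
# Sketch only: is_mounted, fsck_plan and MOUNTS_FILE are illustrative names.
MOUNTS_FILE="${MOUNTS_FILE:-/proc/mounts}"

# Succeed if the given device is listed as a mount source.
# Caveat: a root filesystem may be listed under another name
# (for example /dev/root), which this simple match will miss.
is_mounted() {
    awk -v dev="$1" '$1 == dev { found = 1 } END { exit !found }' "$MOUNTS_FILE"
}

# Print the fsck command to run for a device, or a warning (and a
# non-zero status) if the device is currently mounted.
fsck_plan() {
    if is_mounted "$1"; then
        echo "$1 is mounted; unmount it or boot from rescue media first"
        return 1
    fi
    echo "fsck -f -n $1"
}

# Typical use:
#   fsck_plan /dev/md1
```

If the read-only pass reports problems, running "fsck -f" again without
"-n" lets it attempt the fixes.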

HTH,
    Robin
-- 
     ___        
    ( ' }     |       Robin Hill        <robin@xxxxxxxxxxxxxxx> |
   / / )      | Little Jim says ....                            |
  // !!       |      "He fallen in de water !!"                 |