Re: mdadm ignoring X as it reports Y as failed

On Sat, 06 Jul 2013 21:06:14 +0200 "Marek Jaros" <mjaros1@xxxxxxx> wrote:

> Hey everybody.
> 
> To keep it short, I have a RAID-5 mdraid, just today 2 out of the 5  
> drives dropped out. It was a cable issue and has since been fixed. The  
> array was not being written to or utilized in any other way, so no data has  
> been lost.
> 
> However when I attempted to reassemble the array with
> 
> mdadm --assemble --force --verbose /dev/md0 /dev/sdc /dev/sdd /dev/sde  
> /dev/sdf /dev/sdg
> 
> 
> I got the following errors:
> 
> mdadm: looking for devices for /dev/md0
> mdadm: /dev/sdc is identified as a member of /dev/md0, slot 0.
> mdadm: /dev/sdd is identified as a member of /dev/md0, slot 1.
> mdadm: /dev/sde is identified as a member of /dev/md0, slot 2.
> mdadm: /dev/sdf is identified as a member of /dev/md0, slot 3.
> mdadm: /dev/sdg is identified as a member of /dev/md0, slot 4.
> mdadm: ignoring /dev/sde as it reports /dev/sdc as failed
> mdadm: ignoring /dev/sdf as it reports /dev/sdc as failed
> mdadm: ignoring /dev/sdg as it reports /dev/sdc as failed
> mdadm: added /dev/sdd to /dev/md0 as 1
> mdadm: no uptodate device for slot 2 of /dev/md0
> mdadm: no uptodate device for slot 3 of /dev/md0
> mdadm: no uptodate device for slot 4 of /dev/md0
> mdadm: added /dev/sdc to /dev/md0 as 0
> mdadm: /dev/md0 assembled from 2 drives - not enough to start the array.
> 
> 
> After running --examine* I indeed found that the Array State info  
> inside the superblock marks the first two drives as missing. That  
> is no longer true, but I can't force it to assemble the array  
> or update the superblock info.
> 
> So is there any way to force mdadm to assemble the array? Or perhaps  
> edit the superblock info manually? I'd rather avoid having to recreate  
> the array from scratch.
> 
> Any help or pointers with more info are highly appreciated. Thank you.
> 

Hi again,
 could you tell me what kernel you are running?  Because as far as I can tell
 the state of the devices that you reported is impossible!

The interesting bit of the --examine output is:

/dev/sdc:
     Update Time : Sat Jul  6 14:43:14 2013
          Events : 2742
    Array State : AAAAA ('A' == active, '.' == missing)
/dev/sdd:
     Update Time : Sat Jul  6 14:29:42 2013
          Events : 2742
    Array State : AAAAA ('A' == active, '.' == missing)
/dev/sde:
     Update Time : Sat Jul  6 14:46:15 2013
          Events : 2742
    Array State : ..AAA ('A' == active, '.' == missing)
/dev/sdg:
     Update Time : Sat Jul  6 14:46:15 2013
          Events : 2742
    Array State : ..AAA ('A' == active, '.' == missing)
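
(Tabulating those fields makes the disagreement easier to see.  A throwaway
sketch, not anything in mdadm itself, that parses the summaries quoted above:)

```python
# Illustrative sketch only: parse the per-device "--examine" summaries
# quoted in this thread and compare what each member reports.  Field
# names follow mdadm's v1.x metadata output.
EXAMINE = """\
/dev/sdc:
     Update Time : Sat Jul  6 14:43:14 2013
          Events : 2742
     Array State : AAAAA
/dev/sdd:
     Update Time : Sat Jul  6 14:29:42 2013
          Events : 2742
     Array State : AAAAA
/dev/sde:
     Update Time : Sat Jul  6 14:46:15 2013
          Events : 2742
     Array State : ..AAA
/dev/sdg:
     Update Time : Sat Jul  6 14:46:15 2013
          Events : 2742
     Array State : ..AAA
"""

def parse(text):
    """Collect 'Field : value' lines under each '/dev/...:' heading."""
    devices, current = {}, None
    for line in text.splitlines():
        if line.startswith('/dev/') and line.endswith(':'):
            current = line[:-1]
            devices[current] = {}
        elif ':' in line and current:
            key, _, val = line.partition(':')
            devices[current][key.strip()] = val.strip()
    return devices

devs = parse(EXAMINE)
events = {d: f['Events'] for d, f in devs.items()}
states = {d: f['Array State'] for d, f in devs.items()}

# Every member carries the same Events count...
assert len(set(events.values())) == 1
# ...yet they disagree about which devices are alive:
assert len(set(states.values())) == 2
```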

From this I can see that:
   at 14:29:42 everything was fine and all the superblocks were updated.
   at 14:43:14 everything still seemed to be fine and md tried to update the
               superblock again (it does that from time to time) but failed
               to write to /dev/sdd.  This would have triggered an error, so
               it would have marked sdd as faulty and updated the superblocks
               again.
Probably when it tried, it found that the write to sdc failed too, so it marked
that as faulty and tried again.
   at 14:46:15 it wrote out metadata to sde and sdg reporting that sdc and sdd
                were faulty.

Every time that it updates the superblock when the array is degraded it must
update the 'Events' count.  However the Events count at 14:46:15 (after 2
devices have failed) is the same as it was at 14:43:14 before anything had
failed.  That is really wrong.
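
(Stated concretely, the invariant is: two superblocks that disagree about the
Array State belong to different generations of the array, so they cannot carry
the same Events count.  A minimal sketch of that check, my own illustration
rather than md code, applied to the values reported here:)

```python
# Invariant: if two members' superblocks record different Array States,
# the metadata was rewritten in between, so the Events counts must differ.
def consistent(a, b):
    """a, b: (events, array_state) tuples read from two members."""
    ev_a, st_a = a
    ev_b, st_b = b
    if st_a == st_b:
        return True          # same generation may share an event count
    return ev_a != ev_b      # different state => Events must have moved on

# The values reported in this thread violate the invariant:
sdc = (2742, "AAAAA")        # written at 14:43:14, before any failure
sde = (2742, "..AAA")        # written at 14:46:15, after two "failures"
assert not consistent(sdc, sde)
```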

Hence the question.  I need to know if this is a bug that has already been
fixed (I cannot find a fix, but you never know), or if the bug is still
present and I need to hunt some more.

Thanks,
NeilBrown

