Re: Problem diagnosing rebuilding raid5 array

On Mon, 14 Oct 2013 12:31:04 -0400 peter@xxxxxxxxxxxx wrote:

> Hi!
> 
> I'm having some problems with a raid 5 array, and I'm not sure how to  
> diagnose the problem or how to proceed, so I figured I'd ask the  
> experts :-)
> 
> I actually suspect I may have several problems at the same time.
> 
> The machine has two raid arrays, one raid 1 (md0) and one raid 5  
> (md1). The raid 5 array consists of 5 x 2TB WD RE4-GP drives.
> 
> I found some read errors in the log on /dev/sdh so I replaced it with  
> a new RE4 GP drive and did mdadm --add /dev/md1 /dev/sdh.
> 
> The array was rebuilding and I left it for the night.
> 
> In the morning, cat /proc/mdstat showed that 2 drives were down. I may  
> remember incorrectly, but I think /dev/sdh showed up as a spare  
> and another drive showed as failed, yet the array showed up as active.
> 
> Anyway, I'm not sure which drive showed as failed, but I disconnected the  
> system for more diagnosis. This was a couple of days ago.
> 
> I found that the CPU fan had stopped working and replaced it. The case  
> has several fans and the heatsink seemed cool even without the CPU fan  
> (it's an i3-530 that does nothing more than samba, so it's mostly  
> idle). Possibly the hard drives have been running hotter than normal  
> for a while, though.
> 
> Anyway, now when I reboot I get this:
> 
> > cat /proc/mdstat
> Personalities : [raid1]
> md1 : inactive sdd[1](S) sdh[5](S) sdg[4](S) sdf[2](S) sde[0](S)
>        9767572480 blocks
> 
> md0 : active raid1 sda[0] sdb[1]
>        1953514496 blocks [2/2] [UU]
> 
> unused devices: <none>
> 
> 
> I'm not sure what is happening or what my next step should be. I would  
> appreciate any help on this so I don't screw up the system more than  
> it already is :-)

We have no way of knowing how far recovery progressed onto sdh, so you need
to exclude it.  With v1.x metadata we would know ... but it wouldn't really
help that much.

Your only option is to do a --force assemble of the other devices.
sde is a little bit out of date, but it cannot be much out of date as the
array would have stopped handling writes as soon as it failed.
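
Assuming the device names are still as in your --examine output below,
something like this should do it (this is just a sketch, so double-check
it against your own system; stop the inactive array first, then
force-assemble everything except sdh):

  mdadm --stop /dev/md1
  mdadm --assemble --force /dev/md1 /dev/sdd /dev/sde /dev/sdf /dev/sdg
  cat /proc/mdstat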

This will assemble the array degraded.  You should then 'fsck' and do
anything else to check that the data is OK.
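
For example, assuming the filesystem sits directly on /dev/md1 and is
ext3/ext4, do a read-only check first and only repair once you are happy
the assembly looks sane:

  fsck -n /dev/md1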

Then you need to check that all your drives and your system are good (if
you haven't already), then add a good drive as a spare and let it rebuild.
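
If you have smartmontools available (an assumption on my part), that
could look roughly like this, with sdX standing in for each member drive
and the drive you add being whichever one checks out good:

  smartctl -a /dev/sdX        # look at reallocated/pending sector counts
  smartctl -t long /dev/sdX   # start a long self-test (check results later with -a)
  mdadm --add /dev/md1 /dev/sdh
  cat /proc/mdstat            # watch the rebuild progress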

NeilBrown


> 
> Below is the output of "mdadm --examine" for the drives in the raid 5 array.
> 
> BTW, I don't know if it matters, but the system is running an older  
> Debian (lenny?) with a 2.6.32 backport kernel; the mdadm version is 2.6.7.2.
> 
> Best Regards,
> Peter
> 
> 
> > mdadm --examine /dev/sd?
> 
> /dev/sdd:
>            Magic : a92b4efc
>          Version : 00.90.00
>             UUID : 61a6a879:adb7ac7b:86c7b55e:eb5cc2b6
>    Creation Time : Thu Jun 24 15:12:41 2010
>       Raid Level : raid5
>    Used Dev Size : 1953514496 (1863.02 GiB 2000.40 GB)
>       Array Size : 7814057984 (7452.07 GiB 8001.60 GB)
>     Raid Devices : 5
>    Total Devices : 5
> Preferred Minor : 1
> 
>      Update Time : Wed Oct  9 20:29:41 2013
>            State : clean
>   Active Devices : 3
> Working Devices : 4
>   Failed Devices : 1
>    Spare Devices : 1
>         Checksum : 3dc0af1a - correct
>           Events : 1288444
> 
>           Layout : left-symmetric
>       Chunk Size : 128K
> 
>        Number   Major   Minor   RaidDevice State
> this     1       8       48        1      active sync   /dev/sdd
> 
>     0     0       0        0        0      removed
>     1     1       8       48        1      active sync   /dev/sdd
>     2     2       8       80        2      active sync   /dev/sdf
>     3     3       0        0        3      faulty removed
>     4     4       8       96        4      active sync   /dev/sdg
>     5     5       8      112        5      spare   /dev/sdh
> 
> 
> /dev/sde:
>            Magic : a92b4efc
>          Version : 00.90.00
>             UUID : 61a6a879:adb7ac7b:86c7b55e:eb5cc2b6
>    Creation Time : Thu Jun 24 15:12:41 2010
>       Raid Level : raid5
>    Used Dev Size : 1953514496 (1863.02 GiB 2000.40 GB)
>       Array Size : 7814057984 (7452.07 GiB 8001.60 GB)
>     Raid Devices : 5
>    Total Devices : 5
> Preferred Minor : 1
> 
>      Update Time : Tue Oct  8 03:26:05 2013
>            State : clean
>   Active Devices : 4
> Working Devices : 5
>   Failed Devices : 1
>    Spare Devices : 1
>         Checksum : 3dbe6d93 - correct
>           Events : 1288428
> 
>           Layout : left-symmetric
>       Chunk Size : 128K
> 
>        Number   Major   Minor   RaidDevice State
> this     0       8       64        0      active sync   /dev/sde
> 
>     0     0       8       64        0      active sync   /dev/sde
>     1     1       8       48        1      active sync   /dev/sdd
>     2     2       8       80        2      active sync   /dev/sdf
>     3     3       0        0        3      faulty removed
>     4     4       8       96        4      active sync   /dev/sdg
>     5     5       8      112        5      spare   /dev/sdh
> 
> 
> /dev/sdf:
>            Magic : a92b4efc
>          Version : 00.90.00
>             UUID : 61a6a879:adb7ac7b:86c7b55e:eb5cc2b6
>    Creation Time : Thu Jun 24 15:12:41 2010
>       Raid Level : raid5
>    Used Dev Size : 1953514496 (1863.02 GiB 2000.40 GB)
>       Array Size : 7814057984 (7452.07 GiB 8001.60 GB)
>     Raid Devices : 5
>    Total Devices : 5
> Preferred Minor : 1
> 
>      Update Time : Wed Oct  9 20:29:41 2013
>            State : clean
>   Active Devices : 3
> Working Devices : 4
>   Failed Devices : 1
>    Spare Devices : 1
>         Checksum : 3dc0af3c - correct
>           Events : 1288444
> 
>           Layout : left-symmetric
>       Chunk Size : 128K
> 
>        Number   Major   Minor   RaidDevice State
> this     2       8       80        2      active sync   /dev/sdf
> 
>     0     0       0        0        0      removed
>     1     1       8       48        1      active sync   /dev/sdd
>     2     2       8       80        2      active sync   /dev/sdf
>     3     3       0        0        3      faulty removed
>     4     4       8       96        4      active sync   /dev/sdg
>     5     5       8      112        5      spare   /dev/sdh
> 
> 
> /dev/sdg:
>            Magic : a92b4efc
>          Version : 00.90.00
>             UUID : 61a6a879:adb7ac7b:86c7b55e:eb5cc2b6
>    Creation Time : Thu Jun 24 15:12:41 2010
>       Raid Level : raid5
>    Used Dev Size : 1953514496 (1863.02 GiB 2000.40 GB)
>       Array Size : 7814057984 (7452.07 GiB 8001.60 GB)
>     Raid Devices : 5
>    Total Devices : 5
> Preferred Minor : 1
> 
>      Update Time : Wed Oct  9 20:29:41 2013
>            State : clean
>   Active Devices : 3
> Working Devices : 4
>   Failed Devices : 1
>    Spare Devices : 1
>         Checksum : 3dc0af50 - correct
>           Events : 1288444
> 
>           Layout : left-symmetric
>       Chunk Size : 128K
> 
>        Number   Major   Minor   RaidDevice State
> this     4       8       96        4      active sync   /dev/sdg
> 
>     0     0       0        0        0      removed
>     1     1       8       48        1      active sync   /dev/sdd
>     2     2       8       80        2      active sync   /dev/sdf
>     3     3       0        0        3      faulty removed
>     4     4       8       96        4      active sync   /dev/sdg
>     5     5       8      112        5      spare   /dev/sdh
> 
> 
> /dev/sdh:
>            Magic : a92b4efc
>          Version : 00.90.00
>             UUID : 61a6a879:adb7ac7b:86c7b55e:eb5cc2b6
>    Creation Time : Thu Jun 24 15:12:41 2010
>       Raid Level : raid5
>    Used Dev Size : 1953514496 (1863.02 GiB 2000.40 GB)
>       Array Size : 7814057984 (7452.07 GiB 8001.60 GB)
>     Raid Devices : 5
>    Total Devices : 5
> Preferred Minor : 1
> 
>      Update Time : Wed Oct  9 20:29:41 2013
>            State : clean
>   Active Devices : 3
> Working Devices : 4
>   Failed Devices : 1
>    Spare Devices : 1
>         Checksum : 3dc0af5c - correct
>           Events : 1288444
> 
>           Layout : left-symmetric
>       Chunk Size : 128K
> 
>        Number   Major   Minor   RaidDevice State
> this     5       8      112        5      spare   /dev/sdh
> 
>     0     0       0        0        0      removed
>     1     1       8       48        1      active sync   /dev/sdd
>     2     2       8       80        2      active sync   /dev/sdf
>     3     3       0        0        3      faulty removed
>     4     4       8       96        4      active sync   /dev/sdg
>     5     5       8      112        5      spare   /dev/sdh
> 
> 


