Re: Help with recovering a RAID5 array

On 2 May 2013 13:24, Stefan Borggraefe <stefan@xxxxxxxxxxx> wrote:
> Hi,
>
> I am using a RAID5 software RAID on Ubuntu 12.04 (kernel
> 3.2.0-37-generic x86_64).
>
> It consists of six 4 TB Hitachi drives and contains an ext4 file system.
> There are no spare devices.
>
> Yesterday evening I exchanged a drive that showed SMART errors and the
> array started rebuilding its redundancy normally.
>
> When I returned to this server this morning, the array was in the following
> state:
>
> md126 : active raid5 sdc1[7](S) sdh1[4] sdd1[3](F) sde1[0] sdg1[6] sdf1[2]
>       19535086080 blocks super 1.2 level 5, 512k chunk, algorithm 2 [6/4]
> [U_U_UU]
>
> sdc is the newly added hard disk, but now sdd has failed as well. :( It would be
> great if there were a way to get this RAID5 working again. Perhaps sdc1
> could then be fully added to the array, and after that sdd could also be exchanged.
>
> I have not started experimenting or changing this array in any way, but wanted
> to ask here for assistance first. Thank you for your help!
>
> mdadm --examine /dev/sd[cdegfh]1 | egrep 'Event|/dev/sd'
>
> shows
>
> /dev/sdc1:
>          Events : 494
> /dev/sdd1:
>          Events : 478
> /dev/sde1:
>          Events : 494
> /dev/sdf1:
>          Events : 494
> /dev/sdg1:
>          Events : 494
> /dev/sdh1:
>          Events : 494
>
>
>
> mdadm --examine /dev/sd[cdegfh]1
>
> shows
>
> /dev/sdc1:
>           Magic : a92b4efc
>         Version : 1.2
>     Feature Map : 0x0
>      Array UUID : 13051471:fba5785f:4365dea1:0670be37
>            Name : teraturm:2  (local to host teraturm)
>   Creation Time : Tue Feb  5 14:23:06 2013
>      Raid Level : raid5
>    Raid Devices : 6
>
>  Avail Dev Size : 7814035053 (3726.02 GiB 4000.79 GB)
>      Array Size : 19535086080 (18630.11 GiB 20003.93 GB)
>   Used Dev Size : 7814034432 (3726.02 GiB 4000.79 GB)
>     Data Offset : 2048 sectors
>    Super Offset : 8 sectors
>           State : clean
>     Device UUID : 7433213e:0dd2e5ed:073dd59d:bf1f83d8
>
>     Update Time : Tue Apr 30 10:06:55 2013
>        Checksum : 9e83f72 - correct
>          Events : 494
>
>          Layout : left-symmetric
>      Chunk Size : 512K
>
>    Device Role : spare
>    Array State : A.A.AA ('A' == active, '.' == missing)
> /dev/sdd1:
>           Magic : a92b4efc
>         Version : 1.2
>     Feature Map : 0x0
>      Array UUID : 13051471:fba5785f:4365dea1:0670be37
>            Name : teraturm:2  (local to host teraturm)
>   Creation Time : Tue Feb  5 14:23:06 2013
>      Raid Level : raid5
>    Raid Devices : 6
>
>  Avail Dev Size : 7814035053 (3726.02 GiB 4000.79 GB)
>      Array Size : 19535086080 (18630.11 GiB 20003.93 GB)
>   Used Dev Size : 7814034432 (3726.02 GiB 4000.79 GB)
>     Data Offset : 2048 sectors
>    Super Offset : 8 sectors
>           State : active
>     Device UUID : c2e5423f:6d91a061:c3f55aa7:6d1cec87
>
>     Update Time : Mon Apr 29 17:24:26 2013
>        Checksum : 37b97776 - correct
>          Events : 478
>
>          Layout : left-symmetric
>      Chunk Size : 512K
>
>    Device Role : Active device 3
>    Array State : AAAAAA ('A' == active, '.' == missing)
> /dev/sde1:
>           Magic : a92b4efc
>         Version : 1.2
>     Feature Map : 0x0
>      Array UUID : 13051471:fba5785f:4365dea1:0670be37
>            Name : teraturm:2  (local to host teraturm)
>   Creation Time : Tue Feb  5 14:23:06 2013
>      Raid Level : raid5
>    Raid Devices : 6
>
>  Avail Dev Size : 7814035053 (3726.02 GiB 4000.79 GB)
>      Array Size : 19535086080 (18630.11 GiB 20003.93 GB)
>   Used Dev Size : 7814034432 (3726.02 GiB 4000.79 GB)
>     Data Offset : 2048 sectors
>    Super Offset : 8 sectors
>           State : clean
>     Device UUID : 68207885:02c05297:8ef62633:65b83839
>
>     Update Time : Tue Apr 30 10:06:55 2013
>        Checksum : f0b36c7f - correct
>          Events : 494
>
>          Layout : left-symmetric
>      Chunk Size : 512K
>
>    Device Role : Active device 0
>    Array State : A.A.AA ('A' == active, '.' == missing)
> /dev/sdf1:
>           Magic : a92b4efc
>         Version : 1.2
>     Feature Map : 0x0
>      Array UUID : 13051471:fba5785f:4365dea1:0670be37
>            Name : teraturm:2  (local to host teraturm)
>   Creation Time : Tue Feb  5 14:23:06 2013
>      Raid Level : raid5
>    Raid Devices : 6
>
>  Avail Dev Size : 7814035053 (3726.02 GiB 4000.79 GB)
>      Array Size : 19535086080 (18630.11 GiB 20003.93 GB)
>   Used Dev Size : 7814034432 (3726.02 GiB 4000.79 GB)
>     Data Offset : 2048 sectors
>    Super Offset : 8 sectors
>           State : clean
>     Device UUID : 7d328a98:6c02f550:ab1837c0:cb773ac1
>
>     Update Time : Tue Apr 30 10:06:55 2013
>        Checksum : d2799f34 - correct
>          Events : 494
>
>          Layout : left-symmetric
>      Chunk Size : 512K
>
>    Device Role : Active device 2
>    Array State : A.A.AA ('A' == active, '.' == missing)
> /dev/sdg1:
>           Magic : a92b4efc
>         Version : 1.2
>     Feature Map : 0x0
>      Array UUID : 13051471:fba5785f:4365dea1:0670be37
>            Name : teraturm:2  (local to host teraturm)
>   Creation Time : Tue Feb  5 14:23:06 2013
>      Raid Level : raid5
>    Raid Devices : 6
>
>  Avail Dev Size : 7814035053 (3726.02 GiB 4000.79 GB)
>      Array Size : 19535086080 (18630.11 GiB 20003.93 GB)
>   Used Dev Size : 7814034432 (3726.02 GiB 4000.79 GB)
>     Data Offset : 2048 sectors
>    Super Offset : 8 sectors
>           State : clean
>     Device UUID : 76b683b1:58e053ff:57ac0cfc:be114f75
>
>     Update Time : Tue Apr 30 10:06:55 2013
>        Checksum : 89bc2e05 - correct
>          Events : 494
>
>          Layout : left-symmetric
>      Chunk Size : 512K
>
>    Device Role : Active device 5
>    Array State : A.A.AA ('A' == active, '.' == missing)
> /dev/sdh1:
>           Magic : a92b4efc
>         Version : 1.2
>     Feature Map : 0x0
>      Array UUID : 13051471:fba5785f:4365dea1:0670be37
>            Name : teraturm:2  (local to host teraturm)
>   Creation Time : Tue Feb  5 14:23:06 2013
>      Raid Level : raid5
>    Raid Devices : 6
>
>  Avail Dev Size : 7814035053 (3726.02 GiB 4000.79 GB)
>      Array Size : 19535086080 (18630.11 GiB 20003.93 GB)
>   Used Dev Size : 7814034432 (3726.02 GiB 4000.79 GB)
>     Data Offset : 2048 sectors
>    Super Offset : 8 sectors
>           State : clean
>     Device UUID : 3c88705f:9f3add0e:d58d46a7:b40d02d7
>
>     Update Time : Tue Apr 30 10:06:55 2013
>        Checksum : 541f3913 - correct
>          Events : 494
>
>          Layout : left-symmetric
>      Chunk Size : 512K
>
>    Device Role : Active device 4
>    Array State : A.A.AA ('A' == active, '.' == missing)
>
> This is the dmesg output from when the failure happened:
>
> [6669459.855352] sd 6:1:10:0: [sdd] Unhandled error code
> [6669459.855362] sd 6:1:10:0: [sdd]  Result: hostbyte=DID_ABORT
> driverbyte=DRIVER_OK
> [6669459.855368] sd 6:1:10:0: [sdd] CDB: Read(10): 28 00 23 38 94 2a 00 00 08
> 00
> [6669459.855387] end_request: I/O error, dev sdd, sector 590910506
> [6669459.855456] raid5_end_read_request: 14 callbacks suppressed
> [6669459.855463] md/raid:md126: read error not correctable (sector 590910472
> on sdd1).
> [6669459.855490] sd 6:1:10:0: [sdd] Unhandled error code
> [6669459.855496] sd 6:1:10:0: [sdd]  Result: hostbyte=DID_ABORT
> driverbyte=DRIVER_OK
> [6669459.855501] sd 6:1:10:0: [sdd] CDB: Read(10): 28 00 23 38 94 32 00 00 08
> 00
> [6669459.855515] end_request: I/O error, dev sdd, sector 590910514
> [6669459.855594] md/raid:md126: read error not correctable (sector 590910480
> on sdd1).
> [6669459.855608] sd 6:1:10:0: [sdd] Unhandled error code
> [6669459.855611] sd 6:1:10:0: [sdd]  Result: hostbyte=DID_ABORT
> driverbyte=DRIVER_OK
> [6669459.855620] sd 6:1:10:0: [sdd] CDB: Read(10): 28 00 23 38 94 3a 00 00 08
> 00
> [6669459.855648] end_request: I/O error, dev sdd, sector 590910522
> [6669459.855710] md/raid:md126: read error not correctable (sector 590910488
> on sdd1).
> [6669459.855720] sd 6:1:10:0: [sdd] Unhandled error code
> [6669459.855723] sd 6:1:10:0: [sdd]  Result: hostbyte=DID_ABORT
> driverbyte=DRIVER_OK
> [6669459.855727] sd 6:1:10:0: [sdd] CDB: Read(10): 28 00 23 38 94 42 00 00 08
> 00
> [6669459.855737] end_request: I/O error, dev sdd, sector 590910530
> [6669459.855796] md/raid:md126: read error not correctable (sector 590910496
> on sdd1).
> [6669459.855814] sd 6:1:10:0: [sdd] Unhandled error code
> [6669459.855817] sd 6:1:10:0: [sdd]  Result: hostbyte=DID_ABORT
> driverbyte=DRIVER_OK
> [6669459.855821] sd 6:1:10:0: [sdd] CDB: Read(10): 28 00 23 38 94 4a 00 00 08
> 00
> [6669459.855831] end_request: I/O error, dev sdd, sector 590910538
> [6669459.855889] md/raid:md126: read error not correctable (sector 590910504
> on sdd1).
> [6669459.855907] sd 6:1:10:0: [sdd] Unhandled error code
> [6669459.855910] sd 6:1:10:0: [sdd]  Result: hostbyte=DID_ABORT
> driverbyte=DRIVER_OK
> [6669459.855914] sd 6:1:10:0: [sdd] CDB: Read(10): 28 00 23 38 94 52 00 00 08
> 00
> [6669459.855924] end_request: I/O error, dev sdd, sector 590910546
> [6669459.855982] md/raid:md126: read error not correctable (sector 590910512
> on sdd1).
> [6669459.855990] sd 6:1:10:0: [sdd] Unhandled error code
> [6669459.855992] sd 6:1:10:0: [sdd]  Result: hostbyte=DID_ABORT
> driverbyte=DRIVER_OK
> [6669459.855996] sd 6:1:10:0: [sdd] CDB: Read(10): 28 00 23 38 94 5a 00 00 08
> 00
> [6669459.856004] end_request: I/O error, dev sdd, sector 590910554
> [6669459.856062] md/raid:md126: read error not correctable (sector 590910520
> on sdd1).
> [6669459.856072] sd 6:1:10:0: [sdd] Unhandled error code
> [6669459.856075] sd 6:1:10:0: [sdd]  Result: hostbyte=DID_ABORT
> driverbyte=DRIVER_OK
> [6669459.856079] sd 6:1:10:0: [sdd] CDB: Read(10): 28 00 23 38 94 62 00 00 08
> 00
> [6669459.856088] end_request: I/O error, dev sdd, sector 590910562
> [6669459.856153] md/raid:md126: read error not correctable (sector 590910528
> on sdd1).
> [6669459.856171] sd 6:1:10:0: [sdd] Unhandled error code
> [6669459.856174] sd 6:1:10:0: [sdd]  Result: hostbyte=DID_ABORT
> driverbyte=DRIVER_OK
> [6669459.856178] sd 6:1:10:0: [sdd] CDB: Read(10): 28 00 23 38 94 6a 00 00 08
> 00
> [6669459.856188] end_request: I/O error, dev sdd, sector 590910570
> [6669459.856256] md/raid:md126: read error not correctable (sector 590910536
> on sdd1).
> [6669459.856265] sd 6:1:10:0: [sdd] Unhandled error code
> [6669459.856268] sd 6:1:10:0: [sdd]  Result: hostbyte=DID_ABORT
> driverbyte=DRIVER_OK
> [6669459.856272] sd 6:1:10:0: [sdd] CDB: Read(10): 28 00 23 38 94 72 00 00 08
> 00
> [6669459.856281] end_request: I/O error, dev sdd, sector 590910578
> [6669459.856346] md/raid:md126: read error not correctable (sector 590910544
> on sdd1).
> [6669459.856364] sd 6:1:10:0: [sdd] Unhandled error code
> [6669459.856368] sd 6:1:10:0: [sdd]  Result: hostbyte=DID_ABORT
> driverbyte=DRIVER_OK
> [6669459.856374] sd 6:1:10:0: [sdd] CDB: Read(10): 28 00 23 38 94 7a 00 00 08
> 00
> [6669459.856385] end_request: I/O error, dev sdd, sector 590910586
> [6669459.856445] sd 6:1:10:0: [sdd] Unhandled error code
> [6669459.856449] sd 6:1:10:0: [sdd]  Result: hostbyte=DID_ABORT
> driverbyte=DRIVER_OK
> [6669459.856456] sd 6:1:10:0: [sdd] CDB: Read(10): 28 00 23 38 94 82 00 00 08
> 00
> [6669459.856466] end_request: I/O error, dev sdd, sector 590910594
> [6669459.856526] sd 6:1:10:0: [sdd] Unhandled error code
> [6669459.856530] sd 6:1:10:0: [sdd]  Result: hostbyte=DID_ABORT
> driverbyte=DRIVER_OK
> [6669459.856537] sd 6:1:10:0: [sdd] CDB: Read(10): 28 00 23 38 94 8a 00 00 08
> 00
> [6669459.856547] end_request: I/O error, dev sdd, sector 590910602
> [6669459.856607] sd 6:1:10:0: [sdd] Unhandled error code
> [6669459.856611] sd 6:1:10:0: [sdd]  Result: hostbyte=DID_ABORT
> driverbyte=DRIVER_OK
> [6669459.856617] sd 6:1:10:0: [sdd] CDB: Read(10): 28 00 23 38 94 92 00 00 08
> 00
> [6669459.856628] end_request: I/O error, dev sdd, sector 590910610
> [6669459.856687] sd 6:1:10:0: [sdd] Unhandled error code
> [6669459.856691] sd 6:1:10:0: [sdd]  Result: hostbyte=DID_ABORT
> driverbyte=DRIVER_OK
> [6669459.856697] sd 6:1:10:0: [sdd] CDB: Read(10): 28 00 23 38 94 9a 00 00 08
> 00
> [6669459.856707] end_request: I/O error, dev sdd, sector 590910618
> [6669459.856767] sd 6:1:10:0: [sdd] Unhandled error code
> [6669459.856772] sd 6:1:10:0: [sdd]  Result: hostbyte=DID_ABORT
> driverbyte=DRIVER_OK
> [6669459.856778] sd 6:1:10:0: [sdd] CDB: Read(10): 28 00 23 38 94 a2 00 00 08
> 00
> [6669459.856788] end_request: I/O error, dev sdd, sector 590910626
> [6669459.856847] sd 6:1:10:0: [sdd] Unhandled error code
> [6669459.856851] sd 6:1:10:0: [sdd]  Result: hostbyte=DID_ABORT
> driverbyte=DRIVER_OK
> [6669459.856859] sd 6:1:10:0: [sdd] CDB: Read(10): 28 00 23 38 94 aa 00 00 08
> 00
> [6669459.856869] end_request: I/O error, dev sdd, sector 590910634
> [6669459.856928] sd 6:1:10:0: [sdd] Unhandled error code
> [6669459.856932] sd 6:1:10:0: [sdd]  Result: hostbyte=DID_ABORT
> driverbyte=DRIVER_OK
> [6669459.856938] sd 6:1:10:0: [sdd] CDB: Read(10): 28 00 23 38 94 b2 00 00 08
> 00
> [6669459.856949] end_request: I/O error, dev sdd, sector 590910642
> [6669459.857008] sd 6:1:10:0: [sdd] Unhandled error code
> [6669459.857011] sd 6:1:10:0: [sdd]  Result: hostbyte=DID_ABORT
> driverbyte=DRIVER_OK
> [6669459.857018] sd 6:1:10:0: [sdd] CDB: Read(10): 28 00 23 38 94 ba 00 00 08
> 00
> [6669459.857028] end_request: I/O error, dev sdd, sector 590910650
> [6669459.857088] sd 6:1:10:0: [sdd] Unhandled error code
> [6669459.857092] sd 6:1:10:0: [sdd]  Result: hostbyte=DID_ABORT
> driverbyte=DRIVER_OK
> [6669459.857098] sd 6:1:10:0: [sdd] CDB: Read(10): 28 00 23 38 94 c2 00 00 08
> 00
> [6669459.857109] end_request: I/O error, dev sdd, sector 590910658
> [6669459.857168] sd 6:1:10:0: [sdd] Unhandled error code
> [6669459.857171] sd 6:1:10:0: [sdd]  Result: hostbyte=DID_ABORT
> driverbyte=DRIVER_OK
> [6669459.857178] sd 6:1:10:0: [sdd] CDB: Read(10): 28 00 23 38 94 ca 00 00 08
> 00
> [6669459.857188] end_request: I/O error, dev sdd, sector 590910666
> [6669459.857248] sd 6:1:10:0: [sdd] Unhandled error code
> [6669459.857251] sd 6:1:10:0: [sdd]  Result: hostbyte=DID_ABORT
> driverbyte=DRIVER_OK
> [6669459.857258] sd 6:1:10:0: [sdd] CDB: Read(10): 28 00 23 38 94 d2 00 00 08
> 00
> [6669459.857269] end_request: I/O error, dev sdd, sector 590910674
> [6669459.857328] sd 6:1:10:0: [sdd] Unhandled error code
> [6669459.857333] sd 6:1:10:0: [sdd]  Result: hostbyte=DID_ABORT
> driverbyte=DRIVER_OK
> [6669459.857339] sd 6:1:10:0: [sdd] CDB: Read(10): 28 00 23 38 94 da 00 00 08
> 00
> [6669459.857349] end_request: I/O error, dev sdd, sector 590910682
> [6669459.857408] sd 6:1:10:0: [sdd] Unhandled error code
> [6669459.857412] sd 6:1:10:0: [sdd]  Result: hostbyte=DID_ABORT
> driverbyte=DRIVER_OK
> [6669459.857418] sd 6:1:10:0: [sdd] CDB: Read(10): 28 00 23 38 94 e2 00 00 08
> 00
> [6669459.857429] end_request: I/O error, dev sdd, sector 590910690
> [6669459.857488] sd 6:1:10:0: [sdd] Unhandled error code
> [6669459.857492] sd 6:1:10:0: [sdd]  Result: hostbyte=DID_ABORT
> driverbyte=DRIVER_OK
> [6669459.857499] sd 6:1:10:0: [sdd] CDB: Read(10): 28 00 23 38 93 4a 00 00 08
> 00
> [6669459.857509] end_request: I/O error, dev sdd, sector 590910282
> [6669459.857569] sd 6:1:10:0: [sdd] Unhandled error code
> [6669459.857573] sd 6:1:10:0: [sdd]  Result: hostbyte=DID_ABORT
> driverbyte=DRIVER_OK
> [6669459.857579] sd 6:1:10:0: [sdd] CDB:
> [6669459.857585] aacraid: Host adapter abort request (6,1,10,0)
> [6669459.857639] Read(10): 28 00 23 38 93 42 00 00 08 00
> [6669459.857648] end_request: I/O error, dev sdd, sector 590910274
> [6669459.857844] aacraid: Host adapter reset request. SCSI hang ?
> [6669470.028090] RAID conf printout:
> [6669470.028097]  --- level:5 rd:6 wd:4
> [6669470.028101]  disk 0, o:1, dev:sde1
> [6669470.028105]  disk 1, o:1, dev:sdc1
> [6669470.028109]  disk 2, o:1, dev:sdf1
> [6669470.028112]  disk 3, o:0, dev:sdd1
> [6669470.028115]  disk 4, o:1, dev:sdh1
> [6669470.028118]  disk 5, o:1, dev:sdg1
> [6669470.034462] RAID conf printout:
> [6669470.034464]  --- level:5 rd:6 wd:4
> [6669470.034465]  disk 0, o:1, dev:sde1
> [6669470.034466]  disk 2, o:1, dev:sdf1
> [6669470.034467]  disk 3, o:0, dev:sdd1
> [6669470.034468]  disk 4, o:1, dev:sdh1
> [6669470.034469]  disk 5, o:1, dev:sdg1
> [6669470.034484] RAID conf printout:
> [6669470.034486]  --- level:5 rd:6 wd:4
> [6669470.034489]  disk 0, o:1, dev:sde1
> [6669470.034491]  disk 2, o:1, dev:sdf1
> [6669470.034494]  disk 3, o:0, dev:sdd1
> [6669470.034496]  disk 4, o:1, dev:sdh1
> [6669470.034499]  disk 5, o:1, dev:sdg1
> [6669470.034571] RAID conf printout:
> [6669470.034577]  --- level:5 rd:6 wd:4
> [6669470.034581]  disk 0, o:1, dev:sde1
> [6669470.034584]  disk 2, o:1, dev:sdf1
> [6669470.034587]  disk 4, o:1, dev:sdh1
> [6669470.034589]  disk 5, o:1, dev:sdg1
>
> Please let me know if you need any more information.
> --
> Best regards,
> Stefan Borggraefe
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


I won't scold you for using RAID5 instead of RAID6 with this number of
drives, and especially with drives of this size.

Could you please post the output of smartctl -a for each device? (from
smartmontools)

That way we can verify which HDDs are actually broken before proceeding.
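Something like this would collect it all into one file (just a rough sketch;
the device names are taken from your /proc/mdstat output above, so adjust
them to your system, and you need the smartmontools package installed):

```shell
#!/bin/sh
# Dump full SMART data for every member disk of the array into one report.
# Device list assumed from the /proc/mdstat output in this thread.
for dev in sdc sdd sde sdf sdg sdh; do
    echo "=== /dev/$dev ==="
    smartctl -a "/dev/$dev"
done > smart-report.txt 2>&1
```

Then just paste smart-report.txt (or attach it) in your reply.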

Mathias