Hey all,

So the nightmare came for me. I have a 7x2TB RAID6 setup, and a few days ago one of the drives started reporting uncorrectable sectors. I hadn't found time to deal with it yet; I had two-disk redundancy, after all...

Soon afterwards the cables or the controller spewed a slew of errors and the array was stopped. A --force --assemble later it was back up, rebuilding onto 2 spares, which left me with no redundancy. If only the drive with the bad sectors had been one of those two, everything would be fine. Unfortunately it isn't, so the rebuild now has to read from a drive with read errors, and it fails when it hits them.

What I'd like to do first is make sure the array rebuilds onto the 6 healthy drives regardless of the bad blocks. I can probably recover the affected data afterwards (assuming I can find out which files were hit; any pointers?), but if the array doesn't rebuild correctly, I'm afraid things will get worse, and soon. I could probably use the data on the two spares to correct the few broken blocks, but it's likely not worth the effort: I'd rather have a working array with a few bad files than fight with an unprotected one.

Please find some details about my array below, and let me know if I can supply more.

As a side note: I run a full scrub on the array every now and again, and it did run after the disk started failing blocks, but the sectors never got reallocated; they all remain pending / uncorrectable. Is that expected?
> # mdadm --examine /dev/sda
> /dev/sda:
>           Magic : a92b4efc
>         Version : 1.2
>     Feature Map : 0x0
>      Array UUID : ff9e032c:446ed0bd:fc9473f3:f8e090ed
>            Name : media:store (local to host media)
>   Creation Time : Tue Sep 13 21:36:43 2011
>      Raid Level : raid6
>    Raid Devices : 7
>
>  Avail Dev Size : 3907027120 (1863.02 GiB 2000.40 GB)
>      Array Size : 19535119360 (9315.07 GiB 10001.98 GB)
>   Used Dev Size : 3907023872 (1863.01 GiB 2000.40 GB)
>     Data Offset : 2048 sectors
>    Super Offset : 8 sectors
>           State : clean
>     Device UUID : 3b40e74a:c9b652ce:6810bdcd:d2648b69
>
>     Update Time : Tue Oct 1 01:06:35 2013
>        Checksum : a0ddd145 - correct
>          Events : 753179
>
>          Layout : left-symmetric
>      Chunk Size : 512K
>
>     Device Role : Active device 3
>     Array State : AAAAAAA ('A' == active, '.' == missing)

> # cat /proc/mdstat
> Personalities : [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
> md126 : active raid6 sdg[9] sdb[8] sdh1[6] sdc[7] sdf1[5] sda[10] sdi[11]
>       9767559680 blocks super 1.2 level 6, 512k chunk, algorithm 2 [7/5] [UU_UUU_]
>       [========>............]  recovery = 44.4% (867604356/1953511936) finish=605.9min speed=29866K/sec

Thanks and best regards,
--
Michał (Saviq) Sawicz <michal@xxxxxxxxxx>
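For anyone reading the numbers above, a quick cross-check of the quoted figures, as a sketch only. It assumes the --examine sizes are in 512-byte sectors and the /proc/mdstat counts are in 1 KiB blocks; the helper names raid6_data_sectors and missing_slots are made up for illustration. RAID6 spends two devices' worth of each stripe on P and Q parity, so 7 devices of 3907023872 sectors should give 5 x 3907023872 = 19535119360 data sectors, matching "Array Size". The '_' entries in [UU_UUU_] mark the slots that are out of sync (here, the two being rebuilt), which is why [7/5] leaves no redundancy to cover a read error elsewhere.

```python
# Sanity-check sketch for the mdadm --examine and /proc/mdstat figures quoted above.
# Assumption: --examine sizes are 512-byte sectors; /proc/mdstat counts 1 KiB blocks.
from typing import List

RAID_DEVICES = 7
USED_DEV_SIZE_SECTORS = 3907023872   # "Used Dev Size" from mdadm --examine
ARRAY_SIZE_SECTORS = 19535119360     # "Array Size" from mdadm --examine
MDSTAT_BLOCKS_KIB = 9767559680       # array size line from /proc/mdstat
RECOVERY_DONE_KIB = 867604356        # recovery progress: done ...
RECOVERY_TOTAL_KIB = 1953511936      # ... out of this many KiB per device


def raid6_data_sectors(n_devices: int, per_device_sectors: int) -> int:
    """Usable RAID6 capacity: two devices' worth of space holds P and Q parity."""
    return (n_devices - 2) * per_device_sectors


def missing_slots(status: str) -> List[int]:
    """Indices of '_' (not in sync) entries in an mdstat status string like 'UU_UUU_'."""
    return [slot for slot, state in enumerate(status) if state == "_"]


if __name__ == "__main__":
    capacity = raid6_data_sectors(RAID_DEVICES, USED_DEV_SIZE_SECTORS)
    print("matches Array Size:", capacity == ARRAY_SIZE_SECTORS)
    print("matches mdstat blocks:", ARRAY_SIZE_SECTORS // 2 == MDSTAT_BLOCKS_KIB)
    print("per-device size matches recovery total:",
          USED_DEV_SIZE_SECTORS // 2 == RECOVERY_TOTAL_KIB)
    print("slots out of sync:", missing_slots("UU_UUU_"))
    print(f"recovery: {100 * RECOVERY_DONE_KIB / RECOVERY_TOTAL_KIB:.1f}%")
```

Run as a script, each comparison reports True, the out-of-sync slots come out as 2 and 6, and the recovery figure agrees with the 44.4% shown by mdstat.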