On 20/12/12 11:03, Roger Heflin wrote:
On Sun, Dec 2, 2012 at 6:04 PM, Tudor Holton <tudor@xxxxxxxxxxxxxxxxx> wrote:
Hello,
I'm having some trouble with an array that has become degraded.
/proc/mdstat shows the following state:
md101 : active raid1 sdf1[0] sdb1[2](S)
1953511936 blocks [2/1] [U_]
mdadm --detail says:
/dev/md101:
Version : 0.90
Creation Time : Thu Jan 13 14:34:27 2011
Raid Level : raid1
Array Size : 1953511936 (1863.01 GiB 2000.40 GB)
Used Dev Size : 1953511936 (1863.01 GiB 2000.40 GB)
Raid Devices : 2
Total Devices : 2
Preferred Minor : 101
Persistence : Superblock is persistent
Update Time : Fri Nov 23 03:23:04 2012
State : clean, degraded
Active Devices : 1
Working Devices : 2
Failed Devices : 0
Spare Devices : 1
UUID : 43e92a79:90295495:0a76e71e:56c99031 (local to host barney)
Events : 0.2127
Number   Major   Minor   RaidDevice   State
   0       8      81         0        active sync   /dev/sdf1
   1       0       0         1        removed
   2       8      17         -        spare         /dev/sdb1
If I force the spare to become active, it begins to recover:
$ sudo mdadm -S /dev/md101
mdadm: stopped /dev/md101
$ sudo mdadm --assemble --force --no-degraded /dev/md101 /dev/sdf1 /dev/sdb1
mdadm: /dev/md101 has been started with 1 drive (out of 2) and 1 spare.
$ cat /proc/mdstat
md101 : active raid1 sdf1[0] sdb1[2]
1953511936 blocks [2/1] [U_]
[>....................] recovery = 0.0% (541440/1953511936)
finish=420.8min speed=77348K/sec
The recovery runs for the estimated time, but at the end the disk simply returns to being a spare.
Neither disk partition reports any errors:
$ cat /sys/block/md101/md/dev-sdf1/errors
0
$ cat /sys/block/md101/md/dev-sdb1/errors
0
Are there any mdadm logs that would show why this is not recovering properly? How else can I debug this?
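I assume the drives' SMART data would also show signs of a failing disk, so as a rough extra check I was planning to run something like:
$ sudo smartctl -H /dev/sdf
$ sudo smartctl -A /dev/sdf | grep -i -E 'realloc|pending|uncorrect'
but I'm not sure that tells me anything about the md layer itself.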
Cheers,
Tudor.
Did you look in the various /var/log/messages files (the current one and the rotated ones)
to see what they indicate happened around the time the recovery completed?
There is almost certainly something in there indicating what went wrong.
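For example (assuming a syslog-style setup; adjust the filename/glob for your distro), something like this should pull out the relevant lines around the time the rebuild finished:
$ grep -i -E 'md101|raid1|sdf|sdb' /var/log/messages*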
Thanks. I watched the log messages during the recovery. During the
last 0.1% (at 99.9%), messages like these appeared:
Dec 24 18:20:32 barney kernel: [2796835.703313] sd 2:0:0:0: [sdf]
Unhandled sense code
Dec 24 18:20:32 barney kernel: [2796835.703316] sd 2:0:0:0: [sdf]
Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Dec 24 18:20:32 barney kernel: [2796835.703320] sd 2:0:0:0: [sdf] Sense
Key : Medium Error [current] [descriptor]
Dec 24 18:20:32 barney kernel: [2796835.703325] Descriptor sense data
with sense descriptors (in hex):
Dec 24 18:20:32 barney kernel: [2796835.703327] 72 03 11 04 00
00 00 0c 00 0a 80 00 00 00 00 00
Dec 24 18:20:32 barney kernel: [2796835.703335] e8 e0 5f 86
Dec 24 18:20:32 barney kernel: [2796835.703339] sd 2:0:0:0: [sdf] Add.
Sense: Unrecovered read error - auto reallocate failed
Dec 24 18:20:32 barney kernel: [2796835.703345] sd 2:0:0:0: [sdf] CDB:
Read(10): 28 00 e8 e0 5f 7f 00 00 08 00
Dec 24 18:20:32 barney kernel: [2796835.703353] end_request: I/O error,
dev sdf, sector 3907018630
Dec 24 18:20:32 barney kernel: [2796835.703366] ata3: EH complete
Dec 24 18:20:32 barney kernel: [2796835.703383] md/raid1:md101: sdf:
unrecoverable I/O read error for block 3907018496
Unfortunately, sdf is the active disk in this case. So I guess my only
option left is to create a new array and copy over as much data as it
will let me?
Cheers,
Tudor.
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html