whollygoat@xxxxxxxxxxxxxxx wrote:
On Sat, 31 Jan 2009 10:38:22 +0000, "David Greaves" <david@xxxxxxxxxxxx>
said:
whollygoat@xxxxxxxxxxxxxxx wrote:
On a boot a couple of days ago, mdadm failed a disk and
started resyncing to spare (raid5, 6 drives, 5 active, 1
spare). smartctl -H <disk> returned info (can't remember
the exact text) that made me suspect the drive was
fine, but the data connection was bad. Sure enough the
data cable was damaged. Replaced the cable and smartctl
sees the disk just fine and reports no errors.
- I'd like to readd the drive as a spare. Is it enough
to "mdadm --add /dev/hdk" or do I need to prep the drive to
remove any data that said where it previously belonged
in the array?
That should work.
Any issues and you can zero the superblock (man mdadm)
No need to zero the disk.
Would --re-add be better?
I don't think so. And I would zero the superblock. The more detail you
put into preventing unwanted autodetection the fewer learning
experiences you will have.
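The remove / zero-superblock / add sequence suggested above could be sketched roughly as follows. This is only an illustration: /dev/md0 and /dev/hdk are the names used earlier in the thread, while the `run` helper and the `DRY_RUN` switch are made up for this sketch so that nothing is touched unless explicitly requested.

```shell
#!/bin/sh
# Sketch: return a previously failed disk to an md array as a fresh spare.
# DRY_RUN is a hypothetical switch for this example; by default the
# commands are only printed, not executed.

run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

readd_as_spare() {
    array=$1    # e.g. /dev/md0
    member=$2   # e.g. /dev/hdk, as in the post

    # Drop the stale entry if md still lists the device; ignore the
    # error if it has already been removed.
    run mdadm "$array" --remove "$member" || true
    # Erase the old md superblock so nothing is auto-detected from it.
    run mdadm --zero-superblock "$member"
    # Add the now-blank device; with no missing members it becomes a spare.
    run mdadm "$array" --add "$member"
}
```

Calling `readd_as_spare /dev/md0 /dev/hdk` prints the three commands for review; setting `DRY_RUN=0` would run them as root. It is only meant for a disk that is not currently an active member of the running array.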
I've noticed something else since I made the initial post
--------- begin output -------------
fly:~# mdadm -D /dev/md0
/dev/md0:
Version : 01.00.03
Creation Time : Sun Jan 11 21:49:36 2009
Raid Level : raid5
Array Size : 312602368 (298.12 GiB 320.10 GB)
Device Size : 156301184 (74.53 GiB 80.03 GB)
Raid Devices : 5
Total Devices : 5
Preferred Minor : 0
Persistence : Superblock is persistent
Intent Bitmap : Internal
Update Time : Fri Jan 30 15:52:01 2009
State : active
Active Devices : 5
Working Devices : 5
Failed Devices : 0
Spare Devices : 0
Layout : left-symmetric
Chunk Size : 64K
Name : fly:FlyFileServ_md (local to host fly)
UUID : 0e2b9157:a58edc1d:213a220f:68a555c9
Events : 16
Number Major Minor RaidDevice State
0 33 1 0 active sync /dev/hde1
1 34 1 1 active sync /dev/hdg1
2 56 1 2 active sync /dev/hdi1
5 89 1 3 active sync /dev/hdo1
6 88 1 4 active sync /dev/hdm1
fly:~# mdadm -E /dev/hdo1
/dev/hdo1:
Magic : a92b4efc
Version : 01
Feature Map : 0x1
Array UUID : 0e2b9157:a58edc1d:213a220f:68a555c9
Name : fly:FlyFileServ_md (local to host fly)
Creation Time : Sun Jan 11 21:49:36 2009
Raid Level : raid5
Raid Devices : 5
Device Size : 234436336 (111.79 GiB 120.03 GB)
Array Size : 625204736 (298.12 GiB 320.10 GB)
Used Size : 156301184 (74.53 GiB 80.03 GB)
Super Offset : 234436464 sectors
State : clean
Device UUID : e072bd09:2df53d6d:d23321cc:cf2c37de
Internal Bitmap : 2 sectors from superblock
Update Time : Fri Jan 30 15:52:01 2009
Checksum : 4689ff5 - correct
Events : 16
Layout : left-symmetric
Chunk Size : 64K
Array Slot : 5 (0, 1, 2, failed, failed, 3, 4)
Array State : uuuUu 2 failed
--------- end output -------------
Why does the "Array Slot" field show 7 slots? And why
does the field "Array State" show 2 failed? There
were only ever 6 disks in the array. Only one of those
is currently missing. mdadm -D above doesn't list any
failed devices in the "Failed Devices" field.
No idea, but did you explicitly remove the failed drive? Was there a
failed drive at some time in the past?
I've never seen this, but I always remove drives, which may or may not
be related.
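For what it's worth, the numbers in the "Array Slot" line quoted above can be tallied mechanically. The sketch below only counts what that one line contains; it does not explain why mdadm keeps the extra entries. Reading the list against the -D table, position 5 holds role 3 (hdo1) and position 6 holds role 4 (hdm1), which appears to match the Number and RaidDevice columns there.

```shell
#!/bin/sh
# Array Slot line copied from the mdadm -E output in the post.
line='Array Slot : 5 (0, 1, 2, failed, failed, 3, 4)'

# Keep only the text between the parentheses, one entry per line.
slots=$(printf '%s\n' "$line" | sed 's/.*(\(.*\)).*/\1/' | tr ',' '\n' | tr -d ' ')

total=$(printf '%s\n' "$slots" | grep -c .)
failed=$(printf '%s\n' "$slots" | grep -c '^failed$')
active=$((total - failed))

echo "slots=$total failed=$failed active=$active"
# prints: slots=7 failed=2 active=5
```

So the list has 7 positions, of which 2 are marked failed and 5 carry a role, consistent with the 5 active devices that -D reports.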
Thanks for your answers below as well. It's kind of
what I was expecting. There was a h/w problem that
took ages to track down and I think it was responsible
for all the e2fs errors.
WG
- When I tried to list some files on one of the filesystems
on the array (the fact that it took so long to react to
the ls is how I discovered the box was in the middle of
rebuilding to spare)
This is OK - resync involves a lot of IO and can slow things down. This
is tuneable.
it couldn't find the file (or many
others). I thought that resyncing was supposed to be
transparent, yet parts of the fs seemed to be missing.
Everything was there afterwards. Is that normal?
No. This is nothing to do with normal md resyncing and certainly not
expected.
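On the earlier remark that resync speed is tuneable: the md driver exposes its throttling limits under /proc/sys/dev/raid as speed_limit_min and speed_limit_max (KiB/s per device). A minimal sketch for looking at them and lowering the ceiling might be the following; the `RAID_SYSCTL_DIR` variable and both function names are made up for this example.

```shell
#!/bin/sh
# Sketch: inspect or change md resync throttling (values are KiB/s per device).
# RAID_SYSCTL_DIR is a hypothetical override so the functions can be tried
# against a scratch directory instead of the real /proc entries.
RAID_SYSCTL_DIR=${RAID_SYSCTL_DIR:-/proc/sys/dev/raid}

show_resync_limits() {
    for name in speed_limit_min speed_limit_max; do
        if [ -r "$RAID_SYSCTL_DIR/$name" ]; then
            echo "$name=$(cat "$RAID_SYSCTL_DIR/$name")"
        else
            echo "$name=unavailable"
        fi
    done
}

# Lower the ceiling so a rebuild leaves more bandwidth for normal I/O.
# Writing to the real /proc file needs root.
set_resync_max() {
    echo "$1" > "$RAID_SYSCTL_DIR/speed_limit_max"
}
```

`show_resync_limits` prints the current pair; something like `set_resync_max 50000` trades a longer rebuild for a more responsive box while it runs. The value 50000 is only an example, not a recommendation.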
- On a subsequent boot I had to run e2fsck on the three
filesystems housed on the array. Many stray blocks,
illegal inodes, etc were found. An artifact of the rebuild
or unrelated?
Well, you had a fault in your IO system; there's a good chance your O
broke.
Verify against a backup.
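One rough way to do that check is to byte-compare every file in the backup with its counterpart on the repaired filesystem. The sketch below assumes the backup is an ordinary directory tree; the function name and both paths are placeholders, since the thread does not say where either copy lives.

```shell
#!/bin/sh
# Sketch: list files that are missing from, or differ in, the live tree
# when compared with a backup tree. Pass both directories without a
# trailing slash.
verify_against_backup() {
    live=$1
    backup=$2
    report=$(
        find "$backup" -type f | while IFS= read -r f; do
            rel=${f#"$backup"/}              # path relative to the backup root
            if [ ! -f "$live/$rel" ]; then
                echo "MISSING $rel"
            elif ! cmp -s "$f" "$live/$rel"; then
                echo "DIFFERS $rel"
            fi
        done
    )
    if [ -n "$report" ]; then
        printf '%s\n' "$report"
        return 1
    fi
    echo "OK"
}
```

Files that were legitimately changed after the backup was taken will also show up as DIFFERS, so the report is a list of things to look at rather than proof of corruption.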
David
--
"Don't worry, you'll be fine; I saw it work in a cartoon once..."
--
Bill Davidsen <davidsen@xxxxxxx>
"Woe unto the statesman who makes war without a reason that will still
be valid when the war is over..." Otto von Bismarck