I've got a simple setup with three IDE drives where two disks share a
30 MB RAID1 partition for /boot and all three share a 590 GB RAID5
array for /.
My mdadm.conf looks like this:
DEVICE partitions
ARRAY /dev/md1 level=raid5 num-devices=3 UUID=4b22b17d:06048bd3:ecec156c:31fabbaf
   devices=/dev/hda3,/dev/hdc3,/dev/hdg2
ARRAY /dev/md0 level=raid1 num-devices=2 UUID=7d5c8486:35fff755:f5d34fc2:a12f1f81
   devices=/dev/hda1,/dev/hdc1
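To rule out a plain typo in the config, here is a minimal Python sketch that parses the ARRAY entries and checks that the number of listed members matches num-devices. CONF_TEXT is the config above with the mail-wrapped lines joined, and parse_arrays is just a helper name I made up for illustration. It only understands the key=value words used here, not the full mdadm.conf syntax.

```python
# Sketch: parse ARRAY entries from mdadm.conf-style text.
# CONF_TEXT reproduces the config from this mail; parse_arrays is a
# hypothetical helper, not part of mdadm.
CONF_TEXT = """\
DEVICE partitions
ARRAY /dev/md1 level=raid5 num-devices=3 UUID=4b22b17d:06048bd3:ecec156c:31fabbaf
   devices=/dev/hda3,/dev/hdc3,/dev/hdg2
ARRAY /dev/md0 level=raid1 num-devices=2 UUID=7d5c8486:35fff755:f5d34fc2:a12f1f81
   devices=/dev/hda1,/dev/hdc1
"""


def parse_arrays(text):
    """Return {array_device: {key: value}} for every ARRAY entry."""
    arrays = {}
    current = None
    for line in text.splitlines():
        if not line.strip() or line.lstrip().startswith("#"):
            continue
        words = line.split()
        if line[0].isspace():
            # Leading whitespace continues the previous entry.
            if current is None:
                continue
        elif words[0].upper() == "ARRAY":
            current = arrays.setdefault(words[1], {})
            words = words[2:]
        else:
            # DEVICE or some other keyword; not needed for this check.
            current = None
            continue
        for word in words:
            key, _, value = word.partition("=")
            current[key.lower()] = value
    return arrays


arrays = parse_arrays(CONF_TEXT)
for name, opts in sorted(arrays.items()):
    members = opts["devices"].split(",")
    # The number of listed members should match num-devices.
    status = "ok" if len(members) == int(opts["num-devices"]) else "MISMATCH"
    print(name, opts["level"], members, status)
```

Both entries come out consistent, so the config itself at least lists hdc3 as a member of md1.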
The UUIDs check out with the devices, and indeed /dev/md0 works
fine. /dev/md1 used to work perfectly, but read on :-p
All the RAID partitions are type 0xfd (RAID auto-detect).
Recently I had to replace hdc because it crashed. When I got the new
drive, I copied the partition table from hda (using cfdisk) and
hot-added the new partitions (hdc1 and hdc3) to md0 and md1.
The problem is that /dev/md1 starts without /dev/hdc3 whenever I boot
the system, so I have to resynchronize each time.
The RAID superblock info for /dev/hda3 and /dev/hdg2 is the same, namely:
/dev/hda3:
Magic : a92b4efc
Version : 00.90.00
UUID : 4b22b17d:06048bd3:ecec156c:31fabbaf
Creation Time : Tue Jun 7 13:03:54 2005
Raid Level : raid5
Raid Devices : 3
Total Devices : 2
Preferred Minor : 1
Update Time : Mon Nov 7 23:28:38 2005
State : clean
Active Devices : 2
Working Devices : 2
Failed Devices : 1
Spare Devices : 0
Checksum : b0ce8bf5 - correct
Events : 0.366671
Layout : left-symmetric
Chunk Size : 64K
      Number   Major   Minor   RaidDevice   State
this     0       3       3         0        active sync   /dev/hda3
   0     0       3       3         0        active sync   /dev/hda3
   1     1       0       0         1        faulty removed
   2     2      34       2         2        active sync   /dev/hdg2
/dev/hdc3 doesn't agree with this - it shows all drives as being online.
I just tried rebooting during a synchronization (had to move the
computer), and the state of /dev/hdc3 is now:
/dev/hdc3:
Magic : a92b4efc
Version : 00.90.00
UUID : 4b22b17d:06048bd3:ecec156c:31fabbaf
Creation Time : Tue Jun 7 13:03:54 2005
Raid Level : raid5
Raid Devices : 3
Total Devices : 3
Preferred Minor : 1
Update Time : Mon Nov 7 23:23:24 2005
State : clean
Active Devices : 2
Working Devices : 3
Failed Devices : 1
Spare Devices : 1
Checksum : b0ce8a68 - correct
Events : 0.366603
Layout : left-symmetric
Chunk Size : 64K
      Number   Major   Minor   RaidDevice   State
this     3      22       3         3        spare         /dev/hdc3
   0     0       3       3         0        active sync   /dev/hda3
   1     1       0       0         1        faulty removed
   2     2      34       2         2        active sync   /dev/hdg2
   3     3      22       3         3        spare         /dev/hdc3
...but it's not syncing.
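To compare the two superblock dumps without eyeballing them, here is a small Python sketch. The two strings hold a subset of the fields, copied by hand from the dumps above. parse_fields and event_count are illustrative helper names. Treating the Events value as a high.low pair of counter halves is my assumption about how the 0.90 superblock is printed.

```python
# Sketch: diff selected fields of the two superblock dumps.
# Only a hand-copied subset of the fields is included here.
HDA3_DUMP = """\
Total Devices : 2
Update Time : Mon Nov 7 23:28:38 2005
Working Devices : 2
Spare Devices : 0
Events : 0.366671
"""

HDC3_DUMP = """\
Total Devices : 3
Update Time : Mon Nov 7 23:23:24 2005
Working Devices : 3
Spare Devices : 1
Events : 0.366603
"""


def parse_fields(dump):
    """Split 'Key : Value' lines into a dict, ignoring anything else."""
    fields = {}
    for line in dump.splitlines():
        key, sep, value = line.partition(" : ")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def event_count(events):
    """Combine the 'high.low' Events field into one integer (assumed layout)."""
    high, low = events.split(".")
    return (int(high) << 32) | int(low)


hda3 = parse_fields(HDA3_DUMP)
hdc3 = parse_fields(HDC3_DUMP)

# Fields whose values differ between the two members.
differing = {k: (hda3[k], hdc3[k]) for k in hda3 if hda3[k] != hdc3.get(k)}
for key, (a, b) in differing.items():
    print(f"{key}: hda3={a!r} hdc3={b!r}")

# How far hdc3's event counter lags behind hda3's.
lag = event_count(hda3["Events"]) - event_count(hdc3["Events"])
print("hdc3 is behind by", lag, "events")
```

So the superblock on hdc3 is older than the one on hda3, both by update time and by event count. I can't tell whether that is a cause of the problem or just a consequence of rebooting in the middle of the sync.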
/proc/mdstat shows
Personalities : [raid1] [raid5]
md0 : active raid1 hda1[0] hdc1[1]
48064 blocks [2/2] [UU]
md1 : active raid5 hda3[0] hdg2[2]
585585152 blocks level 5, 64k chunk, algorithm 2 [3/2] [U_U]
unused devices: <none>
Note that md0, although using hdc, doesn't have any problems and that
hdc doesn't show up as spare on md1.
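Since I currently have to check this by hand after every boot, here is a minimal Python sketch that picks the degraded arrays out of mdstat-style text. MDSTAT is the output quoted above. degraded_arrays is a made-up helper name, and the regular expressions only cover the line shapes seen in this mail, not every possible /proc/mdstat format.

```python
import re

# Sketch: find arrays that are running with fewer members than configured.
MDSTAT = """\
Personalities : [raid1] [raid5]
md0 : active raid1 hda1[0] hdc1[1]
      48064 blocks [2/2] [UU]
md1 : active raid5 hda3[0] hdg2[2]
      585585152 blocks level 5, 64k chunk, algorithm 2 [3/2] [U_U]
unused devices: <none>
"""


def degraded_arrays(mdstat_text):
    """Return {array: (expected, active, members)} for degraded arrays."""
    result = {}
    name, members = None, []
    for line in mdstat_text.splitlines():
        header = re.match(r"^(md\d+) : \w+ \w+ (.*)$", line)
        if header:
            # e.g. "md1 : active raid5 hda3[0] hdg2[2]"
            name = header.group(1)
            members = re.findall(r"(\w+)\[\d+\]", header.group(2))
            continue
        counts = re.search(r"\[(\d+)/(\d+)\]", line)
        if counts and name:
            # "[3/2]" means 3 members expected, 2 active.
            expected, active = int(counts.group(1)), int(counts.group(2))
            if active < expected:
                result[name] = (expected, active, members)
            name = None
    return result


for array, (expected, active, members) in degraded_arrays(MDSTAT).items():
    print(f"{array}: {active}/{expected} active, members: {', '.join(members)}")
```

On a live system the text would come from reading /proc/mdstat instead of the MDSTAT string. For the output above it reports only md1, with hda3 and hdg2 as members.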
All three drives are the same model, and they're less than half a
year old.
dmesg says the following:
...
devfs_mk_dev: could not append to parent for md/1
md: md1 stopped.
md: bind<hdg2>
md: bind<hda3>
raid5: device hda3 operational as raid disk 0
raid5: device hdg2 operational as raid disk 2
raid5: allocated 3164kB for md1
raid5: raid level 5 set md1 active with 2 out of 3 devices, algorithm 2
RAID5 conf printout:
--- rd:3 wd:2 fd:1
disk 0, o:1, dev:hda3
disk 2, o:1, dev:hdg2
I'm a little unsure about that "could not append to parent" part.
Maybe that's the culprit somehow? But then md0 should also be broken,
since its output is
devfs_mk_dev: could not append to parent for md/0
md: md0 stopped.
md: bind<hdc1>
md: bind<hda1>
md: raid1 personality registered as nr 3
raid1: raid set md0 active with 2 out of 2 mirrors
...but it works perfectly.
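To make the difference between the two excerpts explicit, here is a short Python sketch that collects the bind<...> lines per array and compares them with the members listed in mdadm.conf. DMESG is a shortened copy of the two excerpts above, and bound_members is a hypothetical helper name. It assumes that each array's bind lines follow its own "md: mdN stopped." line, which holds for these excerpts but may not hold in general.

```python
import re

# Sketch: which members did the kernel bind to each array at boot?
DMESG = """\
md: md1 stopped.
md: bind<hdg2>
md: bind<hda3>
raid5: raid level 5 set md1 active with 2 out of 3 devices, algorithm 2
md: md0 stopped.
md: bind<hdc1>
md: bind<hda1>
raid1: raid set md0 active with 2 out of 2 mirrors
"""

# Members each array should have, taken from mdadm.conf.
EXPECTED = {
    "md1": {"hda3", "hdc3", "hdg2"},
    "md0": {"hda1", "hdc1"},
}


def bound_members(log_text):
    """Map each array to the members bound after its 'stopped' line."""
    bound = {}
    current = None
    for line in log_text.splitlines():
        stopped = re.match(r"md: (md\d+) stopped\.", line)
        if stopped:
            current = stopped.group(1)
            bound[current] = []
            continue
        bind = re.match(r"md: bind<(\w+)>", line)
        if bind and current:
            bound[current].append(bind.group(1))
    return bound


bound = bound_members(DMESG)
# Members that are configured but were never bound.
missing = {a: sorted(want - set(bound.get(a, []))) for a, want in EXPECTED.items()}
for array in sorted(missing):
    print(array, "bound:", bound.get(array, []), "missing:", missing[array])
```

In these excerpts hdc3 is never bound to md1 at all, while hdc1 is bound to md0 as expected.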
My thoughts about possible explanations are:
- md drops hdc3 silently at boot for some reason. I believe this would
  constitute a grave bug.
- Perhaps hdc3 has weird information in the RAID superblock, although
  I've tried zeroing it before adding.
- hda3 or hdg2 has information in its superblock that marks that
  drive (hdc3) as faulty, and that information doesn't get reset after
  a sync.
When I searched through the mailing list archive earlier, I saw two or
three posts concerning what seems to me to be the same problem. I just
searched again but could only find one, with the subject line:
RAID-1 mirror keeps mysteriously dropping one partition on boot
I'm running a Debian Sarge system with a 2.6.12-1-k7 stock kernel
(taken from unstable).
mdadm is version 1.9.0 (04 feb. 2005)
I'm all out of ideas atm, so any pointers at all would be greatly
appreciated.
Troels