fred smith wrote:
hi all!
back in August several of you helped me solve a problem where one
of my drives had dropped out of (or been kicked out of) the raid1 array.
something vaguely similar appears to have happened just a few minutes ago,
upon rebooting after a small update. I received four emails like this,
one each for /dev/md0, /dev/md1, /dev/md125 and /dev/md126:
Subject: DegradedArray event on /dev/md125:fcshome.stoneham.ma.us
This is an automatically generated mail message from mdadm
running on fcshome.stoneham.ma.us
A DegradedArray event had been detected on md device /dev/md125.
Faithfully yours, etc.
P.S. The /proc/mdstat file currently contains the following:
Personalities : [raid1]
md0 : active raid1 sda1[0]
104320 blocks [2/1] [U_]
md126 : active raid1 sdb1[1]
104320 blocks [2/1] [_U]
md125 : active raid1 sdb2[1]
312464128 blocks [2/1] [_U]
md1 : active raid1 sda2[0]
312464128 blocks [2/1] [U_]
unused devices: <none>
firstly, what the heck are md125 and md126? previously there were
only md0 and md1....
secondly, I'm not sure what it's trying to tell me. it says there was a
"degradedarray event" but at the bottom it says there are no unused devices.
there are also some messages in /var/log/messages from the time of the
boot earlier today, but they do NOT say anything about "kicking out"
any of the md member devices (as they did in the event back in August):
Oct 19 18:29:41 fcshome kernel: device-mapper: dm-raid45: initialized v0.2594l
Oct 19 18:29:41 fcshome kernel: md: Autodetecting RAID arrays.
Oct 19 18:29:41 fcshome kernel: md: autorun ...
Oct 19 18:29:41 fcshome kernel: md: considering sdb2 ...
Oct 19 18:29:41 fcshome kernel: md: adding sdb2 ...
Oct 19 18:29:41 fcshome kernel: md: sdb1 has different UUID to sdb2
Oct 19 18:29:41 fcshome kernel: md: sda2 has same UUID but different superblock to sdb2
This appears to be the cause
Oct 19 18:29:41 fcshome kernel: md: sda1 has different UUID to sdb2
Oct 19 18:29:41 fcshome kernel: md: created md125
this was auto-created - I've not experienced this myself, and I run half a
dozen of these arrays on different machines.
Oct 19 18:29:41 fcshome kernel: md: bind<sdb2>
Oct 19 18:29:41 fcshome kernel: md: running: <sdb2>
Oct 19 18:29:41 fcshome kernel: raid1: raid set md125 active with 1 out of 2 mirrors
now it has assembled it separately as its own array
Oct 19 18:29:41 fcshome kernel: md: considering sdb1 ...
Oct 19 18:29:41 fcshome kernel: md: adding sdb1 ...
Oct 19 18:29:41 fcshome kernel: md: sda2 has different UUID to sdb1
Oct 19 18:29:41 fcshome kernel: md: sda1 has same UUID but different superblock to sdb1
and now for the second one
Oct 19 18:29:41 fcshome kernel: md: created md126
Oct 19 18:29:41 fcshome kernel: md: bind<sdb1>
Oct 19 18:29:41 fcshome kernel: md: running: <sdb1>
Oct 19 18:29:41 fcshome kernel: raid1: raid set md126 active with 1 out of 2 mirrors
Oct 19 18:29:41 fcshome kernel: md: considering sda2 ...
Oct 19 18:29:41 fcshome kernel: md: adding sda2 ...
Oct 19 18:29:41 fcshome kernel: md: sda1 has different UUID to sda2
Oct 19 18:29:41 fcshome kernel: md: created md1
Oct 19 18:29:41 fcshome kernel: md: bind<sda2>
Oct 19 18:29:41 fcshome kernel: md: running: <sda2>
Oct 19 18:29:41 fcshome kernel: raid1: raid set md1 active with 1 out of 2 mirrors
Oct 19 18:29:41 fcshome kernel: md: considering sda1 ...
Oct 19 18:29:41 fcshome kernel: md: adding sda1 ...
Oct 19 18:29:41 fcshome kernel: md: created md0
Oct 19 18:29:41 fcshome kernel: md: bind<sda1>
Oct 19 18:29:41 fcshome kernel: md: running: <sda1>
Oct 19 18:29:41 fcshome kernel: raid1: raid set md0 active with 1 out of 2 mirrors
Oct 19 18:29:41 fcshome kernel: md: ... autorun DONE.
and here's /etc/mdadm.conf:
# cat /etc/mdadm.conf
# mdadm.conf written out by anaconda
DEVICE partitions
MAILADDR fredex
ARRAY /dev/md0 level=raid1 num-devices=2 uuid=4eb13e45:b5228982:f03cd503:f935bd69
ARRAY /dev/md1 level=raid1 num-devices=2 uuid=5c79b138:e36d4286:df9cf6f6:62ae1f12
which doesn't say anything about md125 or md126... might they be some kind of detritus
or fragments left over from whatever failure caused the array to become degraded?
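to check what md125 and md126 actually are, something like this should do it (just a
sketch, device names taken from the mdstat output above), comparing the UUIDs in the
member superblocks against the ARRAY lines in mdadm.conf:
what the member superblocks say:
# mdadm --examine /dev/sda1 /dev/sdb1 | grep -i uuid
# mdadm --examine /dev/sda2 /dev/sdb2 | grep -i uuid
and what the assembled arrays think they are:
# mdadm --detail /dev/md125 /dev/md126
if the UUIDs on sdb1/sdb2 match the md0/md1 entries in mdadm.conf, then md125/md126 are
just the detached halves of the original mirrors that autodetect brought up on their own,
not leftover fragments.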
now you need to decide which is the correct master (by looking at each
device; you may need to mount it first).
remove the other one and add it back to the original array - it will
then rebuild.
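a rough sketch of that, using the device names from the mdstat above (double-check that
the half you keep - sda here - really is the one with the data you want, and that nothing
on md125/md126 is mounted):
peek at the stray copy read-only first if unsure (/mnt/tmp is just an example mount point):
# mount -o ro /dev/md125 /mnt/tmp
# umount /mnt/tmp
stop the stray arrays:
# mdadm --stop /dev/md125
# mdadm --stop /dev/md126
re-add their members to the original arrays and watch them resync:
# mdadm /dev/md0 --add /dev/sdb1
# mdadm /dev/md1 --add /dev/sdb2
# cat /proc/mdstat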
If these are SATA drives, just check the cables - I have one machine where
they work loose and cause failures.
do ya suppose a boot from power-off might somehow give it a whack upside the head so
it'll reassemble itself according to mdadm.conf?
doubt it - see the above dmesg.
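if the members get re-added as sketched above, no reboot should be needed; you should end
up with just md0 and md1 again. afterwards, something like this shows whether the resync
is running - the State line should read something like "clean, degraded, recovering":
# mdadm --detail /dev/md0 | grep -i state
# mdadm --detail /dev/md1 | grep -i state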
I'm not sure which devices need to be failed and re-added to make it clean again (which
is all I had to do when I had the aforementioned earlier problem.)
Thanks in advance for any advice!
Fred