RAID5 recovering

Dear Raid experts,

I have a RAID5 volume that recently crashed, and I need your advice before taking any irreversible action.

Let me first summarize the past and current state.

1) I had a nicely running RAID5 volume with 3 x 1 TB disks (LVM on top, with several logical volumes in ext3 and ext4), but the volume had become a bit too small, so I decided to add a new 1 TB disk.

2) I added a new disk and did not do anything for a couple of days (Raid still running with 3 disks)

3) One of the old disks failed and was ejected from the RAID.

4) The ejected disk was not even present as /dev/sdX. I thus tested the connections and the disk came back.

5) I resynced the ejected disk and was back with my original 3-disk array.

6) I waited 2-3 days and everything was fine. I then added the new disk and resynced.

7) I now had a running 4-disk RAID5 array. I created a new volume and started copying data to it.

8) During the weekend, 2 disks were ejected from the array: the newly installed one and the same one as before (step 3).

9) Again, the 2 disks did not appear as /dev/sdX. I checked the connections again and the problem was a Molex connector. The two ejected disks were on the same Molex connector, which explains why both were detected as faulty.

Now, my list of errors as a newbie.

4) I did not save all the information before proceeding (mdadm --examine, /etc/mdadm/mdadm.conf, syslog, ...)

5) I tried to assemble the disks with
mdadm --assemble --scan
with no result

6) I then tried the following, and I think this was my big mistake:
mdadm --assemble /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1

In this command I forgot to put /dev/md0 after --assemble.
Because of this, the superblock on /dev/sdb1 was removed, and mdadm --examine /dev/sdb1 now returns "No md superblock detected on /dev/sdb1".
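
The first non-option argument to mdadm --assemble names the md device, which is why the missing /dev/md0 mattered. As a minimal sketch of a guard against that slip (safe_assemble is a hypothetical helper name, not part of mdadm, and it only prints the command it would run):

```shell
# Hypothetical guard: refuse to build an "mdadm --assemble" command unless
# the first argument looks like an md device node such as /dev/md0.
safe_assemble() {
    case $1 in
        /dev/md*) ;;
        *)
            echo "refusing: first argument '$1' is not an md device" >&2
            return 1
            ;;
    esac
    # Print the command instead of running it; remove "echo" to execute.
    echo mdadm --assemble "$@"
}
```

Called as "safe_assemble /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1", it stops with an error instead of handing /dev/sdb1 to mdadm as the array name.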

I would now like to be more cautious. If some expert on the list would be kind enough to tell me whether the method described below is the right approach, I will be grateful for the rest of my life :-)

7) I read the RAID wiki and the list.

8) I saved
mdadm --examine /dev/sd[bcde]1
dmesg
syslog
/etc/mdadm/mdadm.conf
fdisk -lu /dev/sd[bcde]

I have put the contents of these files at the end of this message (except dmesg and syslog, because they are very long).
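
The saving step above could be scripted roughly as follows. This is only a sketch: save_raid_state and capture are hypothetical helper names, the output directory is arbitrary, and /var/log/syslog is an assumed (Debian-style) log path.

```shell
# Run a command and keep its stdout and stderr in a file; a failure is
# recorded in the file instead of aborting the whole collection.
capture() {
    out=$1
    shift
    "$@" > "$out" 2>&1 || echo "(exit status $? from: $*)" >> "$out"
}

# Collect the state of the array members into one directory.
save_raid_state() {
    outdir=${1:-"$HOME/raid-state-$(date +%Y%m%d-%H%M%S)"}
    mkdir -p "$outdir" || return 1
    for dev in /dev/sd[bcde]1; do
        # One file per member, including "no superblock" errors.
        capture "$outdir/examine-$(basename "$dev").txt" mdadm --examine "$dev"
    done
    capture "$outdir/fdisk.txt" fdisk -lu /dev/sd[bcde]
    capture "$outdir/dmesg.txt" dmesg
    cp /etc/mdadm/mdadm.conf "$outdir/" 2>/dev/null || echo "mdadm.conf not copied" >&2
    # Assumed log location; adjust for the distribution in use.
    cp /var/log/syslog "$outdir/" 2>/dev/null || echo "syslog not copied" >&2
    echo "$outdir"
}
```

Running "save_raid_state" as root before any further mdadm command leaves a dated copy of everything listed above.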

9) /dev/sdd is the new disk. This is clear in the fdisk listing since it is a 4K sector disk.
The normal order of the raid is thus (see mdadm --examine /dev/sd[de]1)
sdb1 sdc1 sde1 sdd1
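
The slot of each member can be read from the "this" line of its mdadm --examine output, where the fifth field is the RaidDevice number. A small sketch, assuming the 0.90-format output shown at the end of this message has been saved to a file (member_slots is a hypothetical helper name):

```shell
# Print "slot device" pairs, sorted by slot, from saved `mdadm --examine`
# output (0.90 metadata format). Reads the named files, or stdin if none.
member_slots() {
    # "this 1 8 33 1 active sync /dev/sdc1": field 5 is RaidDevice,
    # the last field is the device name.
    awk '$1 == "this" { print $5, $NF }' "$@" | sort -n
}
```

With the output below this lists slots 1, 2 and 3 as sdc1, sde1 and sdd1. Slot 0 is absent because sdb1 no longer has a superblock, so its position is inferred from the 8:17 entry for device 0 in the other members' tables.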

10) Events are
/dev/sdb1: no md superblock (see 6)
/dev/sdc1: Events : 112358
/dev/sdd1: Events : 112333
/dev/sde1: Events : 112358

It seems that sdd was the first disk removed.
Presumably sdb1 is in sync, since it was running with sdc1 when sdd1 and sde1 were ejected from the array (see 8), but I can't be sure since I stupidly erased its superblock!
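
For reference, the event counts can be pulled out of saved mdadm --examine output and compared with a few lines of awk. This is a sketch with hypothetical helper names (list_events, event_spread); it assumes each member's section starts with a "/dev/...:" line as in the listing at the end of this message.

```shell
# Print "device events" pairs from saved `mdadm --examine` output.
list_events() {
    awk '
        /^\/dev\/.*:$/ { dev = $1; sub(/:$/, "", dev) }  # section header, e.g. "/dev/sdc1:"
        $1 == "Events" { print dev, $NF }                # e.g. "Events : 112358"
    ' "$@"
}

# Print the gap between the highest and lowest event count.
event_spread() {
    list_events "$@" | awk '
        NR == 1 || $2 > max { max = $2 }
        NR == 1 || $2 < min { min = $2 }
        END { print max - min }'
}
```

On the values listed in 10), event_spread gives 112358 - 112333 = 25.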

11) I propose to re-create the array with the --assume-clean option, then check everything using "fsck -n" and "mount -o ro"
the command would be:

mdadm --create /dev/md0 -e 0.90 --assume-clean --level=5 --raid-devices=4 \
--chunk=64 --size=976759936 /dev/sdb1 /dev/sdc1 /dev/sde1 /dev/sdd1

However, since sdd1 is not really in sync (its event count is a bit lower), I could also just try

mdadm --create /dev/md0 -e 0.90 --assume-clean --level=5 --raid-devices=4 \
--chunk=64 --size=976759936 /dev/sdb1 /dev/sdc1 /dev/sde1 missing

However, since I am not completely sure about sdb1 (it no longer has a superblock), I could also try

mdadm --create /dev/md0 -e 0.90 --assume-clean --level=5 --raid-devices=4 \
--chunk=64 --size=976759936 missing /dev/sdc1 /dev/sde1 /dev/sdd1

Would you use the 4 disks as in the first command, or do you think that the 25-event difference is a big problem?
If it works, what is the best way to test that everything is OK?
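
As a rough sketch of the read-only checks mentioned in 11), the helper below only prints the commands for each logical volume so they can be reviewed before being run by hand. The names print_ro_checks, /dev/vg0/data and /mnt/check are hypothetical. The noload option is my assumption: it is an ext3/ext4 mount option that skips journal replay, so the mount should not write to the volume.

```shell
# Print (do not run) read-only checks for each logical volume given as an
# argument. MNT is a scratch mount point; /mnt/check is only a placeholder.
print_ro_checks() {
    mnt=${MNT:-/mnt/check}
    # LVM sits on top of the md array, so the volume group is activated first.
    echo "vgscan"
    echo "vgchange -ay"
    for lv in "$@"; do
        echo "fsck -n $lv"                   # -n: report problems, change nothing
        echo "mount -o ro,noload $lv $mnt"   # read-only, no journal replay
        echo "umount $mnt"
    done
}
```

For example, "print_ro_checks /dev/vg0/data" lists the five commands for that one volume.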

Thanks a lot for your help.

------------------------------- /etc/mdadm/mdadm.conf ----------------------------------------------
# mdadm.conf
#
# Please refer to mdadm.conf(5) for information about this file.
#

# by default, scan all partitions (/proc/partitions) for MD superblocks.
# alternatively, specify devices to scan, using wildcards if desired.
DEVICE /dev/sd*1 /dev/sdf1 /dev/sdc1 /dev/sdd1

# auto-create devices with Debian standard permissions
CREATE owner=root group=disk mode=0660 auto=yes

# automatically tag new arrays as belonging to the local system
HOMEHOST <system>

# instruct the monitoring daemon where to send mail alerts
MAILADDR root

# definitions of existing MD arrays
ARRAY /dev/md0 level=raid5 num-devices=3 UUID=760291c6:73cd6884:c91d1289:ceb97d9c

# This file was auto-generated on Wed, 04 Mar 2009 17:10:18 +0100
# by mkconf $Id$

--------------------------------- mdadm --examine /dev/sd[bcde]1 ----------------------------------------------------------
#mdadm --examine /dev/sd[bcde]1

mdadm: No md superblock detected on /dev/sdb1.

/dev/sdc1:
Magic : a92b4efc
Version : 00.90.00
UUID : 760291c6:73cd6884:c91d1289:ceb97d9c (local to host backup)
Creation Time : Wed Mar 4 17:13:19 2009
Raid Level : raid5
Used Dev Size : 976759936 (931.51 GiB 1000.20 GB)
Array Size : 2930279808 (2794.53 GiB 3000.61 GB)
Raid Devices : 4
Total Devices : 4
Preferred Minor : 0

Update Time : Thu Apr 11 03:03:18 2013
State : clean
Active Devices : 2
Working Devices : 2
Failed Devices : 2
Spare Devices : 0
Checksum : 2329d0 - correct
Events : 112358

Layout : left-symmetric
Chunk Size : 64K

Number Major Minor RaidDevice State
this 1 8 33 1 active sync /dev/sdc1

0 0 8 17 0 active sync
1 1 8 33 1 active sync /dev/sdc1
2 2 0 0 2 active sync
3 3 0 0 3 faulty removed
/dev/sdd1:
Magic : a92b4efc
Version : 00.90.00
UUID : 760291c6:73cd6884:c91d1289:ceb97d9c (local to host backup)
Creation Time : Wed Mar 4 17:13:19 2009
Raid Level : raid5
Used Dev Size : 976759936 (931.51 GiB 1000.20 GB)
Array Size : 2930279808 (2794.53 GiB 3000.61 GB)
Raid Devices : 4
Total Devices : 4
Preferred Minor : 0

Update Time : Wed Apr 10 23:52:35 2013
State : active
Active Devices : 4
Working Devices : 4
Failed Devices : 0
Spare Devices : 0
Checksum : 21461c - correct
Events : 112333

Layout : left-symmetric
Chunk Size : 64K

Number Major Minor RaidDevice State
this 3 8 49 3 active sync /dev/sdd1

0 0 8 17 0 active sync
1 1 8 33 1 active sync /dev/sdc1
2 2 8 65 2 active sync /dev/sde1
3 3 8 49 3 active sync /dev/sdd1
/dev/sde1:
Magic : a92b4efc
Version : 00.90.00
UUID : 760291c6:73cd6884:c91d1289:ceb97d9c (local to host backup)
Creation Time : Wed Mar 4 17:13:19 2009
Raid Level : raid5
Used Dev Size : 976759936 (931.51 GiB 1000.20 GB)
Array Size : 2930279808 (2794.53 GiB 3000.61 GB)
Raid Devices : 4
Total Devices : 4
Preferred Minor : 0

Update Time : Wed Apr 10 23:52:35 2013
State : active
Active Devices : 4
Working Devices : 4
Failed Devices : 0
Spare Devices : 0
Checksum : 214643 - correct
Events : 112358

Layout : left-symmetric
Chunk Size : 64K

Number Major Minor RaidDevice State
this 2 8 65 2 active sync /dev/sde1

0 0 8 17 0 active sync
1 1 8 33 1 active sync /dev/sdc1
2 2 8 65 2 active sync /dev/sde1
3 3 8 49 3 active sync /dev/sdd1

---------------------------------------------------- fdisk -lu /dev/sd[bcde] -----------------------------------------------
Disk /dev/sdb: 1000.2 GB, 1000204886016 bytes
255 heads, 63 sectors/track, 121601 cylinders, total 1953525168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00025fce

Device Boot Start End Blocks Id System
/dev/sdb1 63 1953520064 976760001 fd Linux raid autodetect

Disk /dev/sdc: 1000.2 GB, 1000204886016 bytes
255 heads, 63 sectors/track, 121601 cylinders, total 1953525168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x000a177b

Device Boot Start End Blocks Id System
/dev/sdc1 63 1953520064 976760001 fd Linux raid autodetect

Disk /dev/sdd: 1000.2 GB, 1000204886016 bytes
16 heads, 29 sectors/track, 4210183 cylinders, total 1953525168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk identifier: 0x9ab6c5c9

Device Boot Start End Blocks Id System
/dev/sdd1 2048 1953522049 976760001 fd Linux raid autodetect

Disk /dev/sde: 1000.2 GB, 1000204886016 bytes
255 heads, 63 sectors/track, 121601 cylinders, total 1953525168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x000291d5

Device Boot Start End Blocks Id System
/dev/sde1 63 1953520064 976760001 fd Linux raid autodetect
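
As a cross-check of the --size value against the partition sizes above: with 0.90 metadata the superblock sits in the last 64 KiB-aligned block that leaves at least 64 KiB after it, so the usable size is the partition size rounded down to a multiple of 64 KiB, minus 64 KiB. A short sketch of that arithmetic (md090_dev_size_kib is a hypothetical helper name; sizes are in 1 KiB blocks as printed by fdisk):

```shell
# Usable member size in KiB for a 0.90 superblock, given the partition
# size in 1 KiB blocks: round down to a 64 KiB boundary, then subtract
# the 64 KiB reserved for the superblock.
md090_dev_size_kib() {
    echo $(( $1 / 64 * 64 - 64 ))
}

md090_dev_size_kib 976760001                     # prints 976759936
# A 4-disk RAID5 holds 3 members' worth of data.
echo $(( 3 * $(md090_dev_size_kib 976760001) ))  # prints 2930279808
```

Both results agree with the "Used Dev Size" and "Array Size" lines reported by mdadm --examine.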



--
Pierre MARTINEAU

Institut de Recherche en Cancérologie de Montpellier
Inserm U896 – Université Montpellier 1 – CRLC Val d’Aurelle
Campus Val d’Aurelle
208 Rue des Apothicaires
F-34298 Montpellier Cedex 5, France

Tel: +33 (0)4 67 61 37 43
Fax: +33 (0)4 67 61 37 87
E-mail:pierre.martineau@xxxxxxxxx
E-mail:pierre.martineau@xxxxxxxxxxxxxxxxxxxxxxxx
Site internet:http://www.ircm.fr


