Guy,
Thanks for the input. I'm not sure why that disk is now a spare either. I was hoping that there was some way to re-write that superblock to convince the array it was a good disk. I saw some old (pre-mdadm) advice which mentioned using mkraid to rewrite (all) of the superblocks, but that seems really drastic.
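(The mdadm-era equivalent of that mkraid superblock rewrite would, as far as I can tell, be re-creating the array in place with --assume-clean, so only the superblocks get rewritten -- roughly:

mdadm --create /dev/md0 --assume-clean --level=5 --raid-devices=5 \
      --chunk=128 --layout=left-symmetric \
      /dev/hde1 missing /dev/hdk1 /dev/hdm1 /dev/hdo1

but the device-to-slot order there is only my guess from the -E output below, it needs an mdadm new enough to support --assume-clean, and getting any parameter wrong would scramble the data, which is why it feels at least as drastic as mkraid.)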
In the worst case, as you mentioned, I would try to start with the other (failed) disk. Most of the data on that drive is fairly static, so I hope to have some good recovery -- assuming the disk is still OK (in the past it has been something like a loose cable, so I'm hopeful).
I'll wait and see if Neil has any advice. *crosses fingers*
Bob
Guy wrote:
Your array had 5 disks, not counting any spares. You need to start the array with at least 4 of the 5 disks; spares don't help when starting an array.
I don't know why it thinks your disk (hdi1) is a spare. But, that may explain how it was removed from the array. Unless Neil has some magic incantations, I think you are out of luck.
If Neil has no ideas, you could try to start the array with the drive that failed (hdk1), but that will cause corruption of any stripes that have changed since the drive was removed from the array. So, save this option as a last resort. Of course, if hdk1 has failed hard, you will not be able to use it.
Last resort!!! Corruption will occur!
mdadm --assemble --force /dev/md0 /dev/hde1 /dev/hdk1 /dev/hdm1 /dev/hdo1
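If that forced assemble does bring md0 up, I would check it read-only before trusting or writing anything; roughly (this assumes an ext2/ext3-style filesystem directly on md0, and the mount point is just an example):

cat /proc/mdstat                 # confirm md0 is active, degraded with 4 of 5 drives
fsck -n /dev/md0                 # read-only filesystem check, makes no changes
mount -o ro /dev/md0 /mnt/raid   # mount read-only while you copy data off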
Guy
-----Original Message-----
From: linux-raid-owner@xxxxxxxxxxxxxxx [mailto:linux-raid-owner@xxxxxxxxxxxxxxx] On Behalf Of Robert Osiel
Sent: Saturday, November 13, 2004 7:36 PM
To: linux-raid@xxxxxxxxxxxxxxx
Subject: Re: A few mdadm questions
Guy/Neil:
Thanks a lot for the help.
Sorry that I didn't include all of the info in my last message, but this box is off the network right now and doesn't even have a floppy or monitor, so I had to do a little work to get the info out.
I tried to start the array with the 3 good disks and the 1 spare, but I got an error to the effect that 3 good drives + 1 spare are not enough to start the array (see below).
> cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid5] [multipath]
read_ahead not set
unused devices: <none>
> mdadm -D /dev/md0
mdadm: md device /dev/md0 does not appear to be active
> mdadm --assemble --force /dev/md0 /dev/hde1 /dev/hdi1 /dev/hdm1 /dev/hdo1
mdadm: /dev/md0 assembled from 3 drives and 1 spare - not enough to start the array
> cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid5] [multipath]
read_ahead not set
md0 : inactive ide/host2/bus0/target0/lun0/part1[0] ide/host4/bus0/target0/lun0/part1[5] ide/host6/bus1/target0/lun0/part1[4] ide/host6/bus0/target0/lun0/part1[3]
Some notes:
hdk1 is the disk which failed initially
hdi1 is the disk which I removed and which thinks it is a 'spare'
The other three drives report basically identical info, like this:

> mdadm -E /dev/hde1
          Magic : a92b4efc
        Version : 00.90.00
           UUID : ec2e64a8:fffd3e41:ffee5518:2f3e858c
  Creation Time : Sun Oct 5 01:25:49 2003
     Raid Level : raid5
    Device Size : 160079488 (152.66 GiB 163.92 GB)
   Raid Devices : 5
  Total Devices : 5
Preferred Minor : 0

    Update Time : Sat Sep 25 22:07:26 2004
          State : dirty
 Active Devices : 3
Working Devices : 4
 Failed Devices : 1
  Spare Devices : 1
       Checksum : 4ee5cc77 - correct
         Events : 0.10

         Layout : left-symmetric
     Chunk Size : 128K

      Number   Major   Minor   RaidDevice   State
this     0      22        1        0        active sync
   0     0      22        1        0        active sync
   1     1       0        0        1        faulty removed
   2     2      56        1        2        faulty        /dev/ide/host4/bus0/target0/lun0/part1
   3     3      57        1        3        active sync   /dev/ide/host4/bus1/target0/lun0/part1
   4     4      88        1        4        active sync   /dev/ide/host6/bus0/target0/lun0/part1
   5     5      34        1        5        spare
Here are the two drives in question:
> mdadm -E /dev/hdi1
          Magic : a92b4efc
        Version : 00.90.00
           UUID : ec2e64a8:fffd3e41:ffee5518:2f3e858c
  Creation Time : Sun Oct 5 01:25:49 2003
     Raid Level : raid5
    Device Size : 160079488 (152.66 GiB 163.92 GB)
   Raid Devices : 5
  Total Devices : 5
Preferred Minor : 0

    Update Time : Sat Sep 25 22:07:26 2004
          State : dirty
 Active Devices : 3
Working Devices : 4
 Failed Devices : 1
  Spare Devices : 1
       Checksum : 4ee5cc77 - correct
         Events : 0.10

         Layout : left-symmetric
     Chunk Size : 128K

      Number   Major   Minor   RaidDevice   State
this     5      34        1        5        spare
   0     0      22        1        0        active sync
   1     1       0        0        1        faulty removed
   2     2      56        1        2        faulty        /dev/ide/host4/bus0/target0/lun0/part1
   3     3      57        1        3        active sync   /dev/ide/host4/bus1/target0/lun0/part1
   4     4      88        1        4        active sync   /dev/ide/host6/bus0/target0/lun0/part1
   5     5      34        1        5        spare
> mdadm -E /dev/hdk1
          Magic : a92b4efc
        Version : 00.90.00
           UUID : ec2e64a8:fffd3e41:ffee5518:2f3e858c
  Creation Time : Sun Oct 5 01:25:49 2003
     Raid Level : raid5
    Device Size : 160079488 (152.66 GiB 163.92 GB)
   Raid Devices : 5
  Total Devices : 5
Preferred Minor : 0

    Update Time : Sat Sep 25 22:07:24 2004
          State : dirty
 Active Devices : 4
Working Devices : 5
 Failed Devices : 0
  Spare Devices : 1
       Checksum : 4ee5cc77 - correct
         Events : 0.9

         Layout : left-symmetric
     Chunk Size : 128K

      Number   Major   Minor   RaidDevice   State
this     2      56        1        2        active sync   /dev/ide/host4/bus0/target0/lun0/part1
   0     0      22        1        0        active sync
   1     1       0        0        1        faulty removed
   2     2      56        1        2        active sync   /dev/ide/host4/bus0/target0/lun0/part1
   3     3      57        1        3        active sync   /dev/ide/host4/bus1/target0/lun0/part1
   4     4      88        1        4        active sync   /dev/ide/host6/bus0/target0/lun0/part1
   5     5      34        1        5        spare
Neil Brown wrote:
On Friday November 12, bugzilla@xxxxxxxxxxxxxxxx wrote:
> First, stop using the old raid tools. Use mdadm only! mdadm would not have
> allowed your error to occur.
I'm afraid this isn't correct, though the rest of Guy's advice is very good (thanks Guy!).
mdadm --remove does exactly the same thing as raidhotremove
It is the kernel that should (and does) stop you from hot-removing a device that is working and active. So I'm not quite sure what happened to Robert...
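For reference, the normal way to pull a working device out of an array is to mark it faulty first and only then remove it; a bare hot-remove of an active device should be refused by the kernel. Roughly (device names purely illustrative):

mdadm /dev/md0 --fail /dev/hdi1      # mark the device faulty in the array
mdadm /dev/md0 --remove /dev/hdi1    # the kernel now permits the hot-remove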
Robert: it is always useful to provide specifics, such as the output of cat /proc/mdstat
and
mdadm -D /dev/mdX
This avoids possible confusion over terminology.
NeilBrown