Re: degraded raid 6 (1 bad drive) showing up inactive, only spares

On Thu, 07 Jun 2012 23:16:34 +0200 Oliver Schinagl <oliver+list@xxxxxxxxxxx>
wrote:

> Since I'm still working on repairing my own array, and using a wrong
> version of mdadm corrupted one of my raid10 arrays, I'm trying to hexedit
> the start of an image of the disk to recover the metadata.
>
> A quick question: if I've edited/checked the first superblock (I'm using
> https://raid.wiki.kernel.org/index.php/RAID_superblock_formats for
> reference, and it looks quite accurate), would I need to check other
> areas on the disk for superblocks? Or will the first superblock be enough?

Are we talking about filesystem superblocks or RAID superblocks?

There is only one RAID superblock - normally 4K from the start (with 1.2
metadata).  There may be lots of filesystem superblocks.  I think extX only
uses the first if it is good, but I don't know for certain.
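
As a rough illustration of that layout, the sketch below reads the four bytes
at the 4K offset of a disk image and compares them with the md magic number
a92b4efc (stored little-endian, so the bytes on disk are fc 4e 2b a9).  The
function name and the image path are hypothetical, and the 4096-byte offset
applies to 1.2 metadata only.

```shell
#!/bin/sh
# Sketch: check for an md v1.2 superblock magic in a disk image.
# has_md12_magic is a hypothetical helper name, not part of mdadm.
has_md12_magic() {
    image=$1
    # 1.2 metadata sits 4096 bytes (8 sectors) from the start of the device.
    # The magic 0xa92b4efc is stored little-endian: fc 4e 2b a9.
    bytes=$(dd if="$image" bs=1 skip=4096 count=4 2>/dev/null \
        | od -An -tx1 | tr -d ' \n')
    [ "$bytes" = "fc4e2ba9" ]
}
```

Something like "has_md12_magic disk.img && echo found" would then report
whether the magic is where a 1.2 superblock is expected.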

NeilBrown


> 
> On 07-06-12 14:29, NeilBrown wrote:
> > On Thu, 7 Jun 2012 13:55:32 +0200 Martin Ziler<martin.ziler@xxxxxxxxxxxxxx>
> > wrote:
> >
> >> Hello everybody,
> >>
> >> I am running a 9-disk raid6 without hot spares. I already had one drive go bad, which I was able to replace, and the array kept running without any degraded-raid messages. Recently the SMART info showed another drive going bad. As it wasn't quite dead, I left the array as it was, without really using it much, while waiting for the replacement drive I ordered. When I booted the machine to replace the drive, I was greeted by an inactive array with all devices showing up as spares.
> >>
> >> md0 : inactive sdh2[0](S) sdi2[7](S) sde2[6](S) sdd2[5](S) sdf2[1](S) sdg2[2](S) sdc1[9](S) sdb2[3](S)
> >>        15579088439 blocks super 1.2
> >>
> >> mdadm --examine confirms that. I already searched the web quite a bit and found this mailing list. Maybe someone here can give me some input. Normally a degraded raid should still be active, so I am quite surprised that my array with only one drive missing goes inactive. I appended the info mdadm --examine puts out for all the drives. However, the first two should probably suffice, as only /dev/sdk differs from the rest. The faulty drive, sdk, is still recognized as a raid6 member, whereas all the others show up as spares. With lots of bad sectors, sdk isn't accessible anymore.
> > You must be running kernel 3.2.1 or 3.3 (I think).
> >
> > You've been bitten by a rather nasty bug.
> >
> > You can get your data back, but it will require a bit of care, so don't rush
> > it.
> >
> > The metadata on almost all the devices has been seriously corrupted.  The
> > only way to repair it is to recreate the array.
> > Doing this just writes new metadata and assembles the array.  It doesn't touch
> > the data so if we get the --create command right, all your data will be
> > available again.
> > If we get it wrong, you won't be able to see your data, but we can easily stop
> > the array and create again with different parameters until we get it right.
> >
> > The first thing to do is to get a newer kernel.  I would recommend the
> > latest in the 3.3.y series.
> >
> > Then you need to:
> >   - make sure you have a version of mdadm which sets the data offset to 1M
> >     (2048 sectors).  I think 3.2.3 or earlier does that - don't upgrade to
> >     3.2.5.
> >   - find the chunk size - looks like it is 4M, as sdk2 isn't corrupt.
> >   - find the order of devices.  This should be in your kernel logs in
> >      "RAID conf printout".  Hopefully device names haven't changed.
> >
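
One way to pull the device order out of the logs is sketched below.  It
assumes the "RAID conf printout" lines have the form " disk 0, o:1, dev:sdh2"
(possibly with a timestamp in front); check your own log before relying on
that.  The function name is hypothetical.

```shell
# Sketch: list "slot device" pairs from a RAID conf printout read on stdin.
# Assumes log lines of the form " disk 0, o:1, dev:sdh2";
# raid_conf_order is a hypothetical helper name.
raid_conf_order() {
    sed -n 's/.*disk \([0-9][0-9]*\), o:[0-9][0-9]*, dev:\([^ ]*\).*/\1 \2/p' \
        | sort -u | sort -n
}
```

It could be fed with "dmesg | raid_conf_order" or from a saved kernel log.
If the same slot shows up with two different device names, both lines are
printed, which is a hint that names changed between boots.
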
> >   Then (with new kernel running)
> >
> >    mdadm --create /dev/md0 -l6 -n9 -c 4M -e 1.2 /dev/sdb2 /dev/sdc1 /dev/sdd2 \
> >       /dev/sde2 /dev/sdf2 /dev/sdg2 /dev/sdh2 /dev/sdi2 missing \
> >       --assume-clean
> >
> >   Make double-sure you add that --assume-clean.
> >
> >   Note the last device is 'missing'. That corresponds to sdk2 (which we
> >   know is device 8 - the last of 9 (0..8)).  It failed, so it is not part
> >   of the array any more.  For the others I just guessed the order.  You
> >   should try to verify it before you proceed (see RAID conf printout in
> >   kernel logs).
> >
> >   After the 'create' use "mdadm -E" to look at one device and make sure
> >   the Data Offset, Avail Dev Size and Array Size are the same as we saw
> >   on sdk2.
> >   If they are, try "fsck -n /dev/md0". That assumes ext3 or ext4.  If you had
> >   something else on the array some other command might be needed.
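
The comparison above can be sketched as a small filter over "mdadm -E"
output.  The function name is hypothetical; the values to expect are the ones
sdk2 reports in this thread (Data Offset 2048, Avail Dev Size 3881859248,
Array Size 27172970496).

```shell
# Sketch: print the three fields worth comparing from "mdadm -E" output on stdin.
# md_key_fields is a hypothetical helper, not an mdadm feature.
md_key_fields() {
    awk -F' : ' '
        /Data Offset|Avail Dev Size|Array Size/ {
            gsub(/^ +/, "", $1)    # drop the leading alignment spaces
            split($2, v, " ")      # keep only the number, not the GiB/GB part
            print $1 "=" v[1]
        }'
}
```

Running "mdadm -E /dev/sdb2 | md_key_fields" after the create and comparing
the three lines with the sdk2 values gives a quick check.  As a consistency
note on those numbers: a 9-device raid6 has 7 data devices, and sdk2's Used
Dev Size of 3881852928 sectors times 7 is 27172970496, which matches its
Array Size.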
> >
> >   If that looks bad, "mdadm -S /dev/md0" and try again with a different order.
> >   If it looks good, "echo check > /sys/block/md0/md/sync_action" and watch
> >   "mismatch_cnt" in the same directory.  If it stays low (a few hundred at
> >   most) all is good.  If it goes up to thousands something is wrong - try
> >   another order.
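
Watching the counter can be sketched as a small polling loop.  The function
name and the 10-second interval are arbitrary choices for illustration; the
sysfs directory is a parameter and defaults to the md0 path used above.

```shell
# Sketch: report mismatch_cnt while a check runs, then print the final value.
# wait_for_check is a hypothetical helper name.
wait_for_check() {
    md_dir=${1:-/sys/block/md0/md}
    # sync_action reads "idle" once the check (or any other sync) is done.
    while [ "$(cat "$md_dir/sync_action")" != "idle" ]; do
        # Progress goes to stderr so stdout carries only the final count.
        echo "mismatch_cnt so far: $(cat "$md_dir/mismatch_cnt")" >&2
        sleep 10
    done
    cat "$md_dir/mismatch_cnt"
}
```

If the progress lines already climb into the thousands there is little point
in waiting for the check to finish before trying another order.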
> >
> >   Once you have the array working again,
> >      "echo repair > /sys/block/md0/md/sync_action"
> >   then add your new device to be rebuilt.
> >
> > Good luck.
> > Please ask if you are unsure about anything.
> >
> > NeilBrown
> >
> >>
> >> /dev/sdk2:
> >>            Magic : a92b4efc
> >>          Version : 1.2
> >>      Feature Map : 0x0
> >>       Array UUID : 25be3ab5:ef5f1166:d64b0e0e:4df143ed
> >>             Name : server:0  (local to host server)
> >>    Creation Time : Mon Jul 25 23:40:50 2011
> >>       Raid Level : raid6
> >>     Raid Devices : 9
> >>
> >>   Avail Dev Size : 3881859248 (1851.01 GiB 1987.51 GB)
> >>       Array Size : 27172970496 (12957.08 GiB 13912.56 GB)
> >>    Used Dev Size : 3881852928 (1851.01 GiB 1987.51 GB)
> >>      Data Offset : 2048 sectors
> >>     Super Offset : 8 sectors
> >>            State : clean
> >>      Device UUID : 882eb11a:33b499a7:dd5856b7:165f916c
> >>
> >>      Update Time : Fri Jun  1 20:26:45 2012
> >>         Checksum : b8c58093 - correct
> >>           Events : 623119
> >>
> >>           Layout : left-symmetric
> >>       Chunk Size : 4096K
> >>
> >>     Device Role : Active device 8
> >>     Array State : AAAAAAAAA ('A' == active, '.' == missing)
> >>
> >>
> >> /dev/sdh2:
> >>            Magic : a92b4efc
> >>          Version : 1.2
> >>      Feature Map : 0x0
> >>       Array UUID : 25be3ab5:ef5f1166:d64b0e0e:4df143ed
> >>             Name : server:0  (local to host server)
> >>    Creation Time : Mon Jul 25 23:40:50 2011
> >>       Raid Level : -unknown-
> >>     Raid Devices : 0
> >>
> >>   Avail Dev Size : 3881859248 (1851.01 GiB 1987.51 GB)
> >>      Data Offset : 2048 sectors
> >>     Super Offset : 8 sectors
> >>            State : active
> >>      Device UUID : 44008309:1dfb1408:cabfbd0a:64de3739
> >>
> >>      Update Time : Thu Jun  7 12:27:52 2012
> >>         Checksum : 27f93899 - correct
> >>           Events : 2
> >>
> >>     Device Role : spare
> >>     Array State :  ('A' == active, '.' == missing)
> >>
> >> ---------------------------------------------------------------------------------------------------------------
> >>
> >> /dev/sdi2:
> >>            Magic : a92b4efc
> >>          Version : 1.2
> >>      Feature Map : 0x0
> >>       Array UUID : 25be3ab5:ef5f1166:d64b0e0e:4df143ed
> >>             Name : server:0  (local to host server)
> >>    Creation Time : Mon Jul 25 23:40:50 2011
> >>       Raid Level : -unknown-
> >>     Raid Devices : 0
> >>
> >>   Avail Dev Size : 3881859248 (1851.01 GiB 1987.51 GB)
> >>      Data Offset : 2048 sectors
> >>     Super Offset : 8 sectors
> >>            State : active
> >>      Device UUID : 135f196d:184f11a1:09207617:4022e1a5
> >>
> >>      Update Time : Thu Jun  7 12:27:52 2012
> >>         Checksum : 9ded8f86 - correct
> >>           Events : 2
> >>
> >>
> >>     Device Role : spare
> >>     Array State :  ('A' == active, '.' == missing)
> >>
> >> /dev/sde2:
> >>            Magic : a92b4efc
> >>          Version : 1.2
> >>      Feature Map : 0x0
> >>       Array UUID : 25be3ab5:ef5f1166:d64b0e0e:4df143ed
> >>             Name : server:0  (local to host server)
> >>    Creation Time : Mon Jul 25 23:40:50 2011
> >>       Raid Level : -unknown-
> >>     Raid Devices : 0
> >>
> >>   Avail Dev Size : 3881859248 (1851.01 GiB 1987.51 GB)
> >>      Data Offset : 2048 sectors
> >>     Super Offset : 8 sectors
> >>            State : active
> >>      Device UUID : 3517bcc4:2acb381f:f5006058:5bd5c831
> >>
> >>      Update Time : Thu Jun  7 12:27:52 2012
> >>         Checksum : 408957c0 - correct
> >>           Events : 2
> >>
> >>
> >>     Device Role : spare
> >>     Array State :  ('A' == active, '.' == missing)
> >>
> >> /dev/sdd2:
> >>            Magic : a92b4efc
> >>          Version : 1.2
> >>      Feature Map : 0x0
> >>       Array UUID : 25be3ab5:ef5f1166:d64b0e0e:4df143ed
> >>             Name : server:0  (local to host server)
> >>    Creation Time : Mon Jul 25 23:40:50 2011
> >>       Raid Level : -unknown-
> >>     Raid Devices : 0
> >>
> >>   Avail Dev Size : 3881859248 (1851.01 GiB 1987.51 GB)
> >>      Data Offset : 2048 sectors
> >>     Super Offset : 8 sectors
> >>            State : active
> >>      Device UUID : 9e8b2d2c:844a009a:fd6914a2:390f10ac
> >>
> >>      Update Time : Thu Jun  7 12:27:52 2012
> >>         Checksum : e6bdee68 - correct
> >>           Events : 2
> >>
> >>
> >>     Device Role : spare
> >>     Array State :  ('A' == active, '.' == missing)
> >>
> >> /dev/sdf2:
> >>            Magic : a92b4efc
> >>          Version : 1.2
> >>      Feature Map : 0x0
> >>       Array UUID : 25be3ab5:ef5f1166:d64b0e0e:4df143ed
> >>             Name : server:0  (local to host server)
> >>    Creation Time : Mon Jul 25 23:40:50 2011
> >>       Raid Level : -unknown-
> >>     Raid Devices : 0
> >>
> >>   Avail Dev Size : 3881859248 (1851.01 GiB 1987.51 GB)
> >>      Data Offset : 2048 sectors
> >>     Super Offset : 8 sectors
> >>            State : active
> >>      Device UUID : 87ad38ac:4ccbd831:ee5502cd:28dafaad
> >>
> >>      Update Time : Thu Jun  7 12:27:52 2012
> >>         Checksum : 2b7a47f6 - correct
> >>           Events : 2
> >>
> >>
> >>     Device Role : spare
> >>     Array State :  ('A' == active, '.' == missing)
> >>
> >> /dev/sdg2:
> >>            Magic : a92b4efc
> >>          Version : 1.2
> >>      Feature Map : 0x0
> >>       Array UUID : 25be3ab5:ef5f1166:d64b0e0e:4df143ed
> >>             Name : server:0  (local to host server)
> >>    Creation Time : Mon Jul 25 23:40:50 2011
> >>       Raid Level : -unknown-
> >>     Raid Devices : 0
> >>
> >>   Avail Dev Size : 3881859248 (1851.01 GiB 1987.51 GB)
> >>      Data Offset : 2048 sectors
> >>     Super Offset : 8 sectors
> >>            State : active
> >>      Device UUID : eef2f06f:28f881a5:da857a00:fb90e250
> >>
> >>      Update Time : Thu Jun  7 12:27:52 2012
> >>         Checksum : 393ba0f8 - correct
> >>           Events : 2
> >>
> >>
> >>     Device Role : spare
> >>     Array State :  ('A' == active, '.' == missing)
> >>
> >> /dev/sdc1:
> >>            Magic : a92b4efc
> >>          Version : 1.2
> >>      Feature Map : 0x0
> >>       Array UUID : 25be3ab5:ef5f1166:d64b0e0e:4df143ed
> >>             Name : server:0  (local to host server)
> >>    Creation Time : Mon Jul 25 23:40:50 2011
> >>       Raid Level : -unknown-
> >>     Raid Devices : 0
> >>
> >>   Avail Dev Size : 3985162143 (1900.27 GiB 2040.40 GB)
> >>    Used Dev Size : 3881859248 (1851.01 GiB 1987.51 GB)
> >>      Data Offset : 2048 sectors
> >>     Super Offset : 8 sectors
> >>            State : active
> >>      Device UUID : 4cf86fb0:6f334e2c:19e89c99:0532f557
> >>
> >>      Update Time : Thu Jun  7 12:27:52 2012
> >>         Checksum : a6e42bdc - correct
> >>           Events : 2
> >>
> >>
> >>     Device Role : spare
> >>     Array State :  ('A' == active, '.' == missing)
> >>
> >> /dev/sdb2:
> >>            Magic : a92b4efc
> >>          Version : 1.2
> >>      Feature Map : 0x0
> >>       Array UUID : 25be3ab5:ef5f1166:d64b0e0e:4df143ed
> >>             Name : server:0  (local to host server)
> >>    Creation Time : Mon Jul 25 23:40:50 2011
> >>       Raid Level : -unknown-
> >>     Raid Devices : 0
> >>
> >>   Avail Dev Size : 3881859248 (1851.01 GiB 1987.51 GB)
> >>      Data Offset : 2048 sectors
> >>     Super Offset : 8 sectors
> >>            State : active
> >>      Device UUID : 4852882a:b8a3989f:aad747c5:25f20d47
> >>
> >>      Update Time : Thu Jun  7 12:27:52 2012
> >>         Checksum : a8e25edd - correct
> >>           Events : 2
> >>
> >>
> >>     Device Role : spare
> >>     Array State :  ('A' == active, '.' == missing)
