Re: SRaid with 13 Disks crashed

On 06/10/2011 09:06 AM, Dragon wrote:
> You are right, the array starts at pos 0, so pos 1 and 7 are the right positions. The second try was perfect. fsck shows this:

Yay!

> fsck -n /dev/md0
> fsck from util-linux-ng 2.17.2
> e2fsck 1.41.12 (17-May-2010)
> /dev/md0 was not cleanly unmounted, check forced.
> Pass 1: Checking inodes, blocks, and sizes
> Pass 2: Checking directory structure
> Pass 3: Checking directory connectivity
> Pass 4: Checking reference counts
> Pass 5: Checking group summary information
> /dev/md0: 266872/1007288320 files (15.4% non-contiguous), 3769576927/4029130864 blocks
> 
> and:
> mdadm --detail /dev/md0
> /dev/md0:
>         Version : 0.90
>   Creation Time : Fri Jun 10 14:19:24 2011
>      Raid Level : raid5
>      Array Size : 17581661952 (16767.18 GiB 18003.62 GB)
>   Used Dev Size : 1465138496 (1397.26 GiB 1500.30 GB)
>    Raid Devices : 13
>   Total Devices : 13
> Preferred Minor : 0
>     Persistence : Superblock is persistent
> 
>     Update Time : Fri Jun 10 14:19:24 2011
>           State : clean
>  Active Devices : 13
> Working Devices : 13
>  Failed Devices : 0
>   Spare Devices : 0
> 
>          Layout : left-symmetric
>      Chunk Size : 64K
> 
>            UUID : 8c4d8438:42aa49f9:a6d866f6:b6ea6b93 (local to host nassrv01)
>          Events : 0.1
> 
>     Number   Major   Minor   RaidDevice State
>        0       8      160        0      active sync   /dev/sdk
>        1       8      208        1      active sync   /dev/sdn
>        2       8      176        2      active sync   /dev/sdl
>        3       8      192        3      active sync   /dev/sdm
>        4       8        0        4      active sync   /dev/sda
>        5       8       16        5      active sync   /dev/sdb
>        6       8       64        6      active sync   /dev/sde
>        7       8       48        7      active sync   /dev/sdd
>        8       8       80        8      active sync   /dev/sdf
>        9       8       96        9      active sync   /dev/sdg
>       10       8      112       10      active sync   /dev/sdh
>       11       8      128       11      active sync   /dev/sdi
>       12       8      144       12      active sync   /dev/sdj
> 
> Normally I use fsck.ext4, i.e. fsck.ext4dev. Is that a problem? What does 15.4% non-contiguous mean? The share of lost data? After that, do I shrink like this:?

fsck automatically calls fsck.ext4 when it sees an ext4 filesystem.  "15.4% non-contiguous" means 15.4% of the files are fragmented.  No lost data.

Now that you have a good filesystem, mounting it and taking a backup would be a good idea.  Or at least retrieve any files that are very important to you.
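A minimal sketch of that backup step (the mount point /mnt/md0 and the destination /backup are assumptions; substitute your own paths):

```shell
# Mount read-only first, so nothing can modify the freshly-checked filesystem.
mount -o ro /dev/md0 /mnt/md0

# Pull the most important data off; rsync shows progress and can be restarted.
rsync -a --progress /mnt/md0/ /backup/

umount /mnt/md0
```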

> mdadm  /dev/md0 --fail /dev/sdj
> mdadm /dev/md0 --remove /dev/sdj

NO! You must use "mdadm --grow".  Yes, "--grow" also does "shrink".  Your fsck shows that the ext4 filesystem is still sized for the original 12-disk setup, so you don't have to shrink the filesystem.  You do have to shrink the raid:

Step 1a: Tell mdadm the final size you are aiming for.  MD will emulate this while you test that the new size works:
mdadm /dev/md0 --grow --array-size=16116523456k

(Please show "mdadm -D /dev/md0" at this point.)
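That --array-size value is just "Used Dev Size" times the number of data members in the target layout; a 12-disk raid5 has 11 data members. A quick sanity check with shell arithmetic, using the numbers from the mdadm -D output above:

```shell
# Used Dev Size is 1465138496 KiB per member (from mdadm -D above).
# Target 12-disk raid5 has 11 data members:
echo $(( 1465138496 * 11 ))   # 16116523456 KiB -- matches --array-size above
# Current 13-disk raid5 has 12 data members:
echo $(( 1465138496 * 12 ))   # 17581661952 KiB -- matches "Array Size" above
```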

Step 1b: Verify data integrity with another fsck -n

Step 2:  Tell mdadm to really reshape to the 12-disk raid5
mdadm /dev/md0 --grow -n 12 --backup-file=/reshape.bak

When the reshape/shrink is done, "mdadm -D /dev/md0" will report "Raid Devices : 12" and "Spare Devices : 1", and one of them, almost certainly /dev/sdj, will be marked "spare".
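A reshape on an array this size can take many hours. Progress can be watched from another terminal (a generic sketch, nothing here is specific to this array):

```shell
# Live view of the reshape, with percentage and ETA:
watch -n 60 cat /proc/mdstat

# Or poll the md sysfs files directly:
cat /sys/block/md0/md/sync_action      # "reshape" while it runs, "idle" when done
cat /sys/block/md0/md/sync_completed   # sectors done / sectors total
```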

At this point, I recommend converting to raid6, consuming the spare.

mdadm /dev/md0 --grow -n 13 -l 6 --backup-file=/reshape.bak

It might be possible to go directly to this layout (in place of step 2 above).  It would save a lot of time.  Maybe someone else on the list can answer that.  Or you can just try it.  I'm sure mdadm will complain if it's not possible ;).

> mdadm --detail --scan >> /etc/mdadm/mdadm.conf

Yes.  Make sure you edit the file afterwards to remove the old array's information.
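One way to avoid a stale entry (a sketch, assuming a Debian-style /etc/mdadm/mdadm.conf where array definitions are the lines starting with "ARRAY"): strip the old ARRAY lines before appending the fresh scan output.

```shell
# Drop old ARRAY lines, keep everything else (DEVICE, MAILADDR, ...):
grep -v '^ARRAY' /etc/mdadm/mdadm.conf > /etc/mdadm/mdadm.conf.new
# Append the current array definition:
mdadm --detail --scan >> /etc/mdadm/mdadm.conf.new
mv /etc/mdadm/mdadm.conf.new /etc/mdadm/mdadm.conf
```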

> Is that the right way? I assume the disk I take out of the raid is not the same one I added last? So I have to read out its serial number to find it among the hard drives?

Yes, use lsdrv or "ls -l /dev/disk/by-id/" to make sure you remove the spare.  Of course, if you convert to raid6, there won't be a spare :).
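A sketch of mapping a device name to the physical drive (smartctl is from the smartmontools package; the serial it reports matches the label printed on the disk):

```shell
# The by-id symlinks encode vendor, model, and serial number:
ls -l /dev/disk/by-id/ | grep sdj

# Or query the drive directly:
smartctl -i /dev/sdj | grep -i serial
```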

> many thx so far

You are welcome.

Phil
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

