Re: Interrupted reshape -- mangled backup ?

On Wed, 17 Oct 2012 23:34:26 +0200 Haakon Alstadheim
<hakon.alstadheim@xxxxxxxxx> wrote:

> I have a Raid5 array with 4 devices that I wanted to see if I could get
> better performance out of, so I tried changing the chunk size from 64K
> to something bigger (famous last words). I got into some other
> trouble and thought I needed a reboot. On reboot I several times managed
> to mount and specify the device with my backup file during initramfs,
> but the reshape stopped every time once the system was fully initialized.

So worst-case you can do that again, but insert a "sleep 365d" immediately
after the "mdadm --assemble" is run, so the system never completely
initialises.  Then just wait for the reshape to finish.
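
For example, the local-top script you quote below could become something
like this (untested, keeping your device names and backup file):
-------
/sbin/mdadm --assemble -f --backup-file=/dev/bak/md1-backup /dev/md1 \
    --run --auto=yes /dev/sdh /dev/sde /dev/sdc /dev/sdd
# keep the initramfs environment alive so the reshape can finish
sleep 365d
-------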

When mdadm assembles an array that needs to keep growing, it will fork a
background process to continue monitoring the reshape.  Presumably
that background process is getting killed.  I don't know why.
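
You could check whether that process is still around once the system is
up, with something like:
-------
ps ax | grep mdadm
cat /proc/mdstat
-------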

> 
> This is under Debian squeeze with a 3.2.0-0.bpo.3-686-pae kernel from
> backports. I installed mdadm from backports to get the latest version of
> that as well, and tried rebooting with --freeze-reshape. I suspect that I
> mixed up my initrd.img files and started without --freeze-reshape the
> first time after installing the new mdadm. Now mdadm says it cannot
> find a backup in my backup file. Opening up the backup in emacs, it
> seems to contain only NULs. Can't be right, can it? I have been mounting
> the backup under a directory under /dev/, on the assumption that the
> mount would survive past the initramfs stage.

The backup file could certainly contain lots of NULs, but it shouldn't be
*all* NULs.  At least there should be a header at the start which describes
which area of the device is contained in the backup.

You can continue without a backup.  You still need to specify a backup file,
but if you add "--invalid-backup", it will continue even if the backup file
doesn't contain anything useful.
If the machine was shut down by a crash during the reshape you might suffer
corruption.  If it was a clean shutdown you won't.
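
So something along the lines of (based on the assemble command you quote
below, untested):
-------
/sbin/mdadm --assemble -f --invalid-backup --backup-file=/dev/bak/md1-backup \
    /dev/md1 --run --auto=yes /dev/sdh /dev/sde /dev/sdc /dev/sdd
-------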

--freeze-reshape is intended to be the way to handle this, with 
   --grow --continue
once you are fully up and running, but I don't think that works correctly for
'native' metadata yet - it was implemented with IMSM metadata in mind.
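
When it does work, the intended sequence would look roughly like this
(just a sketch, reusing your device names):
-------
# in the initramfs:
mdadm --assemble --freeze-reshape /dev/md1 /dev/sdh /dev/sde /dev/sdc /dev/sdd
# once fully up and running:
mdadm --grow --continue /dev/md1 --backup-file=/dev/bak/md1-backup
-------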

NeilBrown


> 
> My bumbling has been happening with a current, correct
> /etc/mdadm/mdadm.conf containing:
> --------
> DEVICE /dev/sdh /dev/sde /dev/sdc /dev/sdd
> CREATE owner=root group=disk mode=0660 auto=yes
> HOMEHOST <system>
> ARRAY /dev/md1 level=raid5 num-devices=4 
> UUID=583001c4:650dcf0c:404aaa6f:7fc38959 spare-group=main
> -------
> The show-stopper happened with an initramfs and a script in 
> /scripts/local-top/mdadm along the lines of:
> -------
> /sbin/mdadm --assemble -f --backup-file=/dev/bak/md1-backup /dev/md1 
> --run --auto=yes /dev/sdh /dev/sde /dev/sdc /dev/sdd
> -------
> 
> At times I have also had to use the env-variable MDADM_GROW_ALLOW_OLD=1
> 
> Below is the output of mdadm -Evvvvs:
> --------
> 
> 
> /dev/sdh:
>            Magic : a92b4efc
>          Version : 0.91.00
>             UUID : 583001c4:650dcf0c:404aaa6f:7fc38959
>    Creation Time : Wed Dec  3 19:45:33 2008
>       Raid Level : raid5
>    Used Dev Size : 976762496 (931.51 GiB 1000.20 GB)
>       Array Size : 2930287488 (2794.54 GiB 3000.61 GB)
>     Raid Devices : 4
>    Total Devices : 4
> Preferred Minor : 1
> 
>    Reshape pos'n : 2368561152 (2258.84 GiB 2425.41 GB)
>    New Chunksize : 131072
> 
>      Update Time : Wed Oct 17 02:15:53 2012
>            State : active
>   Active Devices : 4
> Working Devices : 4
>   Failed Devices : 0
>    Spare Devices : 0
>         Checksum : 14da0760 - correct
>           Events : 778795
> 
>           Layout : left-symmetric
>       Chunk Size : 64K
> 
>        Number   Major   Minor   RaidDevice State
> this     0       8      112        0      active sync   /dev/sdh
> 
>     0     0       8      112        0      active sync   /dev/sdh
>     1     1       8       48        1      active sync   /dev/sdd
>     2     2       8       32        2      active sync   /dev/sdc
>     3     3       8       64        3      active sync   /dev/sde
> /dev/sde:
>            Magic : a92b4efc
>          Version : 0.91.00
>             UUID : 583001c4:650dcf0c:404aaa6f:7fc38959
>    Creation Time : Wed Dec  3 19:45:33 2008
>       Raid Level : raid5
>    Used Dev Size : 976762496 (931.51 GiB 1000.20 GB)
>       Array Size : 2930287488 (2794.54 GiB 3000.61 GB)
>     Raid Devices : 4
>    Total Devices : 4
> Preferred Minor : 1
> 
>    Reshape pos'n : 2368561152 (2258.84 GiB 2425.41 GB)
>    New Chunksize : 131072
> 
>      Update Time : Wed Oct 17 02:15:53 2012
>            State : active
>   Active Devices : 4
> Working Devices : 4
>   Failed Devices : 0
>    Spare Devices : 0
>         Checksum : 14da0736 - correct
>           Events : 778795
> 
>           Layout : left-symmetric
>       Chunk Size : 64K
> 
>        Number   Major   Minor   RaidDevice State
> this     3       8       64        3      active sync   /dev/sde
> 
>     0     0       8      112        0      active sync   /dev/sdh
>     1     1       8       48        1      active sync   /dev/sdd
>     2     2       8       32        2      active sync   /dev/sdc
>     3     3       8       64        3      active sync   /dev/sde
> /dev/sdc:
>            Magic : a92b4efc
>          Version : 0.91.00
>             UUID : 583001c4:650dcf0c:404aaa6f:7fc38959
>    Creation Time : Wed Dec  3 19:45:33 2008
>       Raid Level : raid5
>    Used Dev Size : 976762496 (931.51 GiB 1000.20 GB)
>       Array Size : 2930287488 (2794.54 GiB 3000.61 GB)
>     Raid Devices : 4
>    Total Devices : 4
> Preferred Minor : 1
> 
>    Reshape pos'n : 2368561152 (2258.84 GiB 2425.41 GB)
>    New Chunksize : 131072
> 
>      Update Time : Wed Oct 17 02:15:53 2012
>            State : active
>   Active Devices : 4
> Working Devices : 4
>   Failed Devices : 0
>    Spare Devices : 0
>         Checksum : 14da0714 - correct
>           Events : 778795
> 
>           Layout : left-symmetric
>       Chunk Size : 64K
> 
>        Number   Major   Minor   RaidDevice State
> this     2       8       32        2      active sync   /dev/sdc
> 
>     0     0       8      112        0      active sync   /dev/sdh
>     1     1       8       48        1      active sync   /dev/sdd
>     2     2       8       32        2      active sync   /dev/sdc
>     3     3       8       64        3      active sync   /dev/sde
> /dev/sdd:
>            Magic : a92b4efc
>          Version : 0.91.00
>             UUID : 583001c4:650dcf0c:404aaa6f:7fc38959
>    Creation Time : Wed Dec  3 19:45:33 2008
>       Raid Level : raid5
>    Used Dev Size : 976762496 (931.51 GiB 1000.20 GB)
>       Array Size : 2930287488 (2794.54 GiB 3000.61 GB)
>     Raid Devices : 4
>    Total Devices : 4
> Preferred Minor : 1
> 
>    Reshape pos'n : 2368561152 (2258.84 GiB 2425.41 GB)
>    New Chunksize : 131072
> 
>      Update Time : Wed Oct 17 02:15:53 2012
>            State : active
>   Active Devices : 4
> Working Devices : 4
>   Failed Devices : 0
>    Spare Devices : 0
>         Checksum : 14da0722 - correct
>           Events : 778795
> 
>           Layout : left-symmetric
>       Chunk Size : 64K
> 
>        Number   Major   Minor   RaidDevice State
> this     1       8       48        1      active sync   /dev/sdd
> 
>     0     0       8      112        0      active sync   /dev/sdh
>     1     1       8       48        1      active sync   /dev/sdd
>     2     2       8       32        2      active sync   /dev/sdc
>     3     3       8       64        3      active sync   /dev/sde
> ---------------------------
> 
> I guess the moral of all this is that if you want to use mdadm you
> should pay attention and not be in too much of a hurry :-/
> I'm just hoping that I can get my system back. This raid contains my
> entire system, and will take a LOT of work to recreate. Mail, calendars
> ... Backups are a couple of weeks old ...


