Accessing dm-crypt volume after failed resize with mdadm/RAID1, dm-crypt/LUKS, LVM

Dear dm-crypt folks,


as you might guess, I am another lost soul turning to you as a last
resort to rescue his data.

I am sorry for the long text, but I am trying to be as detailed as
possible so that my actions are reproducible.

I have a RAID1 (mirroring) setup in which only one drive is currently
present. It is set up as `/dev/md0` ← `/dev/sda1` and `/dev/md1` ←
`/dev/sda2`. `/dev/md1` is encrypted with LUKS and contains an LVM
setup with the logical volumes used by `/home/` and `/root/`.
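
To summarize the stack (the volume group is called `speicher`; the
logical volume names show up in the commands further down):

       /dev/sda1 -> /dev/md0 (RAID1, only one member present)
       /dev/sda2 -> /dev/md1 (RAID1, only one member present)
                    -> LUKS -> LVM VG "speicher" -> LVs "home", "other", ...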

A month ago the 500 GB drive was replaced by a 2 TB drive, and I
copied all the data from the old to the new drive with `dd_rescue`
without any errors. As a consequence, only the old 500 GB are usable
for now, and the partitions have to be resized/grown to use the whole
2 TB. To emphasize it again: I still have the old drive available.

I wanted to resize the partitions today. For that I followed the
guide by Uwe Hermann [1], which I had already used some years ago,
where it worked without any problem.

So I booted from a USB medium with Grml 5.2011 [2] (cryptsetup 1.3.0)
and followed the steps from the guide. Please note that Grml does not
assemble any RAID arrays by default, meaning `mdadm` was not run. (And
having only one drive, it did not occur to me that the RAID1 might
have needed to be taken care of too.)

1. `fdisk /dev/sda`
2. Remove the second partition.
3. Create a new partition with the same starting sector (63 was
chosen automatically, since there are only two partitions and it was
the second one).
4. Choose the proposed end sector, which was the maximum.
5. Choose the type `Linux raid autodetect` (`fd`).
6. Save everything with `w`.
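
From memory, the fdisk session looked roughly like this (I did not
save the exact transcript, so take it as a sketch):

       # fdisk /dev/sda
       Command (m for help): d               # delete the second partition
       Partition number (1-4): 2
       Command (m for help): n               # recreate it, same start sector (63),
                                             # proposed (maximum) end sector
       Command (m for help): t               # change the partition type
       Partition number (1-4): 2
       Hex code (type L to list codes): fd   # Linux raid autodetect
       Command (m for help): w               # write the table and exit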

Afterwards I did `cryptsetup luksOpen /dev/sda2 foo`, `pvresize
/dev/mapper/foo`, `service lvm2 start`, `lvresize -L +300GB
/dev/speicher/home` and `lvresize -L +20GB /dev/speicher/other`. Then
I ran `fsck -f /dev/speicher/home` and `xfs_check /mnt/other_mounted`
and there were no errors at all. After running `resize2fs
/dev/speicher/home` and `xfs_resize /mnt/other_mounted` (?) I
rebooted, only to be surprised that I was not asked for the LUKS
passphrase when booting into Debian. I only saw `evms_activate is not
available`.
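
For clarity, the commands I ran under Grml were roughly, in this
order:

       # cryptsetup luksOpen /dev/sda2 foo
       # pvresize /dev/mapper/foo
       # service lvm2 start
       # lvresize -L +300GB /dev/speicher/home
       # lvresize -L +20GB /dev/speicher/other
       # fsck -f /dev/speicher/home       # no errors
       # xfs_check /mnt/other_mounted     # no errors
       # resize2fs /dev/speicher/home
       # xfs_resize /mnt/other_mounted    # not sure about the exact XFS command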

Then I booted with Grml again to recreate the initrd image with
`update-initramfs -u`, thinking it needed to be updated too. I was
happy to see that I could still access `/dev/sda2` just fine using
`cryptsetup luksOpen /dev/sda2 sda2_crypt`, mount everything in it
(`service lvm2 start`, all volumes), `chroot` into the system [3] and
rebuild the initrd image. But updating the initrd image was to no
avail, although the `evms_activate is not available` message
disappeared.
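
The chroot procedure was roughly the usual one from [3] (sketched from
memory; `<root-lv>` stands for whichever logical volume holds the root
filesystem):

       # cryptsetup luksOpen /dev/sda2 sda2_crypt
       # service lvm2 start                  # activate the volume group
       # mount /dev/speicher/<root-lv> /mnt
       # mount --bind /dev  /mnt/dev
       # mount --bind /proc /mnt/proc
       # mount --bind /sys  /mnt/sys
       # chroot /mnt
       # update-initramfs -u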

Here I probably also have to mention that I have had `mdadm` on hold
on the Debian system for quite some time because of some problems, and
I did not dare to touch it.

Anyway, I found out that the system was not able to assemble
`/dev/md1` from `/dev/sda2`. This did not work under Grml either, and
`mdadm` could not find the md superblock on `/dev/sda2`.

       # mdadm --examine /dev/sda2
       mdadm: No md superblock detected on /dev/sda2.
       # blkid
       /dev/sda1: UUID="fb7f3dc5-d183-cab6-1212-31201a2207b9" TYPE="linux_raid_member"
       /dev/sda2: UUID="cb6681b4-4dda-4548-8c59-3d9838de9c22" TYPE="crypto_LUKS"   # different UUID than before and “wrong” type
       # cryptsetup luksOpen /dev/sda2 sda2_crypt   # still worked

On `#debian` somebody told me that the md superblock is stored at the
end of the partition, that it was probably overwritten when enlarging
the partition, and that I should have grown the RAID too.
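
If I understand this correctly, a 0.90 superblock lives in the last
64-128 KiB of the component device, so the old superblock should still
sit near where the old, smaller partition ended. Something like the
following could check that (`OLD_END_SECTOR` is a placeholder for the
old partition's last sector, which I can read from the old drive's
partition table; I have not tried this yet):

       # dd if=/dev/sda of=/tmp/end-of-old-sda2 bs=512 skip=$((OLD_END_SECTOR - 255)) count=256
       # hexdump -C /tmp/end-of-old-sda2 | grep -i 'fc 4e 2b a9'   # md magic a92b4efc, little-endian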

Searching the Internet for help, I found several suggestions and tried
to recreate the RAID with the following command.

       # mdadm --create /dev/md1 --assume-clean \
           --uuid=52ff2cf2-4098-1859-e58d-8dd65faec42c /dev/sda2 missing

I got a warning that the array would have its metadata at the
beginning, that I should not continue if the device is used for
`/boot`, and that `--metadata=0.90` should be used in that case. Since
it is not used for `/boot`, I chose to go on. The RAID was then
created, but `cryptsetup luksOpen /dev/md1 md1_crypt` said that it was
not a LUKS device. Therefore I stopped the RAID, and `cryptsetup
luksOpen /dev/sda2 sda2_crypt` still worked.
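
As far as I understand it now, the different metadata versions put the
superblock in different places, which would explain both the warning
and why the LUKS header was not where `cryptsetup` expected it on
`/dev/md1`:

       metadata 0.90: superblock in the last 64-128 KiB of the device, data starts at offset 0
       metadata 1.0 : superblock near the end of the device, data starts at offset 0
       metadata 1.1 : superblock at offset 0, data starts after a data offset
       metadata 1.2 : superblock 4 KiB from the start, data starts after a data offset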

Then I was told on IRC that, when there is only one drive in a RAID1,
it does not matter whether you alter `/dev/sda2` or `/dev/md1`, and
that I should try to create the RAID again. Remembering that before
the resizing the RAID metadata (also on `/dev/sda1`) was version
`0.90`, I passed `--metadata=0.90` to the `mdadm --create` command.

       # mdadm --create /dev/md1 --assume-clean --metadata=0.90 \
           --uuid=52ff2cf2-4098-1859-e58d-8dd65faec42c /dev/sda2 missing

I got an error message that the device is already part of a RAID
array; I ignored it and went on. At first I was happy, because

       # cryptsetup luksOpen /dev/md1 md1_crypt

worked and asked me for the passphrase. But I typed the correct
passphrase several times and it was rejected. Then I probably forgot
to stop the RAID and

       # cryptsetup luksOpen /dev/sda2 sda2_crypt

showed the same behavior, but that was probably due to typos, and it
seemed to work once. Then I got the following error messages, which
made me realize that the RAID was probably still running, and I
stopped it right away.

       Aug  4 00:16:01 grml kernel: [ 7964.786362] device-mapper: table: 253:0: crypt: Device lookup failed
       Aug  4 00:16:01 grml kernel: [ 7964.786367] device-mapper: ioctl: error adding target to table
       Aug  4 00:16:01 grml udevd[2409]: inotify_add_watch(6, /dev/dm-0, 10) failed: No such file or directory
       Aug  4 00:16:01 grml udevd[2409]: inotify_add_watch(6, /dev/dm-0, 10) failed: No such file or directory

       Aug  4 00:17:14 grml kernel: [ 8038.196371] md1: detected capacity change from 1999886286848 to 0
       Aug  4 00:17:14 grml kernel: [ 8038.196395] md: md1 stopped.
       Aug  4 00:17:14 grml kernel: [ 8038.196407] md: unbind<sda2>
       Aug  4 00:17:14 grml kernel: [ 8038.212653] md: export_rdev(sda2)

After that `cryptsetup luksOpen /dev/sda2 sda2_crypt` always failed.

Now, wanting to be smart, I saved the LUKS header of the new drive,

       # cryptsetup luksHeaderBackup /dev/sda2 \
           --header-backup-file /home/grml/20110804--031--luksHeaderBackup

shut the system down, connected the old drive, booted Grml, saved the
LUKS header of `/dev/sda2` of the 500 GB drive, switched the drives
again and restored the old header (from before the resizing) to the
new drive.

       # cryptsetup luksHeaderRestore /dev/sda2 \
           --header-backup-file /home/grml/20110804--031--luksHeaderBackup

Only to find out that this did not help either. I still have some
system information from the late recovery attempts and, of course, the
old 500 GB drive. Is there any way to recover the data?

The current situation is that `luksOpen` does not succeed on
`/dev/md1` or `/dev/sda2`; that is, the device is detected as a LUKS
device, but the passphrase is not accepted (I even typed it in clear
text on the console and copied it into the prompt).
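
I can also post `luksDump` output from the device and from the header
backups, if that helps, e.g.:

       # cryptsetup luksDump /dev/sda2
       # cryptsetup luksDump /home/grml/20110804--031--luksHeaderBackup   # or via a loop device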

### New drive ###

       # mdadm --examine /dev/sda2
       /dev/sda2:
                 Magic : a92b4efc
               Version : 0.90.00
                   UUID : 52ff2cf2:40981859:d8b78f65:99226e41 (local to host grml)
         Creation Time : Thu Aug  4 00:05:57 2011
            Raid Level : raid1
         Used Dev Size : 1953013952 (1862.54 GiB 1999.89 GB)
            Array Size : 1953013952 (1862.54 GiB 1999.89 GB)
          Raid Devices : 2
         Total Devices : 2
       Preferred Minor : 1

           Update Time : Thu Aug  4 00:05:57 2011
                 State : clean
        Active Devices : 1
       Working Devices : 1
        Failed Devices : 1
         Spare Devices : 0
              Checksum : bf78bfbf - correct
                Events : 1


             Number   Major   Minor   RaidDevice State
       this     0       8        2        0      active sync   /dev/sda2

          0     0       8        2        0      active sync   /dev/sda2
          1     0       0        0        0      spare
       # blkid
       /dev/sda1: UUID="fb7f3dc5-d183-cab6-1212-31201a2207b9" TYPE="linux_raid_member"
       /dev/sda2: UUID="52ff2cf2-4098-1859-d8b7-8f6599226e41" TYPE="linux_raid_member"

### Old drive ###

       /dev/sda2:
                 Magic : a92b4efc
               Version : 0.90.00
                  UUID : 52ff2cf2:40981859:e58d8dd6:5faec42c
         Creation Time : Wed Mar 26 11:50:04 2008
            Raid Level : raid1
         Used Dev Size : 487885952 (465.28 GiB 499.60 GB)
            Array Size : 487885952 (465.28 GiB 499.60 GB)
          Raid Devices : 2
         Total Devices : 1
       Preferred Minor : 1

           Update Time : Sat Jun 18 14:25:10 2011
                 State : clean
        Active Devices : 1
       Working Devices : 1
        Failed Devices : 1
         Spare Devices : 0
              Checksum : 380692fc - correct
                Events : 25570832



             Number   Major   Minor   RaidDevice State
       this     0       8        2        0      active sync   /dev/sda2

          0     0       8        2        0      active sync   /dev/sda2
          1     1       0        0        1      faulty removed

Please tell me what other information you need.


Thanks in advance,

Paul


PS: Please excuse this long message and the mistakes it probably
contains. It is almost four in the morning, and after 10 hours of
debugging I am quite lost.
PPS: Not having access to the hard drive, I have to use the web
interface to compose this message. I am sorry for any formatting
issues.


[1] http://www.hermann-uwe.de/blog/resizing-a-dm-crypt-lvm-ext3-partition
[2] http://grml.org/
[3] http://wiki.debian.org/DebianInstaller/Rescue/Crypto


