LUKS superblock damaged by `mdadm --create` or user error?

Dear Linux RAID folks,


I hope I did not annoy you too much on #linux-raid; I am contacting
this list to reach a broader audience and for archival purposes. My
message to the dm-crypt list [1] was a little long and so is this one.
I am sorry.

After growing `/dev/sda2` using `fdisk /dev/sda` (with no md array
running) I forgot to grow the RAID1, and probably overwrote the md
metadata (0.90) or made it unavailable, because it was no longer at
the end of the partition after I grew the physical and logical LVM
volumes and the filesystems.

    # blkid
    /dev/sda1: UUID="fb7f3dc5-d183-cab6-1212-31201a2207b9" TYPE="linux_raid_member"
    /dev/sda2: UUID="cb6681b4-4dda-4548-8c59-3d9838de9c22" TYPE="crypto_LUKS"
    # In `fdisk` I had set the type to »Linux raid autodetect« (0xfd), though.

I could not boot anymore because `/dev/md1` could not be assembled.
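
For reference, here is a minimal after-the-fact sketch (my own,
assuming `blockdev` and `xxd` are available) of why: the 0.90
superblock lives in the last 64 KiB-aligned 64 KiB block of the
device, so once the partition grew, the kernel no longer finds it
where it looks. One can probe the expected offset directly:

    # compute the expected 0.90 superblock offset for the *current*
    # partition size; 0.90 places it in the last 64 KiB-aligned
    # 64 KiB block of the device
    SIZE_KIB=$(( $(blockdev --getsize64 /dev/sda2) / 1024 ))
    SB_KIB=$(( SIZE_KIB / 64 * 64 - 64 ))
    # the magic 0xa92b4efc is stored little-endian, so an intact
    # superblock starts with the bytes "fc4e 2ba9" in an xxd dump
    dd if=/dev/sda2 bs=1024 skip=$SB_KIB count=1 2>/dev/null | xxd | head -n 1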

        # mdadm --examine /dev/sda2
        mdadm: No md superblock detected on /dev/sda2.

        # mdadm --examine /dev/sda1
        /dev/sda1:
                  Magic : a92b4efc
                Version : 0.90.00
                   UUID : fb7f3dc5:d183cab6:12123120:1a2207b9
          Creation Time : Wed Mar 26 11:49:57 2008
             Raid Level : raid1
          Used Dev Size : 497856 (486.27 MiB 509.80 MB)
             Array Size : 497856 (486.27 MiB 509.80 MB)
           Raid Devices : 2
          Total Devices : 1
        Preferred Minor : 0

            Update Time : Wed Aug  3 21:11:43 2011
                  State : clean
         Active Devices : 1
        Working Devices : 1
         Failed Devices : 1
          Spare Devices : 0
               Checksum : 388e903a - correct
                 Events : 20332


              Number   Major   Minor   RaidDevice State
        this     0       8        1        0      active sync   /dev/sda1

           0     0       8        1        0      active sync   /dev/sda1
           1     1       0        0        1      faulty removed

        # mdadm --verbose --assemble /dev/md1 --uuid=52ff2cf2:40981859:e58d8dd6:5faec42c
        mdadm: looking for devices for /dev/md1
        mdadm: no recogniseable superblock on /dev/dm-8
        mdadm: /dev/dm-8 has wrong uuid.
        mdadm: no recogniseable superblock on /dev/dm-7
        mdadm: /dev/dm-7 has wrong uuid.
        mdadm: no recogniseable superblock on /dev/dm-6
        mdadm: /dev/dm-6 has wrong uuid.
        mdadm: no recogniseable superblock on /dev/dm-5
        mdadm: /dev/dm-5 has wrong uuid.
        mdadm: no recogniseable superblock on /dev/dm-4
        mdadm: /dev/dm-4 has wrong uuid.
        mdadm: no recogniseable superblock on /dev/dm-3
        mdadm: /dev/dm-3 has wrong uuid.
        mdadm: no recogniseable superblock on /dev/dm-2
        mdadm: /dev/dm-2 has wrong uuid.
        mdadm: no recogniseable superblock on /dev/dm-1
        mdadm: /dev/dm-1 has wrong uuid.
        mdadm: cannot open device /dev/dm-0: Device or resource busy
        mdadm: /dev/dm-0 has wrong uuid.
        mdadm: no recogniseable superblock on /dev/md0
        mdadm: /dev/md0 has wrong uuid.
        mdadm: cannot open device /dev/loop0: Device or resource busy
        mdadm: /dev/loop0 has wrong uuid.
        mdadm: cannot open device /dev/sdb4: Device or resource busy
        mdadm: /dev/sdb4 has wrong uuid.
        mdadm: cannot open device /dev/sdb: Device or resource busy
        mdadm: /dev/sdb has wrong uuid.
        mdadm: cannot open device /dev/sda2: Device or resource busy
        mdadm: /dev/sda2 has wrong uuid.
        mdadm: cannot open device /dev/sda1: Device or resource busy
        mdadm: /dev/sda1 has wrong uuid.
        mdadm: cannot open device /dev/sda: Device or resource busy
        mdadm: /dev/sda has wrong uuid.

As shown above, `mdadm --examine /dev/sda2` could not find any
metadata, but `/dev/sda2` could still be decrypted using `cryptsetup
luksOpen /dev/sda2 sda2_crypt`. Not knowing how 0.90 metadata is
stored, I read several Web resources, asked on IRC channels, and came
to the conclusion that I should just create a new (degraded) RAID1 and
everything would be fine, since I had only one disk.
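
(In hindsight, before any destructive attempt I should have taken a
raw image of the partition first. A minimal sketch – the target path
is hypothetical and needs about 2 TB of free space:)

    # raw backup of the partition before experimenting further
    dd if=/dev/sda2 of=/mnt/backup/sda2.img bs=1M conv=noerror,sync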

Booting from the live system Grml [3], which does *not* start `mdadm`
or `lvm` during boot, I tried to create a new RAID1 using the
following command (a).

   # command (a)
   mdadm --verbose --create /dev/md1 \
   --assume-clean \
   --level=1 \
   --raid-devices=2 \
   --uuid=52ff2cf2:40981859:e58d8dd6:5faec42c \
   /dev/sda2 missing
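
(A note added in hindsight: without `--metadata`, the mdadm shipped
with a 2011 live system most likely defaulted to 1.2 metadata, whose
superblock is written 4 KiB into the device – which, if I understand
the LUKS1 layout correctly, is right where the key-slot material
begins. A minimal check, assuming `xxd` is available:)

    # a 1.2 superblock would leave the md magic at byte offset 4096;
    # it is stored little-endian, so look for "fc4e 2ba9"
    dd if=/dev/sda2 bs=4096 skip=1 count=1 2>/dev/null | xxd | head -n 1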

I ignored the warning about overwriting metadata because it only
referred to booting. Unfortunately, `cryptsetup luksOpen /dev/md1
md1_crypt` did not find any LUKS superblock. I therefore stopped
`/dev/md1`, and `cryptsetup luksOpen /dev/sda2 sda2_crypt` still
worked. Then I remembered that the metadata version had originally
been 0.90, added `--metadata=0.90`, and executed the following
command (b).

   # command (b)
   mdadm --verbose --create /dev/md1 \
   --assume-clean \
   --level=1 \
   --raid-devices=2 \
   --uuid=52ff2cf2:40981859:e58d8dd6:5faec42c \
   --metadata=0.90 \
   /dev/sda2 missing

Lucky me, I thought: `cryptsetup luksOpen /dev/md1 md1_crypt` asked me
for the passphrase, but I entered it three times and it would not
unlock. Instead of trying again – I do not know whether it would have
worked – I tried `cryptsetup luksOpen /dev/sda2 sda2_crypt`, and it
asked me for the passphrase too. The third time I seem to have entered
it correctly, but I got an error message that the device could not be
mapped.

--- dmesg ---
Aug  4 00:16:01 grml kernel: [ 7964.786362] device-mapper: table: 253:0: crypt: Device lookup failed
Aug  4 00:16:01 grml kernel: [ 7964.786367] device-mapper: ioctl: error adding target to table
Aug  4 00:16:01 grml udevd[2409]: inotify_add_watch(6, /dev/dm-0, 10) failed: No such file or directory
Aug  4 00:16:01 grml udevd[2409]: inotify_add_watch(6, /dev/dm-0, 10) failed: No such file or directory

Aug  4 00:17:14 grml kernel: [ 8038.196371] md1: detected capacity change from 1999886286848 to 0
Aug  4 00:17:14 grml kernel: [ 8038.196395] md: md1 stopped.
Aug  4 00:17:14 grml kernel: [ 8038.196407] md: unbind<sda2>
Aug  4 00:17:14 grml kernel: [ 8038.212653] md: export_rdev(sda2)
--- dmesg ---

Then I realized that I had probably forgotten to stop `/dev/md1`.
After stopping it, `cryptsetup luksOpen /dev/sda2 sda2_crypt` did not
succeed anymore and I cannot access my data.

1. Does the `dmesg` output suggest that accessing `/dev/sda2` while it
was assembled into `/dev/md1` caused any breakage?
2. On #lvm and #linux-raid the common explanation was that command (a)
had overwritten and damaged the LUKS superblock. Is that possible? I
could not find the magic number 0xa92b4efc in the first megabyte of
`/dev/sda2` (see the sketch after this list). Did `--assume-clean`
prevent the overwrite?
3. Is command (b) to blame, or did it probably work and I simply had a
typo in the passphrase?
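
Here is the sketch mentioned in question 2 – my own reconstruction of
how I would scan the first MiB now. Note that both magics are worth
searching for, and that the md magic is stored little-endian, so a hex
dump shows it as "fc4e 2ba9", not "a92b 4efc":

    dd if=/dev/sda2 bs=1M count=1 2>/dev/null | xxd > /tmp/sda2-first-mib.hex
    grep -i '4c55 4b53 babe' /tmp/sda2-first-mib.hex  # LUKS1 magic ("LUKS\xba\xbe")
    grep -i 'fc4e 2ba9' /tmp/sda2-first-mib.hex       # md magic, little-endian
    # caveat: this simple grep only catches 2-byte-aligned hits, which
    # is fine for the fixed offsets 0 (LUKS) and 4096 (1.2 superblock)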

I am thankful for any hint that helps me get my data back.


Thanks and sorry for the long message. Any hints on how to shorten it
next time are much appreciated.

Paul


PS: A month ago I had `dd`ed the content of a 500 GB drive to this
one; that is why I wanted to resize the partitions. The old drive is
still functional, and I am attaching the output of several commands
run against the current 2 TB drive and the old drive. The `luksDump`
output is from the current drive, but with the LUKS header restored
from the 500 GB drive. I know that I am publishing the key needed to
access my drive, but if it helps to get my data back I will encrypt
from scratch afterward. I also have dumps of the first MB of the
partition (`luksHeaderBackup`) from the old and the new drive, but
attaching them would exceed the message size limit.

[1] http://www.saout.de/pipermail/dm-crypt/2011-August/001857.html
[2] http://www.hermann-uwe.de/blog/resizing-a-dm-crypt-lvm-ext3-partition
[3] http://grml.org/

Accessing dm-crypt volume after failed resizing with mdadm/RAID1, dm-crypt/LUKS, LVM


Dear dm-crypt folks,


as you might guess, I am another lost guy turning to you as a last resort to rescue his data.

I am sorry for the long text, but I am trying to be as detailed as possible.

I have a RAID1 (mirroring) setup in which, however, only one drive is assembled. It is set up as `/dev/md0` ← `/dev/sda1` and `/dev/md1` ← `/dev/sda2`. `/dev/md1` is encrypted with LUKS and contains an LVM setup with the logical volumes used by `/home/` and `/root/`.

A month ago the 500 GB drive was replaced by a 2 TB drive, and I copied all the data from the old drive to the new one with `dd_rescue`, without any errors. Since the partition table was copied verbatim, only the old size of 500 GB is usable, and the partitions have to be resized/grown to make the whole 2 TB available. To emphasize it again: I still have the old drive.

I wanted to resize the partitions today, so I followed the guide by Uwe Hermann [1], which I had also followed some years ago, when it worked without any problem.

So I booted from a USB medium with Grml 2011.05 [2] (cryptsetup 1.3.0) and followed the steps from the guide. Please note that Grml by default does not assemble any RAIDs, which means `mdadm` was not run. (And having only one drive, it did not occur to me that the RAID1 would have to be taken care of too.)

1. `fdisk /dev/sda`
2. Delete the second partition.
3. Create a new partition with the same starting sector (63 was chosen automatically, since there are only two partitions and this was the second one).
4. Accept the proposed end sector, which was the maximum.
5. Set the type to »Linux raid autodetect« (0xfd).
6. Save with `w`.
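
In hindsight, dumping the partition table before step 2 would have recorded the exact original layout. A minimal sketch (the output file name is my own):

	# save the exact partition layout before editing it with fdisk
	sfdisk -d /dev/sda > /home/grml/sda-partition-table.backup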

Afterward I ran `cryptsetup luksOpen /dev/sda2 foo`, `pvresize /dev/mapper/foo`, `service lvm2 start`, and then `lvresize -L +300GB /dev/speicher/home` and `lvresize -L +20GB /dev/speicher/other`. `fsck -f /dev/speicher/home` and `xfs_check /mnt/other_mounted` reported no errors at all. After `resize2fs /dev/speicher/home` and `xfs_resize /mnt/other_mounted` (?) I rebooted, only to be surprised that I was not asked for the LUKS passphrase when booting into Debian. I only saw `evms_activate is not available`.
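
(For the record, as I understand it now: with md in the stack, every layer has to be grown in order, starting with the array itself. A hedged outline of the sequence I should have used; note that with 0.90 metadata the superblock location depends on the device size, so take this only as the layer order, not a tested recipe:)

	# after growing the partition with fdisk, grow each layer in order
	mdadm --grow /dev/md1 --size=max        # the step I missed
	cryptsetup luksOpen /dev/md1 md1_crypt
	cryptsetup resize md1_crypt             # extend the dm-crypt mapping
	pvresize /dev/mapper/md1_crypt          # extend the LVM physical volume
	lvresize -L +300GB /dev/speicher/home   # extend the logical volume
	resize2fs /dev/speicher/home            # finally grow the filesystem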

Then I booted Grml again to recreate the initrd image with `update-initramfs -u`, thinking it needed to be updated too. I was happy to see that I could still access `/dev/sda2` just fine using `cryptsetup luksOpen /dev/sda2 sda2_crypt`, start LVM with `service lvm2 start`, mount all the volumes, and use `chroot` [3] to rebuild the initrd image. But updating the initrd image was to no avail, although the `evms_activate is not available` message disappeared.

Here I should probably also mention that I have had `mdadm` on hold on the Debian system for quite some time because of some problems, and I did not dare to touch it.

Anyway, I found out that the system was not able to assemble `/dev/md1` from `/dev/sda2`. This did not work under Grml either, and `mdadm` could not find the md superblock on `/dev/sda2`.

	# mdadm --examine /dev/sda2
	mdadm: No md superblock detected on /dev/sda2.
	# blkid
	/dev/sda1: UUID="fb7f3dc5-d183-cab6-1212-31201a2207b9" TYPE="linux_raid_member"
	/dev/sda2: UUID="cb6681b4-4dda-4548-8c59-3d9838de9c22" TYPE="crypto_LUKS" # different UUID than before and “wrong” type
	# cryptsetup luksOpen /dev/sda2 sda2_crypt # still worked

On `#debian` somebody told me that the md superblock is stored at the end of the partition, that it was probably overwritten when the partition was enlarged, and that I should have grown the RAID too.
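
If that explanation is right, the old superblock should still be sitting at the old end-of-partition offset inside the enlarged partition. With 0.90 metadata the superblock starts right after the data area, i.e. at the old “Used Dev Size” (487885952 KiB, see the old drive's output below), so one can probe for it. My own sketch, assuming `xxd` is available and my reading of the 0.90 layout is correct:

	# the md magic a92b4efc is stored little-endian ("fc4e 2ba9" in xxd)
	dd if=/dev/sda2 bs=1024 skip=487885952 count=1 2>/dev/null | xxd | head -n 1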

Searching the Internet for help, I found several suggestions and tried to recreate the RAID with the following command.

	# mdadm --create /dev/md1 --assume-clean --uuid=52ff2cf2-4098-1859-e58d-8dd65faec42c /dev/sda2 missing

I got a warning that there was metadata at the beginning of the device, and that I should not go on if it was used for `/boot`, but use `--metadata=0.90` instead. Since it was not used for `/boot`, I chose to go on. The RAID was created, but `cryptsetup luksOpen /dev/md1 md1_crypt` said it was not a LUKS device. Therefore I stopped the RAID, and `cryptsetup luksOpen /dev/sda2 sda2_crypt` still worked.

Then I was told on IRC that, with only one drive in a RAID1, it does not matter whether you alter `/dev/sda2` or `/dev/md1`, and that I should try to create the RAID again. Remembering that before the resizing the RAID metadata (also on `/dev/sda1`) was 0.90, I passed `--metadata=0.90` to the `mdadm --create` command.

	# mdadm --create /dev/md1 --assume-clean --metadata=0.90 --uuid=52ff2cf2-4098-1859-e58d-8dd65faec42c /dev/sda2 missing

I got a warning that the device was already part of a RAID; I ignored it and went on. At first I was happy, because

	# cryptsetup luksOpen /dev/md1 md1_crypt

worked and asked me for the passphrase. But I typed the correct passphrase several times and it was rejected. Then, having probably forgotten to stop the RAID, I tried

	# cryptsetup luksOpen /dev/sda2 sda2_crypt

which showed the same behavior – probably just typos, and it seemed to work once. But I got the following error messages, which made me realize that the RAID was probably still running, and I stopped it right away.

	Aug  4 00:16:01 grml kernel: [ 7964.786362] device-mapper: table: 253:0: crypt: Device lookup failed
	Aug  4 00:16:01 grml kernel: [ 7964.786367] device-mapper: ioctl: error adding target to table
	Aug  4 00:16:01 grml udevd[2409]: inotify_add_watch(6, /dev/dm-0, 10) failed: No such file or directory
	Aug  4 00:16:01 grml udevd[2409]: inotify_add_watch(6, /dev/dm-0, 10) failed: No such file or directory

	Aug  4 00:17:14 grml kernel: [ 8038.196371] md1: detected capacity change from 1999886286848 to 0
	Aug  4 00:17:14 grml kernel: [ 8038.196395] md: md1 stopped.
	Aug  4 00:17:14 grml kernel: [ 8038.196407] md: unbind<sda2>
	Aug  4 00:17:14 grml kernel: [ 8038.212653] md: export_rdev(sda2)

After that `cryptsetup luksOpen /dev/sda2 sda2_crypt` always failed.
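
(The check I should have done between attempts, sketched after the fact:)

	cat /proc/mdstat                 # sda2 must not appear in any array
	mdadm --stop /dev/md1            # release it if it does
	dmsetup ls                       # and look for stale crypt mappings
	cryptsetup luksOpen /dev/sda2 sda2_crypt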

Now, wanting to be smart, I saved the LUKS header

	# cryptsetup luksHeaderBackup /dev/sda2 --header-backup-file /home/grml/20110804--031--luksHeaderBackup

shut the system down, connected the old drive, booted Grml, saved the LUKS header of `/dev/sda2` on the 500 GB drive, switched the drives again, and restored the old header from before the resizing to the new drive.

	# cryptsetup luksHeaderRestore /dev/sda2 --header-backup-file /home/grml/20110804--031--luksHeaderBackup

Only to find out that this did not help either. I have some system information from the late recovery attempts and still have the old 500 GB drive. Is there any way to recover the data?

The current situation is that `luksOpen` does not succeed on `/dev/md1` or `/dev/sda2`: the partition is detected as a LUKS device, but the passphrase is not accepted (I even typed it in the clear on the console and pasted it into the prompt).
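
(The saved headers can at least be inspected without touching the disks by attaching them to a loop device – a sketch; the loop device name is just an example:)

	losetup /dev/loop1 /home/grml/20110804--031--luksHeaderBackup
	cryptsetup luksDump /dev/loop1   # shows cipher, key slots, payload offset
	losetup -d /dev/loop1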

### New drive ###

	# mdadm --examine /dev/sda2
	/dev/sda2:
		  Magic : a92b4efc
		Version : 0.90.00
		   UUID : 52ff2cf2:40981859:d8b78f65:99226e41 (local to host grml)
	  Creation Time : Thu Aug  4 00:05:57 2011
	     Raid Level : raid1
	  Used Dev Size : 1953013952 (1862.54 GiB 1999.89 GB)
	     Array Size : 1953013952 (1862.54 GiB 1999.89 GB)
	   Raid Devices : 2
	  Total Devices : 2
	Preferred Minor : 1

	    Update Time : Thu Aug  4 00:05:57 2011
		  State : clean
	 Active Devices : 1
	Working Devices : 1
	 Failed Devices : 1
	  Spare Devices : 0
	       Checksum : bf78bfbf - correct
		 Events : 1


	      Number   Major   Minor   RaidDevice State
	this     0       8        2        0      active sync   /dev/sda2

	   0     0       8        2        0      active sync   /dev/sda2
	   1     0       0        0        0      spare
	# blkid 
	/dev/sda1: UUID="fb7f3dc5-d183-cab6-1212-31201a2207b9" TYPE="linux_raid_member" 
	/dev/sda2: UUID="52ff2cf2-4098-1859-d8b7-8f6599226e41" TYPE="linux_raid_member"

### Old drive ###

	/dev/sda2:
		  Magic : a92b4efc
		Version : 0.90.00
		   UUID : 52ff2cf2:40981859:e58d8dd6:5faec42c
	  Creation Time : Wed Mar 26 11:50:04 2008
	     Raid Level : raid1
	  Used Dev Size : 487885952 (465.28 GiB 499.60 GB)
	     Array Size : 487885952 (465.28 GiB 499.60 GB)
	   Raid Devices : 2
	  Total Devices : 1
	Preferred Minor : 1

	    Update Time : Sat Jun 18 14:25:10 2011
		  State : clean
	 Active Devices : 1
	Working Devices : 1
	 Failed Devices : 1
	  Spare Devices : 0
	       Checksum : 380692fc - correct
		 Events : 25570832


	      Number   Major   Minor   RaidDevice State
	this     0       8        2        0      active sync   /dev/sda2

	   0     0       8        2        0      active sync   /dev/sda2
	   1     1       0        0        1      faulty removed

Please tell me what other information you need.


Thanks in advance,

Paul


PS: Please excuse this long message and the mistakes it probably contains. It is almost four in the morning, and after ten hours of debugging I am quite lost.


[1] http://www.hermann-uwe.de/blog/resizing-a-dm-crypt-lvm-ext3-partition
[2] http://grml.org/
[3] http://wiki.debian.org/DebianInstaller/Rescue/Crypto

Attachment: 20110804--new-drive--blkid
Description: Binary data

Attachment: 20110804--new-drive--fdisk-l-sda
Description: Binary data

Attachment: 20110804--new-drive--fdisk-s-sda1
Description: Binary data

Attachment: 20110804--new-drive--fdisk-s-sda2
Description: Binary data

Attachment: 20110804--new-drive--mdadm--examine-sda1
Description: Binary data

Attachment: 20110804--new-drive--mdadm--examine-sda2
Description: Binary data

Attachment: 20110804--new-drive--sfdisk-d-sda
Description: Binary data

Attachment: 20110804--new-drive--with-header-from-old-drive--cryptsetup-luksDump
Description: Binary data

Attachment: 20110804--old-drive--blkid
Description: Binary data

Attachment: 20110804--old-drive--fdisk-l
Description: Binary data

Attachment: 20110804--old-drive--fdisk-s-sda1
Description: Binary data

Attachment: 20110804--old-drive--fdisk-s-sda2
Description: Binary data

Attachment: 20110804--old-drive--mdadm--examine-sda1
Description: Binary data

Attachment: 20110804--old-drive--mdadm--examine-sda2
Description: Binary data

Attachment: 20110804--old-drive--sfdisk-d-sda
Description: Binary data

