XFS filesystem recovery from secondary superblocks

Hello! So I have an XFS filesystem that isn't mounting, and it's quite a long story as to why and what I've tried so far.

And before you start: yes, backups are the preferred method of restoration at this point. Never trust your files to a single FS, etc.

So I have a 9-disk MD array (0.9 superblock format, 14TB total usable space) configured as an LVM PV, with one VG and one LV that uses not quite all of the space. That LV is formatted XFS and mounted as /mnt/storage. This was set up on Ubuntu 10.04, which has since been release-upgraded to 12.04. The LV has been grown three times over the last two years. The system's boot, root and swap partitions are on a separate drive.
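
(For orientation, the stack bottom-to-top, written as the commands I'd normally use to look at it -- not real output, just the shape of things:)

cat /proc/mdstat        # md0: the 9-drive array
pvs                     # /dev/md0 is the only PV
lvs vg1                 # one LV, "storage", mounted at /mnt/storage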

So what happened? Well, one drive died spectacularly: a full bearing failure that caused a power drain, and the system instantly kicked out two more drives. That put the array into an offline state, as expected. I replaced the failed drive with a new one and carefully checked the disk order before attempting to re-assemble the array. At the time I didn't know about mdadm --re-add. (Likely my first mistake.)

mdadm --create --assume-clean --level=6 --raid-devices=9 /dev/md0 /dev/sdg1 missing /dev/sdh1 /dev/sdj1 /dev/sdd1 /dev/sdb1 /dev/sde1 /dev/sdf1 /dev/sdc1

The first problem with this is that the newer mdadm from the Ubuntu upgrade created the superblocks as 1.2 instead of 0.9. Not catching this, I then added in the replacement /dev/sdi1, which started the array rebuilding incorrectly. I quickly realized my mistake, stopped the array, and re-created it with the 0.9 superblock format, but the damage had already been done to roughly the first 100GB of the array, possibly more.
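
For reference, the second create looked roughly like this once I spotted the metadata issue (quoting from memory, so treat it as a sketch):

# this is how I noticed the 1.2 metadata from the first create
mdadm --examine /dev/sdg1 | grep -i version

# re-create with the old 0.90 superblock format, same device order as before
mdadm --create --assume-clean --metadata=0.90 --level=6 --raid-devices=9 /dev/md0 \
    /dev/sdg1 missing /dev/sdh1 /dev/sdj1 /dev/sdd1 /dev/sdb1 /dev/sde1 /dev/sdf1 /dev/sdc1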

I attempted to restore the lvm superblock from the backup stored in /etc/lvm/backup/

pvcreate -f -vv --uuid "hJrAn2-wTd8-vY11-steD-23Jh-AwKK-4VvnkH" --restorefile /etc/lvm/backup/vg1 /dev/md0
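
Looking at it again while writing this up, I wonder if part of the problem is that I passed the VG UUID there rather than the PV UUID. My understanding is that the restore sequence should look more like this (untested sketch, using the pv0 UUID from the backup quoted further down):

pvcreate --uuid "VRHqH4-oIje-iQWV-iLUL-dLXX-eEf9-mLd9Z7" --restorefile /etc/lvm/backup/vg1 /dev/md0
vgcfgrestore -f /etc/lvm/backup/vg1 vg1
vgchange -ay vg1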

When that failed, I decided to attach a second array so I could examine the problem more safely. I built a second MD array with seven 3TB disks in RAID6, giving me a 15TB /mnt/restore volume to work with, and made a dd copy of /dev/md0 to a test file I could manipulate safely.
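
The image copy itself was nothing fancy, roughly (from memory):

# straight block-for-block image of the damaged array onto the scratch volume
dd if=/dev/md0 of=/mnt/restore/md0.dat bs=64M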

Once I had the file created, I tried xfs_repair -f /mnt/restore/md0.dat with no luck. I then used a hex editor to put XFSB at the beginning of the file, hoping the repair would just clean up around the leftover LVM metadata, with similar results. The output looks like the following:

Phase 1 - find and verify superblock...
bad primary superblock - bad or unsupported version !!!

attempting to find secondary superblock...
....................................................................................................
unable to verify superblock, continuing...
....................................................................................................
unable to verify superblock, continuing...
[... the two lines above repeat another eight times ...]
....................................................................................................
Exiting now.

Running xfs_db /mnt/restore/md0.dat appeared to run out of memory.

So I realized I needed to pull the data out of the LVM layout and re-assemble it properly if I was going to make any progress. I checked the backup config again:

# Generated by LVM2 version 2.02.66(2) (2010-05-20): Sun Jul 29 13:40:58 2012

contents = "Text Format Volume Group"
version = 1

description = "Created *after* executing 'vgcfgbackup'"

creation_host = "jarvis"        # Linux jarvis 3.0.0-23-server #39-Ubuntu SMP Thu Jul 19 19:37:41 UTC 2012 x86_64
creation_time = 1343594458      # Sun Jul 29 13:40:58 2012

vg1 {
        id = "hJrAn2-wTd8-vY11-steD-23Jh-AwKK-4VvnkH"
        seqno = 19
        status = ["RESIZEABLE", "READ", "WRITE"]
        flags = []
        extent_size = 8192              # 4 Megabytes
        max_lv = 0
        max_pv = 0

        physical_volumes {

                pv0 {
                        id = "VRHqH4-oIje-iQWV-iLUL-dLXX-eEf9-mLd9Z7"
                        device = "/dev/md0"     # Hint only

                        status = ["ALLOCATABLE"]
                        flags = []
                        dev_size = 27349166336  # 12.7354 Terabytes
                        pe_start = 768
                        pe_count = 3338521      # 12.7354 Terabytes
                }
        }

        logical_volumes {

                storage {
                        id = "H47IMn-ohEG-3W6l-NfCu-ePjJ-U255-FcIjdp"
                        status = ["READ", "WRITE", "VISIBLE"]
                        flags = []
                        segment_count = 4

                        segment1 {
                                start_extent = 0
                                extent_count = 2145769  # 8.18546 Terabytes

                                type = "striped"
                                stripe_count = 1        # linear

                                stripes = [
                                        "pv0", 25794
                                ]
                        }
                        segment2 {
                                start_extent = 2145769
                                extent_count = 626688   # 2.39062 Terabytes

                                type = "striped"
                                stripe_count = 1        # linear

                                stripes = [
                                        "pv0", 2174063
                                ]
                        }
                        segment3 {
                                start_extent = 2772457
                                extent_count = 384170   # 1.46549 Terabytes

                                type = "striped"
                                stripe_count = 1        # linear

                                stripes = [
                                        "pv0", 2954351
                                ]
                        }
                        segment4 {
                                start_extent = 3156627
                                extent_count = 140118   # 547.336 Gigabytes

                                type = "striped"
                                stripe_count = 1        # linear

                                stripes = [
                                        "pv0", 2800751
                                ]
                        }
                }
        }
}

I noticed that segment4's data physically precedes segment3's on the PV (based on its stripes entry, "pv0", 2800751, vs. 2954351 for segment3), and that the extent size is the standard 4MB, so I wrote the following:

echo "writing seg 1 .."
dd if=/dev/md0 of=/dev/md1 bs=4194304 seek=0 skip=25794 count=2145769

echo "writing seg 2 .."
dd if=/dev/md0 of=/dev/md1 bs=4194304 seek=2145769 skip=2174063 count=626688

echo "writing seg 3 .."
dd if=/dev/md0 of=/dev/md1 bs=4194304 seek=2772457 skip=2954351 count=384170

echo "writing seg 4 .."
dd if=/dev/md0 of=/dev/md1 bs=4194304 seek=3156627 skip=2800751 count=140118
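
For reference, the arithmetic behind those numbers, plus one thing I'm now second-guessing -- whether each source offset should also have been shifted by pe_start:

# extent size from the backup is 8192 sectors * 512 bytes = 4MiB, hence bs=4194304
echo $((8192 * 512))
# segment1's data starts at physical extent 25794 on pv0, hence skip=25794
echo $((25794 * 4194304))    # byte offset into /dev/md0, ignoring pe_start
# pe_start = 768 sectors in the backup; I did not account for this anywhere
echo $((768 * 512))          # 393216 bytes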

Then, just to make sure things were clean, I zeroed out the remainder of /dev/md1.
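
Roughly like this (from memory; the LV's last logical extent is 3156627 + 140118 = 3296745, so everything from there to the end of /dev/md1 got zeroed):

dd if=/dev/zero of=/dev/md1 bs=4194304 seek=3296745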

I used the hex editor (shed) again to make sure the first four bytes on the device were XFSB.
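
(For what it's worth, the same check without the hex editor -- a quick sketch:)

dd if=/dev/md1 bs=4 count=1 2>/dev/null | hexdump -C
# expect something like:  00000000  58 46 53 42  |XFSB|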

Once done, I tried xfs_repair again, this time on /dev/md1 with the same results as above.

Next I tried xfs_db /dev/md1 to see if anything would load, and got the following:

root@jarvis:/mnt# xfs_db /dev/md1
Floating point exception

With the following in dmesg:

[1568395.691767] xfs_db[30966] trap divide error ip:41e4b5 sp:7fff5db8ab90 error:0 in xfs_db[400000+6a000]


So at this point I'm stumped. I'm hoping one of you clever folks out there might have some next steps I can take. I'm okay with a partial recovery, and I'm okay if the directory tree gets horked and I have to dig through lost+found, but I'd really like to at least be able to recover something from this. I'm happy to post any info needed on this.
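
The only other idea I have left is a brute-force scan for the superblock magic, something like this rough, untested sketch (the 1MiB alignment and the size are guesses on my part):

#!/bin/sh
# Rough sketch: look for the XFS superblock magic "XFSB" at 1MiB-aligned
# offsets.  Secondary superblocks sit at the start of each allocation
# group, so if the AG size happens to be MiB-aligned, any hits should at
# least show where the AGs start.
DEV=/dev/md1                 # or the md0.dat image on /mnt/restore
TOTAL_MIB=12000000           # adjust to the size of the device/image
i=0
while [ "$i" -lt "$TOTAL_MIB" ]; do
    magic=$(dd if="$DEV" bs=512 skip=$((i * 2048)) count=1 2>/dev/null | head -c 4)
    if [ "$magic" = "XFSB" ]; then
        echo "candidate superblock at ${i} MiB"
    fi
    i=$((i + 1))
done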

Thanks!

-Aaron
