Hi Dave,

OK! With your patch and help on that other thread pertaining to
xfs_metadump, I was able to get a full metadata dump of this
filesystem.

I used xfs_mdrestore to set up a sparse image for this volume using my
dumped metadata:

xfs_mdrestore /exports/home/work/md4.metadump /exports/home/work/md4.img

Then I set up a loopback device for it and tried to mount it:

losetup --show --find /exports/home/work/md4.img
mount /dev/loop0 /mnt

When I do this, I get a "Structure needs cleaning" error and the
following in dmesg:

[523615.874581] XFS (loop0): Corruption warning: Metadata has LSN (7095:2330880) ahead of current LSN (7095:2328512). Please unmount and run xfs_repair (>= v4.3) to resolve.
[523615.874637] XFS (loop0): Metadata corruption detected at xfs_agi_verify+0xef/0x180 [xfs], xfs_agi block 0x10
[523615.874666] XFS (loop0): Unmount and run xfs_repair
[523615.874679] XFS (loop0): First 128 bytes of corrupted metadata buffer:
[523615.874695] 00000000: 58 41 47 49 00 00 00 01 00 00 00 00 0f ff ff f8  XAGI............
[523615.874713] 00000010: 00 03 ba 40 00 04 ef 7e 00 00 00 02 00 00 00 34  ...@...~.......4
[523615.874732] 00000020: 00 30 09 40 ff ff ff ff ff ff ff ff ff ff ff ff  .0.@............
[523615.874750] 00000030: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff  ................
[523615.874768] 00000040: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff  ................
[523615.874787] 00000050: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff  ................
[523615.874806] 00000060: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff  ................
[523615.874824] 00000070: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff  ................
[523615.874914] XFS (loop0): metadata I/O error in "xfs_trans_read_buf_map" at daddr 0x10 len 8 error 117
[523615.874998] XFS (loop0): xfs_imap_lookup: xfs_ialloc_read_agi() returned error -117, agno 0
[523615.876866] XFS (loop0): Failed to read root inode 0x80, error 117

Seems like the next step is to just run xfs_repair (with or without log
zeroing?) on this image and see what shakes out? (The rough command
sequence I have in mind is sketched at the bottom of this mail.)

Thanks,

Sean

On Tue, Feb 1, 2022 at 6:33 PM Dave Chinner <david@xxxxxxxxxxxxx> wrote:
>
> On Tue, Feb 01, 2022 at 06:07:18PM -0500, Sean Caron wrote:
> > Hi all,
> >
> > Me again with another not-backed-up XFS filesystem that's in a little
> > trouble. Last time I stopped by to discuss my woes, I was told that I
> > could check in here and get some help reading the tea leaves before I
> > do anything drastic, so I'm doing that :)
> >
> > Brief backstory: This is a RAID 60 composed of three 18-drive RAID 6
> > strings of 8 TB disk drives, around 460 TB total capacity. Last week
> > we had a disk fail out of the array. We replaced the disk and the
> > recovery hung at around 70%.
> >
> > We power cycled the machine and enclosure and got the recovery to run
> > to completion. Just as it finished up, the same string dropped another
> > drive.
> >
> > We replaced that drive and started recovery again. It got a fair bit
> > into the recovery, then hung just as the first drive recovery did, at
> > around +/- 70%. We power cycled everything again, then started the
> > recovery. As the recovery was running again, a third disk started to
> > throw read errors.
> >
> > At this point, I decided to just stop trying to recover this array, so
> > it's up with two disks down but otherwise assembled. I figured I would
> > just try to mount ro,norecovery and salvage as much as possible
> > at this point before going any further.
> >
> > Trying to mount ro,norecovery, I am getting an error:
>
> Seeing as you've only lost redundancy at this point in time, this
> will simply result in trying to mount the filesystem in an
> inconsistent state, and so you'll see metadata corruptions because
> the log has not been replayed.
>
> > metadata I/O error in "xfs_trans_read_buf_map" at daddr ... len 8 error 74
> > Metadata CRC error detected at xfs_agf_read_verify+0xd0/0xf0 [xfs],
> > xfs_agf block ...
> >
> > I ran an xfs_repair -L -n just to see what it would spit out. It
> > completes within 15-20 minutes (which I feel might be a good sign;
> > from my experience, outcomes are inversely proportional to run time),
> > but the output is implying that it would unlink over 100,000 files
> > (I'm not sure how many total files are on the filesystem, in terms of
> > what proportion of loss this would equate to) and it also says:
> >
> > "Inode allocation btrees are too corrupted, skipping phases 6 and 7"
>
> This is expected because 'xfs_repair -n' does not recover the log.
> Hence you're running checks on an inconsistent fs, and repair is
> detecting that the inobts are inconsistent, so it can't check the
> directory structure connectivity and link counts sanely.
>
> What you want to do here is take a metadump of the filesystem (it's
> an offline operation) and restore it to an image file on a
> different system (it creates a sparse file, so it just needs to run on
> a fs that supports file sizes > 16TB). You can then mount the image
> file via "mount -o loop <fs.img> <mntpt>", and that will run log
> recovery on the image. Then you can unmount it again and see if the
> resultant filesystem image contains any corruption via 'xfs_repair -n'.
>
> If no problems are found, then the original filesystem is all good,
> and all you need to do is mount it and everything should be there
> ready for the migration process to non-failing storage.
>
> If there are warnings/repairs needed, then you're probably best to
> post the output of 'xfs_repair -n' so we can review it and determine
> the best course of action from there.
>
> IOWs, do all the diagnosis/triage of the filesystem state on the
> restored metadump images so that we don't risk further damaging the
> real storage. If we screw up a restored filesystem image, no big
> deal, we can just return it to the original state by restoring it
> from the metadump again to try something different.
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@xxxxxxxxxxxxx
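
P.S. To make that question concrete, here is the rough sequence I have
in mind for the next step, run against the restored image on /dev/loop0
rather than the real array. The -n pass is read-only; I would only
reach for -L, which zeroes the log, if you think it's warranted:

xfs_repair -n /dev/loop0   # dry run: report what repair would do, change nothing
xfs_repair -L /dev/loop0   # last resort only: zero the log, then repair the image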