Re: Metadata corruption detected, fatal error -- couldn't map inode, err = 117

On Tue, Dec 26, 2023 at 2:36 PM Dave Chinner <david@xxxxxxxxxxxxx> wrote:
>
> On Thu, Dec 21, 2023 at 08:05:43PM -0600, Phillip Ferentinos wrote:
> > Hello,
> >
> > Looking for opinions on recovering a filesystem that does not
> > successfully repair from xfs_repair.
>
> What version of xfs_repair?

# xfs_repair -V
xfs_repair version 6.1.1

> > On the first xfs_repair, I was
> > prompted to mount the disk to replay the log which I did successfully.
> > I created an image of the disk with ddrescue and am attempting to
> > recover the data. Unfortunately, I do not have a recent backup of this
> > disk.
>
> There is lots of random damage all over the filesystem. What caused
> this damage to occur? I generally only see this sort of widespread
> damage when RAID devices (hardware or software) go bad...
>
> Keep in mind that regardless of whether xfs_repair returns the
> filesystem to a consistent state, the data in the filesystem is
> still going to be badly corrupted. If you don't have backups, then
> there's a high probability of significant data loss here....

My best guess is a power outage causing an unclean shutdown. I am
running Unraid:
# uname -a
Linux Tower 6.1.64-Unraid #1 SMP PREEMPT_DYNAMIC Wed Nov 29 12:48:16
PST 2023 x86_64 Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz GenuineIntel
GNU/Linux

I was able to load the disk image in UFS Explorer on an Ubuntu machine
and, as far as I can tell, all the data is present in the image,
though it reports a handful of bad objects.
https://github.com/phillipjf/xfs_debug/blob/main/ufs-explorer.png
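
(Since the log was already replayed, it may also be possible to copy data
straight off the image with a read-only loop mount instead of UFS Explorer;
a minimal sketch, where /mnt/recover is a made-up mount point:

# mkdir -p /mnt/recover
# mount -o ro,norecovery,loop /media/sdl1.img /mnt/recover

norecovery requires ro and writes nothing to the image, though the mount
may still fail or trip over the same corruption.)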

> > The final output of xfs_repair is:
> >
> > Phase 5 - rebuild AG headers and trees...
> >         - reset superblock...
> > Phase 6 - check inode connectivity...
> >         - resetting contents of realtime bitmap and summary inodes
> >         - traversing filesystem ...
> > rebuilding directory inode 12955326179
> > Metadata corruption detected at 0x46fa05, inode 0x38983bd88 dinode
>
> Can you run 'gdb xfs_repair' and run the command 'l *0x46fa05' to
> dump the line of code that the error was detected at? You probably
> need the distro debug package for xfsprogs installed to do this.

At first try, it doesn't look like gdb is available on Unraid, and I
think it would be more trouble than it's worth to get it set up.
---
On the Ubuntu machine, I have
# xfs_repair -V
xfs_repair version 6.1.1

When running gdb on Ubuntu, I get:
$ gdb xfs_repair
(gdb) set args -f /media/sdl1.img
(gdb) run
...
Metadata corruption detected at 0x5555555afd95, inode 0x38983bd88 dinode

fatal error -- couldn't map inode 15192014216, err = 117
...
(gdb) l *0x5555555afd95
0x5555555afd95 is in libxfs_dinode_verify (../libxfs/xfs_inode_buf.c:586).
Downloading source file
/build/xfsprogs-QBD5z8/xfsprogs-6.3.0/repair/../libxfs/xfs_inode_buf.c
581     ../libxfs/xfs_inode_buf.c: No such file or directory.
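
For reference, err = 117 is EUCLEAN ("Structure needs cleaning") on Linux,
which can be confirmed with a one-liner:

$ python3 -c 'import errno, os; print(errno.errorcode[117], os.strerror(117))'
EUCLEAN Structure needs cleaning

The "No such file or directory" above only means gdb resolved the address
but could not find the xfsprogs sources locally. A sketch of one way to get
the actual source line, assuming deb-src entries are enabled and the
matching source is unpacked at the made-up path /path/to/xfsprogs-6.3.0:

$ apt-get source xfsprogs
$ gdb xfs_repair
(gdb) set substitute-path /build/xfsprogs-QBD5z8/xfsprogs-6.3.0 /path/to/xfsprogs-6.3.0
(gdb) l *0x5555555afd95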

> > fatal error -- couldn't map inode 15192014216, err = 117
> >
> > The full log is:
> > https://raw.githubusercontent.com/phillipjf/xfs_debug/main/xfs_repair_1.log
>
> That's messy.
>
> > Based on another discussion (https://narkive.com/4dDxIees.10), I've
> > included the specific inode:
> > https://raw.githubusercontent.com/phillipjf/xfs_debug/main/xfs_db_01.log
>
> Nothing obviously wrong with that inode in the image file - it's a
> directory inode in node format that looks to be internally
> consistent.  But that inode has been processed earlier in the repair
> process, so maybe it's bad in memory as a result of trying to fix
> some other problem. Very hard to say given how much other stuff is
> broken and is getting either trashed, re-initialised or repaired up
> to that point....
>
> > I also cannot create a metadump due to the following issue:
> > https://raw.githubusercontent.com/phillipjf/xfs_debug/main/xfs_metadump_01.log.
>
> No surprise, metadump has to traverse the metadata in order to dump
> it, and if the metadata is corrupt then the traversals can fail
> leading to a failed dump. The more badly damaged the filesystem is,
> the more likely a metadump failure is.
>
> -Dave.
> --
> Dave Chinner
> david@xxxxxxxxxxxxx
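
In case it helps development, a sketch of a metadump invocation with more
verbose reporting (flags per xfs_metadump(8); the output path is made up):

# xfs_metadump -f -g -w -o /media/sdl1.img /mnt/backup/sdl1.metadump

-w prints warnings about inconsistent metadata, and -o skips filename
obfuscation so the dump is more useful for debugging (at the cost of
exposing filenames). It may still fail where the traversal breaks, as Dave
notes above.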

I only had one spare 12TB drive on hand, and it currently holds the
.img of the bad drive. I have some 10TB drives on order, which should
give me enough space to restore the data with UFS Explorer; if that
works, it will resolve all of these issues on my end. But perhaps this
information is helpful for development.

Thanks for the help! Please let me know what more info I can provide.
- Phillip Ferentinos
