Re: xfs_repair after data corruption (not caused by xfs, but by failing nvme drive)

On Mon, Jan 20, 2025 at 04:15:00PM +0100, Christian Brauner wrote:
> Hey,
> 
> so last week I got a nice surprise when my (relatively new) nvme drive
> decided to tell me to gf myself. I managed to recover by now and get
> pull requests out and am back in a working state.
> 
> I had to reboot and it turned out that my LUKS encrypted xfs filesystem
> got corrupted. I booted a live image and did a ddrescue to an external
> drive in the hopes of recovering the things that hadn't been backed up
> and also because I didn't want to have to go and set up my laptop again.
> 
> The xfs filesystem was mountable with:
> 
> mount -t xfs -o norecovery,ro /dev/mapper/dm4 /mnt
> 
> and I was able to copy out everything without a problem.
> 
> However, I was curious whether xfs_repair would get me anything and so I
> tried it (with and without the -L option and with and without the -o
> force_geometry option).
> 
> What was surprising to me is that xfs_repair failed at the first step
> finding a usable superblock:
> 
> > sudo xfs_repair /dev/mapper/dm-sdd4
> Phase 1 - find and verify superblock...
> couldn't verify primary superblock - not enough secondary superblocks with matching geometry !!!
> 
> attempting to find secondary superblock...
> ..found candidate secondary superblock...
> unable to verify superblock, continuing...
> ....found candidate secondary superblock...
> unable to verify superblock, continuing...

Yeah, so it's a 4 AG filesystem, so it has 1 primary superblock and
3 secondary superblocks. Two of those 3 secondary superblocks are
trash, and repair needs 2 of them to match the primary before it
will accept the primary as a good superblock.

xfs_repair treats this situation as "too far gone to reliably
repair" and so aborts.

I did notice a pattern to the corruption, though. While sb 1 is
trashed, the adjacent sector (agf 1) is perfectly fine, and so is
agi 1. But then agfl 1 is trash, yet the first filesystem block
after these headers (a free space btree block) is intact. In the
case of sb 3, it's just a single sector that is gone.
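
Those adjacent headers are easy to eyeball the same way; a sketch of
that per-sector inspection (device name again assumed):

xfs_db -r -c "agf 1" -c print /dev/mapper/dm4   # free space header - fine
xfs_db -r -c "agi 1" -c print /dev/mapper/dm4   # inode header - fine
xfs_db -r -c "agfl 1" -c print /dev/mapper/dm4  # free list - trash

Intact structures print sane fields with valid magic numbers; the
trashed ones fail verification with a bad magic warning.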

To find if there were any other metadata corruptions, I copied the
primary superblock over the corrupted one in AG 1:

xfs_db> sb 1
Superblock has bad magic number 0xa604f4c6. Not an XFS filesystem?
xfs_db> daddr
datadev daddr is 246871552
xfs_db> q
$ dd if=t.img of=t.img oseek=246871552 bs=512 count=1 conv=notrunc
...
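
(daddr values are 512 byte sector numbers, which is why bs=512 and
oseek=<daddr> land the copy on the right sector.) Fixing sb 1 alone
gave repair the 2 matching secondaries it needed; if it hadn't, the
same trick generalizes to the other trashed copy. A sketch for sb 3,
assuming the same t.img image file:

# look up the sector address of sb 3, then copy the primary
# (sector 0 of the image) over it
daddr=$(xfs_db -r -f -c "sb 3" -c daddr t.img | awk '/daddr is/ {print $NF}')
dd if=t.img of=t.img bs=512 count=1 oseek="$daddr" conv=notrunc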

and then ran repair on it again. This time repair ran to completion
(after zeroing the log), and the only corruptions it found were what
I'd expect from zeroing the log (e.g. populated unlinked inode
lists, some free space mismatches, etc).
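
For reference, that pass is just the log zeroing option Christian
had already tried, pointed at the repaired image; a sketch, assuming
the same t.img file:

# -f: operate on an image file, -L: force zero the dirty log
xfs_repair -f -L t.img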

Hence there don't appear to be any other metadata corruptions
outside of the 3 bad sectors already identified. Two of those
sectors were considered critical by repair, hence its failure.

What I suspect happened is that the drive lost the first flash page
that data was ever written to - mkfs lays down the AG headers first,
so there is every chance that the FTL put them all in the same
physical page. The primary superblock and all the AGI, AGF and AGFL
headers get rewritten all the time, so the current versions of them
would have been moved to some other page almost immediately. Hence
if the original page is lost, the contents of those sectors are
still valid. However, the secondary superblocks never get rewritten,
so only they were lost.

Journal recovery failed on the AGFL sector in AG 1 that was also
corrupted - that had been rewritten many times, so it's possible
that the drive lost multiple flash pages. It is also possible that
garbage collection had recently relocated the secondary superblocks
and that AGFL into the same page and that was lost. This is only
speculation, though.

That said, Christian, I wouldn't trust any of the recovered data to
be perfectly intact - there's every chance random files have random
data corruption in them. Even though the filesystem was recovered,
it is worth checking the validity of the data as much as you can...
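
Where backups exist, one way to act on that: compare the recovered
tree against the last known-good backup by file contents rather than
by timestamps. A sketch with placeholder paths:

# -r recursive, -n dry run, -i itemize differences,
# --checksum compares contents; nothing gets modified
rsync -rni --checksum /backup/laptop/ /mnt/recovered/

Anything written after the last backup still has to be checked by
hand, or with whatever format-specific validation the data allows.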

-Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx



