On Thu, Jan 23, 2025 at 08:58:26AM +1100, Dave Chinner wrote:
> On Mon, Jan 20, 2025 at 04:15:00PM +0100, Christian Brauner wrote:
> > Hey,
> >
> > so last week I got a nice surprise when my (relatively new) nvme drive
> > decided to tell me to gf myself. I've managed to recover by now, get
> > pull requests out, and am back in a working state.
> >
> > I had to reboot and it turned out that my LUKS encrypted xfs filesystem
> > got corrupted. I booted a live image and did a ddrescue to an external
> > drive in the hopes of recovering the things that hadn't been backed up,
> > and also because I didn't want to have to go and set up my laptop again.
> >
> > The xfs filesystem was mountable with:
> >
> > mount -t xfs -o norecovery,ro /dev/mapper/dm4 /mnt
> >
> > and I was able to copy out everything without a problem.
> >
> > However, I was curious whether xfs_repair would get me anything, so I
> > tried it (with and without the -L option, and with and without the -o
> > force_geometry option).
> >
> > What was surprising to me is that xfs_repair failed at the first step,
> > finding a usable superblock:
> >
> > > sudo xfs_repair /dev/mapper/dm-sdd4
> > Phase 1 - find and verify superblock...
> > couldn't verify primary superblock - not enough secondary superblocks with matching geometry !!!
> >
> > attempting to find secondary superblock...
> > ..found candidate secondary superblock...
> > unable to verify superblock, continuing...
> > ....found candidate secondary superblock...
> > unable to verify superblock, continuing...
>
> Yeah, so it's a 4 AG filesystem, so it has 1 primary superblock and 3
> secondary superblocks. Two of the 3 secondary superblocks are trash,
> and repair needs 2 of the secondary superblocks to match the primary
> for it to validate the primary as a good superblock.
>
> xfs_repair considers this situation "too far gone to reliably
> repair" and so aborts.
>
> I did notice a pattern to the corruption, though. While sb 1 is
> trashed, the adjacent sector (agf 1) is perfectly fine. So is agi 1.
> But then agfl 1 is trash. And yet the first filesystem block after
> these (a free space btree block) is intact. In the case of sb 3,
> it's just a single sector that is gone.
>
> To find out if there were any other metadata corruptions, I copied the
> primary superblock over the corrupted one in AG 1:
>
> xfs_db> sb 1
> Superblock has bad magic number 0xa604f4c6. Not an XFS filesystem?
> xfs_db> daddr
> datadev daddr is 246871552
> xfs_db> q
> $ dd if=t.img of=t.img oseek=246871552 bs=512 count=1 conv=notrunc
> ...
>
> and then ran repair on it again. This time repair ran (after zeroing
> the log) and there were no corruptions other than what I'd expect
> from zeroing the log (e.g. unlinked inode lists were populated, there
> were some free space mismatches, etc).
>
> Hence there don't appear to be any other metadata corruptions
> outside of the 3 bad sectors already identified. Two of those
> sectors were considered critical by repair, hence its failure.
>
> What I suspect happened is that the drive lost the first flash page
> that data was ever written to - mkfs lays down the AG headers first,
> so there is every chance that the FTL put them in the same physical
> page. The primary superblock and all the AGI, AGF and AGFL headers
> get rewritten all the time, so the current versions of them will have
> been moved to some other page almost immediately. Hence, if the
> original page is lost, the contents of those sectors will still be
> valid. However, the secondary superblocks never get rewritten, so
> they are the only thing that gets lost.
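For the archives, in case someone else ends up in the same spot: if I read
the steps above right, the superblock copy boils down to roughly the
following. This is my own sketch, run against a ddrescue image (img is a
placeholder name) rather than the live device, and assumes a 512-byte
sector size; it is not a transcript of your exact session:

    # Find the disk address (in 512-byte sectors) of the trashed
    # secondary superblock. 'sb 1' seeks to the superblock of AG 1,
    # 'daddr' prints the current position as a daddr.
    $ xfs_db -r img
    xfs_db> sb 1
    xfs_db> daddr
    xfs_db> quit

    # Copy sector 0 (the primary superblock) over that daddr. The read
    # offset defaults to 0; conv=notrunc leaves the rest of the image
    # untouched.
    $ dd if=img of=img bs=512 count=1 seek=<daddr from above> conv=notrunc

    # Re-run repair; -L zeroes the dirty log, since log recovery had
    # already failed anyway.
    $ xfs_repair -L img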
>
> Journal recovery failed on the AGFL sector in AG 1 that was also
> corrupted - that had been rewritten many times, so it's possible
> that the drive lost multiple flash pages. It is also possible that
> garbage collection had recently relocated the secondary superblocks
> and that AGFL into the same page, and that page was then lost. This
> is only speculation, though.

Thanks for taking the time to look into this!

> That said, Christian, I wouldn't trust any of the recovered data to
> be perfectly intact - there's every chance random files have random
> data corruption in them.

Yes, I think I'm fine with that risk. The data I recovered is strictly
from /home/, so at least I won't have to worry about some system
library being corrupted.

> Even though the filesystem was recovered, it is worth checking the
> validity of the data as much as you can...

Fwiw, xfs did a great job here. I was very happy with how it behaved
even though that drive was shot to hell!
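On checking the data: one cheap way I know of to follow that advice is a
checksum-only rsync dry run against the last backup. The paths below are
placeholders for wherever the backup and the recovered tree actually
live:

    # -r recurse, -c compare file contents by checksum instead of
    # size/mtime, -n don't copy anything, -i itemize what differs
    $ rsync -rcni /backup/home/ /mnt/recovered/home/

Anything rsync lists there is worth eyeballing by hand.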