On 2/9/24 12:39 PM, Jorge Garcia wrote:
> Hello,
>
> We have a server with a very large (300+ TB) XFS filesystem that we
> use to provide downloads to the world. Last week's storms in
> California caused damage to our machine room, causing unexpected power
> surges and power outages, even in our UPS and generator backed data
> center. One of the end results was some data corruption on our server
> (running Centos 8). After looking around the internet for solutions to
> our issues, the general consensus seemed to be to run xfs_repair on
> the filesystem to get it to recover. We tried that (xfs_repair V 5.0)
> and it seemed to report lots of issues before eventually failing
> during "Phase 6" with an error like:
>
> Metadata corruption detected at 0x46d6c4, inode 0x8700657ff8 dinode
>
> fatal error -- couldn't map inode 579827236856, err = 117
>
> After another set of internet searches, we found some postings that
> suggested this could be a bug that may have been fixed in later
> versions, so we built xfs_repair V 6.5 and tried the repair again. The
> results were the same. We even tried "xfs_repair -L", and no joy. So
> now we're desperate. Is the data all lost? We can't mount the
> filesystem. We tried using xfs_metadump (another suggestion from our
> searches) and it reports lots of metadata corruption ending with:

I was going to suggest creating an xfs_metadump image for analysis.
Was that created with xfsprogs v6.5.0 as well?

> Metadata corruption detected at 0x4382f0, xfs_cntbt block 0x1300023518/0x1000
> Metadata corruption detected at 0x4382f0, xfs_cntbt block 0x1300296bf8/0x1000
> Metadata corruption detected at 0x4382f0, xfs_bnobt block 0x137fffb258/0x1000
> Metadata corruption detected at 0x4382f0, xfs_bnobt block 0x138009ebd8/0x1000
> Metadata corruption detected at 0x467858, xfs_inobt block 0x138067f550/0x1000
> Metadata corruption detected at 0x467858, xfs_inobt block 0x13834b39e0/0x1000
> xfs_metadump: bad starting inode offset 5

So the metadump did not complete?

Does the filesystem mount? Can you mount it -o ro or -o ro,norecovery
to see how much you can read off of it?

If mount fails, what is in the kernel log when it fails?

> Not sure what to try next. Any help would be greatly appreciated. Thanks!

Power losses really should not cause corruption; it's a metadata journaling
filesystem, which should maintain consistency even with a power loss.

What kind of storage do you have, though? Corruption after a power loss
often stems from a filesystem on a RAID with a write cache that does not
honor data integrity commands and/or does not have its own battery backup.

-Eric

> Jorge
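
As a side note on the xfs_repair failure quoted above, the two messages can be cross-checked with a few lines of Python. This is only a sketch: the variable names are illustrative, and the errno lookup assumes a Linux host, where 117 is EUCLEAN (the value XFS uses for EFSCORRUPTED). It shows that the hex inode in the "Metadata corruption detected" line and the decimal inode in the "fatal error" line are the same inode.

```python
import errno
import os

# Values copied from the xfs_repair output quoted in this thread.
inode_hex = "0x8700657ff8"   # from "Metadata corruption detected ... inode"
inode_dec = 579827236856     # from "fatal error -- couldn't map inode"
err = 117                    # from "err = 117"

# Both messages should name the same inode, once in hex and once in decimal.
same_inode = int(inode_hex, 16) == inode_dec

# Assumption: Linux errno numbering. On other platforms 117 may map to a
# different name or to nothing at all, hence the fallback string.
err_name = errno.errorcode.get(err, "unknown")

print("same inode:", same_inode)
print("errno", err, "=", err_name, "-", os.strerror(err))
```

So the "err = 117" here is the filesystem-corruption error itself rather than an I/O error from the underlying device, which is consistent with the questions above about the storage and its write cache.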