On Tue, Feb 01, 2022 at 06:07:18PM -0500, Sean Caron wrote:
> Hi all,
>
> Me again with another not-backed-up XFS filesystem that's in a little
> trouble. Last time I stopped by to discuss my woes, I was told that I
> could check in here and get some help reading the tea leaves before I
> do anything drastic so I'm doing that :)
>
> Brief backstory: This is a RAID 60 composed of three 18-drive RAID 6
> strings of 8 TB disk drives, around 460 TB total capacity. Last week
> we had a disk fail out of the array. We replaced the disk and the
> recovery hung at around 70%.
>
> We power cycled the machine and enclosure and got the recovery to run
> to completion. Just as it finished up, the same string dropped another
> drive.
>
> We replaced that drive and started recovery again. It got a fair bit
> into the recovery, then hung just as did the first drive recovery, at
> around +/- 70%. We power cycled everything again, then started the
> recovery. As the recovery was running again, a third disk started to
> throw read errors.
>
> At this point, I decided to just stop trying to recover this array so
> it's up with two disks down but otherwise assembled. I figured I would
> just try to mount ro,norecovery and try to salvage as much as possible
> at this point before going any further.
>
> Trying to mount ro,norecovery, I am getting an error:

Seeing as you've only lost redundancy at this point in time, this
will simply result in trying to mount the filesystem in an
inconsistent state and so you'll see metadata corruptions because
the log has not been replayed.

> metadata I/O error in "xfs_trans_read_buf_map at daddr ... len 8 error 74
> Metadata CRC error detected at xfs_agf_read_verify+0xd0/0xf0 [xfs],
> xfs_agf block ...
>
> I ran an xfs_repair -L -n just to see what it would spit out.
> It completes within 15-20 minutes (which I feel might be a good sign,
> from my experience, outcomes are inversely proportional to run time),
> but the output is implying that it would unlink over 100,000 files
> (I'm not sure how many total files are on the filesystem, in terms of
> what proportion of loss this would equate to) and it also says:
>
> "Inode allocation btrees are too corrupted, skipping phases 6 and 7"

This is expected because 'xfs_repair -n' does not recover the log.
Hence you're running checks on an inconsistent fs and repair is
detecting that the inobts are inconsistent so it can't check the
directory structure connectivity and link counts sanely.

What you want to do here is take a metadump of the filesystem (it's
an offline operation) and restore it to an image file on a
different system (it creates a sparse file, so it just needs to run
on a fs that supports file sizes > 16TB).

You can then mount the image file via "mount -o loop <fs.img>
<mntpt>", and it will run log recovery on the image. Then you can
unmount it again and see if the resultant filesystem image contains
any corruption via 'xfs_repair -n'.

If there are no problems found, then the original filesystem is all
good and all you need to do is mount it and everything should be
there ready for the migration process to non-failing storage.

If there are warnings/repairs needed then you're probably best to
post the output of 'xfs_repair -n' so we can review it and determine
the best course of action from there.

IOWs, do all the diagnosis/triage of the filesystem state on the
restored metadump images so that we don't risk further damaging the
real storage. If we screw up a restored filesystem image, no big
deal, we can just return it to the original state by restoring it
from the metadump again to try something different.

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx
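
For reference, the triage steps described above can be sketched roughly as the script below. This is only an illustration under stated assumptions: the device name, file paths, and mount point are hypothetical placeholders (nothing in the thread names them), the `run` and `triage` helpers are made up for this sketch, and the metadump file is assumed to have been copied to the second system before the restore step. By default it only prints the commands it would run.

```shell
#!/bin/sh
# Sketch of the metadump-based triage loop. Every path and the device
# name below are hypothetical placeholders, not values from this thread.
SRC_DEV="${SRC_DEV:-/dev/sdX}"         # damaged XFS device (must not be mounted)
DUMP="${DUMP:-/scratch/fs.metadump}"   # metadump output (metadata only)
IMG="${IMG:-/scratch/fs.img}"          # restored sparse image file
MNT="${MNT:-/mnt/triage}"              # temporary mount point for the image

# Print each command by default; set DRY_RUN=0 to execute (needs root).
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

triage() {
    # On the machine with the array: capture the filesystem metadata.
    run xfs_metadump -g "$SRC_DEV" "$DUMP" || return 1
    # On a different system: restore the metadump into a sparse image.
    run xfs_mdrestore "$DUMP" "$IMG" || return 1
    run mkdir -p "$MNT" || return 1
    # Mounting the image replays the log; unmount again straight away.
    run mount -o loop "$IMG" "$MNT" || return 1
    run umount "$MNT" || return 1
    # -n: no-modify check; -f: the target is a regular file, not a device.
    run xfs_repair -n -f "$IMG"
}

triage
```

If the check on the image goes badly, the image can simply be recreated by re-running the `xfs_mdrestore` step against the same metadump before trying something different.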