Hi all,

I recently had to manage a storage failure on a ~150 TB XFS volume, and I wanted to check with the group here to see if anything could have been done differently. Here is my story.

We had a 150 TB RAID 60 volume formatted with XFS. The volume was made up of two 21-drive RAID 6 strings (4 TB drives), all done with Linux MD software RAID. The filesystem was filled to 100% capacity when it failed; I'm not sure if this contributed to the poor outcome. There was no backup available of this filesystem (of course).

About a week ago, we had two drives become spuriously ejected from one of the two RAID 6 strings that composed this volume. This seems to happen sometimes as a result of various hardware and software glitches. We checked the drives with smartctl, added them back to the array, and a resync operation started. The resync ran for a little while and then failed, because a third disk in the array (which mdadm had never failed out, and which smartctl still thought was OK) reported a read error/bad blocks and dropped out of the array.

We decided to clone the failed disk to a brand new replacement drive with:

    dd conv=notrunc,noerror,sync

figuring we'd lose a few sectors to being nulled out, but we'd have a drive that could run the rebuild without getting kicked out due to read errors (we've used this technique in the past to recover from this kind of situation successfully). The clone completed. We swapped the cloned drive in for the bad-blocks drive and kicked off another rebuild.

The rebuild fails again because a fourth drive is throwing bad blocks/read errors and gets kicked out of the array. We scan all 21 drives in this array with smartctl, and there are actually three more drives in total where SMART has logged read errors. This is starting to look pretty bad, but what can we do? We just clone these three drives to three more fresh drives using the same dd invocation, swap them in for the old bad-block drives, and kick off another rebuild. This time the rebuild actually runs and completes successfully. MD thinks the array is fine, running, not degraded at all.

We mount the array. It mounts, but it is obviously pretty damaged. Normally when this happens we try to mount it read-only, copy off what we can, and then write it off. This time, we can hardly do anything but an "ls" in the filesystem without getting "structure needs cleaning". Any kind of substantial access to the filesystem gives various major errors (e.g. "in-memory corruption of filesystem data detected") and the filesystem goes offline. Reads just fail with I/O errors.

What can we do? It seems like at this stage we just run xfs_repair and hope for the best, right? We ran xfs_repair in dry run mode and it looked pretty bad, just from the sheer amount of output. But there's no real way to know exactly how much data xfs_repair will wipe out, and what alternatives do we have? The filesystem hardly mounts without faulting anyway. It seems like there's little choice but to run it and see what shakes out.

We run xfs_repair overnight. It ran for a while, then eventually hung, in Phase 4 I think. We killed xfs_repair and re-ran it with the -P flag. It runs for maybe two or three hours and eventually completes. We mount the filesystem. Of around 150 TB, we have maybe 10% of that as data salad in lost+found, 21 GB of good data, and the rest is gone. We copy off what we can and call it dead. This is where we're at now.
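For reference, in case the exact invocations matter, this is roughly what we ran; the device names below are placeholders rather than the real ones:

    dd if=/dev/sdX of=/dev/sdY conv=notrunc,noerror,sync   # clone a bad-blocks drive onto a fresh drive
    xfs_repair -n /dev/md0                                  # dry run; output looked bad
    xfs_repair /dev/md0                                     # first real run; hung in Phase 4
    xfs_repair -P /dev/md0                                  # second run; completed after a few hours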
It seems like the MD rebuild process really scrambled things somehow. I'm not sure if this was due to some kind of kernel bug, or just zeroed-out bad sectors landing in the wrong places, or what. Once the md resync ran, we were cooked.

I guess, after blowing through four or five "hope you have a backup, but if not, you can try this and pray" checkpoints, I just want to check with the developers and the group here whether we did the best thing possible given the circumstances. xfs_repair is it, right? When things are that scrambled, is running xfs_repair and hoping for the best pretty much all you can do? Am I correct in thinking that there is no better or alternative tool that would give different results? And can a commercial data recovery service make any better sense of a scrambled XFS than xfs_repair could, when the underlying device is presenting fine and it's just the data on it that's scrambled?

Thanks,
Sean