Eric Sandeen <sandeen@xxxxxxxxxxx> writes:

> I'm sure you won't like this answer,

Hi, Eric. I know enough about XFS to recognize your name, and it is not
like I am paying for support... So actually I am just grateful for your
reply.

> and I can't base it on empirical evidence, but my first hunch would be
> that your controller did a poor job of recovering from the error, and
> damaged the storage beneath the filesystem.

I admit this is possible, but...

We have two RAID containers inside each JBOD. Each JBOD has a single
SAS cable to the hardware RAID card. Only one of the RAID containers
suffered damage; the other container in the same JBOD is fine.

I can believe the RAID card did not recover particularly gracefully,
but I do not think we lost more than a few blocks on the file system.
For one thing, there wasn't enough time.

Until we ran xfs_repair, that is.

> On a more concrete note, it would be interesting to run xfs_bmap -vv
> on some of those files with zeros and see what extents, if any, cover
> the zeroed ranges. i.e. are they holes, allocated, unwritten, etc.

I tried this on a few of the damaged files. Here is a typical output:

    # xfs_bmap -p -v xxx
    xxx:
     EXT: FILE-OFFSET       BLOCK-RANGE                 AG AG-OFFSET             TOTAL  FLAGS
       0: [0..16255]:       195467240568..195467256823  91 (46229328..46245583)  16256  00000
       1: [16256..715959]:  195477629880..195478329583  91 (56618640..57318343)  699704 00000

Looking at the "zeroed" data ranges (there are several), none of them
is near the beginning or the end of either extent. None of the files I
looked at had FLAGS other than 00000. All of the zeroed ranges I
checked are page-aligned (4K multiple).

It really feels like some small amount of damage in one area of the
file system got amplified into corruption across many files' contents
by xfs_repair.

I do not know much about XFS internals, so forgive me if the following
is stupid...

I imagine there are global data structures recording the free/in-use
blocks, as well as local data structures recording the extents used by
each file. Is it possible xfs_repair decided to "trust" some corrupted
global data structure instead of the local extents associated with each
file, and responded by wiping parts of the latter?

In general, could anything cause xfs_repair to zero out whole ranges of
blocks allocated to many files?

Thanks again.

 - Pat
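
For anyone wanting to repeat the zeroed-range check described above, here
is a minimal sketch of one way it could be done. It scans a file for runs
of all-zero pages and converts each run into the 512-byte block units used
in the FILE-OFFSET column of xfs_bmap. The function names and the
4096-byte page size are assumptions for illustration only, and a run of
zero pages may be legitimate file data, so the result is a list of
candidates rather than proof of damage.

```python
PAGE = 4096    # assumed page size, from the "4K multiple" observation
SECTOR = 512   # xfs_bmap reports file offsets in 512-byte blocks


def zero_page_runs(path, page=PAGE):
    """Yield (start_byte, end_byte_exclusive) for each run of all-zero pages."""
    zero = bytes(page)
    run_start = None
    offset = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(page)
            if not chunk:
                break
            # The final chunk may be shorter than a page; compare only its length.
            if chunk == zero[:len(chunk)]:
                if run_start is None:
                    run_start = offset
            elif run_start is not None:
                yield run_start, offset
                run_start = None
            offset += len(chunk)
    # Close a run that extends to end of file.
    if run_start is not None:
        yield run_start, offset


def to_bmap_blocks(start, end, sector=SECTOR):
    """Convert a byte range to the inclusive 512-byte block range xfs_bmap prints."""
    return start // sector, (end - 1) // sector
```

Each (first, last) pair returned by to_bmap_blocks can then be compared by
hand against the bracketed FILE-OFFSET ranges to see which extent, if any,
covers a zeroed run.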