Temporary drive failure leads to massive data corruption?

Hi. We are using XFS on a hardware RAID6 container with around 100
terabytes of data in 500K files. (Actually, we have four such
containers per server and around a dozen servers.)

Anyway, we had a power event a couple of nights ago that took several
of the drives -- and thus the container -- offline.

We got the drives, and thus the hardware RAID6, back online, but
mounting the file system failed with a message saying it was corrupted
and that we should run xfs_repair. Running xfs_repair, in turn,
complained that there were uncommitted entries in the transaction log
and that we should mount the file system first to replay them.

Ultimately, we had to run "xfs_repair -L" (zeroing the log) before the
file system would mount.

Now, I understand that any files or directories being modified during
the event could be corrupted. But we are seeing something completely
different; namely...

Tens of thousands of our files -- each 100-ish megabytes -- appear to
have had large sections replaced with zeroes. (We are still evaluating
the damage.) None of these files were being modified at the time; in
fact, the majority were written years ago and are never changed.
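One quick way to screen for suspect files is something like the
following sketch (the chunk size and run-length threshold are
arbitrary choices, not anything XFS-specific), which reports long runs
of zero bytes in a file:

```python
#!/usr/bin/env python
# Rough screen for zero-filled damage: report runs of zero bytes
# longer than min_run. Run boundaries are rounded to whole chunks,
# so this is a coarse flag, not an exact map of the corruption.

def find_zero_runs(path, chunk=65536, min_run=1 << 20):
    """Return (offset, length) pairs for zero-filled runs of at
    least min_run bytes, measured in whole chunks of size chunk."""
    runs = []
    start = None   # offset where the current zero run began
    off = 0
    with open(path, "rb") as f:
        while True:
            buf = f.read(chunk)
            if not buf:
                break
            if buf == b"\x00" * len(buf):     # chunk is all zeroes
                if start is None:
                    start = off
            else:
                if start is not None and off - start >= min_run:
                    runs.append((start, off - start))
                start = None
            off += len(buf)
    # a zero run may extend to end-of-file
    if start is not None and off - start >= min_run:
        runs.append((start, off - start))
    return runs

if __name__ == "__main__":
    import sys
    for name in sys.argv[1:]:
        for offset, length in find_zero_runs(name):
            print("%s: %d zero bytes at offset %d" % (name, length, offset))
```

(Note that a megabyte of zeroes can also be legitimate data or a hole
in a sparse file -- xfs_bmap can distinguish holes from written
extents -- so hits from a screen like this still need checking against
what the files are supposed to contain.)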

Is this an expected failure mode for XFS? I understand we may have
corrupted a few disk blocks, but should we expect that to corrupt a
significant fraction of our at-rest data?

This is using Red Hat Enterprise Linux 6.6 (kernel
2.6.32-504.16.2.el6.x86_64 of Tue Apr 21 10:35:19 CDT 2015), if it
makes a difference.

Thanks!

 - Pat
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


