Re: file corruption issue

"Patrick Shirkey" <pshirkey@xxxxxxxxxxxxxxxxx> · Tue, 15 May 2012 02:58:42 +0200 (CEST)

On Mon, May 14, 2012 4:29 pm, Ben Myers wrote:
> Hey Patrick,
>
> On Mon, May 14, 2012 at 03:45:06AM +0200, Patrick Shirkey wrote:
>>
>> On Fri, May 11, 2012 6:50 pm, Ben Myers wrote:
>> > On Fri, May 11, 2012 at 03:27:02AM +0200, Patrick Shirkey wrote:
>> >> I have some HP machines running CentOS:
>> >>
>> >> kernel 2.6.32-042stab049.6
>> >> AMD Opteron(tm) Processor 6180 SE
>> >> RAM:   528 GB
>> >> RAID bus controller: Hewlett-Packard Company Smart Array G6 controllers
>> >>
>> >> We have experienced some kernel crashes due to a kernel bug with
>> >> interleaving RAM on this hardware, which require a hard reset of the
>> >> machines.
>> >>
>> >> After reboot we are finding that there is severe file corruption on the
>> >> xfs file system where TBs of readonly databases are getting partially or
>> >> fully truncated.
>> >>
>> >> Has anyone come across this or similar?
>> >
>> > This rings a bell for me but I can't be certain.  Could you provide a
>> > metadump?
>> >
>>
>> The machines are live so we have already restored the data several times.
>> Will a metadump from the existing file system be useful or do you need it
>> post crash?
>
> Well... one of each would be best.  It might be helpful to compare the block
> map from before the crash with the block map after the crash for one of the
> read-only corrupted databases.
>

Unfortunately, I cannot unmount the partitions to run xfs_metadump because
they are in use.

I have found some files that were truncated in a recent crash. Is there
any tool I can run on those files to get info that might be useful?
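
If the per-file block map Ben mentions is what is wanted, a rough sketch of
capturing and comparing it could look like the following. It assumes xfs_bmap
(from xfsprogs) is installed and can be run against files on the mounted
file system. The function names, and the idea of saving the output now to
compare against a dump taken after a later crash, are illustrative only and
not something confirmed in this thread.

```python
import difflib
import subprocess


def capture_block_map(path):
    """Return the verbose extent map of `path` as text.

    Assumption: `xfs_bmap -v` is available and `path` lives on an XFS
    file system. Raises CalledProcessError if the command fails.
    """
    result = subprocess.run(
        ["xfs_bmap", "-v", path],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def diff_block_maps(before, after):
    """Return unified-diff lines between two saved block-map dumps.

    An empty list means the two dumps are textually identical.
    """
    return list(difflib.unified_diff(
        before.splitlines(), after.splitlines(),
        fromfile="before", tofile="after", lineterm="",
    ))
```

The output of capture_block_map() would be saved somewhere off the affected
file system for each database file, and diff_block_maps() run on the saved
text and a fresh capture once a file turns up truncated.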

--
Patrick Shirkey
Boost Hardware Ltd

_______________________________________________
xfs mailing list
xfs@xxxxxxxxxxx
http://oss.sgi.com/mailman/listinfo/xfs