Re: CentOS 5.5 XFS internal errors (XFS_WANT_CORRUPTED_GOTO)

On Sun, Jul 11, 2010 at 9:44 PM, Shaun Adolphson <shaun@xxxxxxxxxxxxx> wrote:
> On Thu, Jul 8, 2010 at 9:21 PM, Shaun Adolphson <shaun@xxxxxxxxxxxxx> wrote:
>> On Wed, Jul 7, 2010 at 9:18 AM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
>>>
>>> On Tue, Jul 06, 2010 at 08:57:45PM +1000, Shaun Adolphson wrote:
>>> > Hi,
>>> >
>>> > We have been able to repeatably produce xfs internal errors
>>> > (XFS_WANT_CORRUPTED_GOTO) on one of our fileservers. We are attempting
>>> > to locally copy a 248Gig file off a USB drive formatted as NTFS to the
>>> > xfs drive. The copy gets about 96% of the way through before we get the
>>> > following messages:
>>> >
>>> > Jun 28 22:14:46 terrorserver kernel: XFS internal error
>>> > XFS_WANT_CORRUPTED_GOTO at line 2092 of file fs/xfs/xfs_bmap_btree.c.
>>> > Caller 0xffffffff8837446f
>>>
>>> Interesting. That's a corrupted inode extent btree - I haven't seen
>>> one of them for a long while. Were there any errors (like IO errors)
>>> reported before this?
>>>
>>> However, the first step is to determine if the error is on disk or an
>>> in-memory error. Can you post output of:
>>>
>>>        - xfs_info <mntpt>
>
> meta-data=/dev/TerrorVolume/terror isize=256    agcount=130385, agsize=32768 blks
>          =                         sectsz=512   attr=1
> data     =                         bsize=4096   blocks=4272433152, imaxpct=25
>          =                         sunit=0      swidth=0 blks
> naming   =version 2                bsize=4096   ascii-ci=0
> log      =internal                 bsize=4096   blocks=2560, version=1
>          =                         sectsz=512   sunit=0 blks, lazy-count=0
> realtime =none                     extsz=4096   blocks=0, rtextents=0
>
>
>>>        - xfs_repair -n after a shutdown
>
> The output of the xfs_repair -n run is 6MB; below is a condensed
> version. I can post the whole output if required.
>
> Phase 1 - find and verify superblock...
> Phase 2 - using internal log
>        - scan filesystem freespace and inode maps...
>        - found root inode chunk
> Phase 3 - for each AG...
>        - scan (but don't clear) agi unlinked lists...
>        - process known inodes and perform inode discovery...
>        - agno = 0
> .
> .
> .
>        - agno = 130384
>        - process newly discovered inodes...
> Phase 4 - check for duplicate blocks...
>        - setting up duplicate extent list...
>        - check for inodes claiming duplicate blocks...
>        - agno = 0
> .
> .
> .
>        - agno = 130384
> No modify flag set, skipping phase 5
> Phase 6 - check inode connectivity...
>        - traversing filesystem ...
>        - traversal finished ...
>        - moving disconnected inodes to lost+found ...
> Phase 7 - verify link counts...
> No modify flag set, skipping filesystem flush and exiting.
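>
> (For reference, a rough sketch of the sequence used to gather the
> above; the device name is ours, the mount point is illustrative. A
> mount followed by an unmount replays the log, since xfs_repair will
> not run against a dirty log:)
>
>   mount /dev/TerrorVolume/terror /mnt/terror   # replay the log
>   umount /mnt/terror
>   # -n: no-modify mode, report problems without writing to disk
>   xfs_repair -n /dev/TerrorVolume/terror 2>&1 | tee repair-n.log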
>
>
>
>
>>>
>>> Can you upgrade xfsprogs (i.e. xfs_repair) to the latest version
>>> (3.1.2) before you do this as well?
>
> # xfs_repair -V
> xfs_repair version 3.1.2
>
>
>>
>> We have upgraded xfsprogs to 3.1.2 and are in the process of
>> collecting the required information.
>>
>>>
>>> > We have reproduced the condition 3 times, and each time we have been
>>> > able to remount the drive (to replay the transaction log) and then
>>> > perform an xfs_repair.
>>> >
>>> > We are just using cp to copy the file.
>>> >
>>> > Some further details about the system:
>>> >
>>> > Software:
>>> > - Fresh install of CentOS 5.5 64bit all patches up to date
>>> > - Kernel 2.6.18-194.3.1.el5.centos.plus
>>>
>>> I've got no idea exactly what version of XFS that has in it, so I
>>> can't say off the top of my head whether this is a fixed bug or not.
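>>>
>>> (One way to see which XFS fixes a distro kernel carries is to read
>>> the package changelog, e.g. something along the lines of:
>>>
>>>   rpm -q --changelog kernel | grep -i xfs
>>>
>>> though the exact package name for the centosplus kernel may differ.)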
>>>
>>> Cheers,
>>>
>>> Dave.
>>> --
>>> Dave Chinner
>>> david@xxxxxxxxxxxxx
>>
>>
>>
>> During other testing we have also been able to reproduce the issue by
>> copying a self-generated 248Gig file from another system disk to the
>> XFS disk. The file was generated using dd with an input of /dev/zero
>> (roughly as sketched below).
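>>
>> (Illustrative only; the output path and the exact bs/count below are
>> examples, not the precise command we ran:)
>>
>>   # ~248GiB of zeroes: 248 * 1024 = 253952 one-MiB blocks
>>   dd if=/dev/zero of=/mnt/terror/testfile bs=1M count=253952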
>>
>> All the existing data (~6TB) was successfully copied onto the storage
>> without hitting the error. The thing to note is that all the existing
>> files are much smaller than the one we are trying to copy in
>> (248Gig). And since we started seeing the shutdowns we have copied
>> many smaller files (< 30Gig in size) onto the storage area without
>> issue.
>>
>
> Thanks,
>
> Shaun
>

_______________________________________________
xfs mailing list
xfs@xxxxxxxxxxx
http://oss.sgi.com/mailman/listinfo/xfs


