Analysis of the logs using xfs_logprint shows the issue appears related to journaling and data writeback ordering. Data for the file is written first, but the journal is not consistent with that information. The last committed transaction in the log shows 19492251 free blocks in the allocation group, while the file being written at the time of the reset was 255 blocks long.

____________________________________________________________________________________
Allocation group length = 19492366
Free blocks             = 19492251
Difference              = 115 blocks (i.e., only 115 blocks from this
                          allocation group are used)
____________________________________________________________________________________

----------------------------------------------------------------------------
Oper (472): tid: 92ae5cd0  len: 0  clientid: TRANS  flags: START
----------------------------------------------------------------------------
Oper (473): tid: 92ae5cd0  len: 16  clientid: TRANS  flags: none
TRAN:  type: DIOSTRAT  tid: 0  num_items: 4
----------------------------------------------------------------------------
Oper (474): tid: 92ae5cd0  len: 56  clientid: TRANS  flags: none
INODE: #regs: 3  ino: 0x84  flags: 0x5  dsize: 16
       blkno: 64  len: 16  boff: 1024
Oper (475): tid: 92ae5cd0  len: 96  clientid: TRANS  flags: none
INODE CORE
magic 0x494e mode 0100644 version 2 format 2
nlink 1 uid 0 gid 0
atime 0xc mtime 0xc ctime 0xc
size 0x65400 nblocks 0x67 extsize 0x0 nextents 0x1
naextents 0x0 forkoff 0 dmevmask 0x0 dmstate 0x0
flags 0x0 gen 0x5
Oper (476): tid: 92ae5cd0  len: 16  clientid: TRANS  flags: none
EXTENTS inode data
----------------------------------------------------------------------------
Oper (477): tid: 92ae5cd0  len: 24  clientid: TRANS  flags: none
BUF: #regs: 2  start blkno: 1 (0x1)  len: 1  bmap size: 1  flags: 0x0
Oper (478): tid: 92ae5cd0  len: 128  clientid: TRANS  flags: none
AGF Buffer: XAGF
ver: 1  seq#: 0  len: 19492366
root BNO: 1  CNT: 2
level BNO: 1  CNT: 1
1st: 0  last: 3  cnt: 4  freeblks: 19492251  longest: 19492251
----------------------------------------------------------------------------
Oper (479): tid: 92ae5cd0  len: 28  clientid: TRANS  flags: none
BUF: #regs: 2  start blkno: 16 (0x10)  len: 8  bmap size: 2  flags: 0x0
Oper (480): tid: 92ae5cd0  len: 128  clientid: TRANS  flags: none
BUF DATA
----------------------------------------------------------------------------
Oper (481): tid: 92ae5cd0  len: 28  clientid: TRANS  flags: none
BUF: #regs: 2  start blkno: 8 (0x8)  len: 8  bmap size: 2  flags: 0x0
Oper (482): tid: 92ae5cd0  len: 128  clientid: TRANS  flags: none
BUF DATA
----------------------------------------------------------------------------
Oper (483): tid: 92ae5cd0  len: 0  clientid: TRANS  flags: COMMIT
__________________________________________________________________________

After running repair on this disk once the issue is observed
(file size that was written = 1044480 bytes = 255 blocks):

# xfs_repair -L /dev/sda1
Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
ALERT: The filesystem has valuable metadata changes in a log which is
being destroyed because the -L option was used.
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
data fork in ino 132 claims free block 115
data fork in ino 132 claims free block 116
        - agno = 1
        - agno = 2
        - agno = 3
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
Phase 5 - rebuild AG headers and trees...
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...
Done

# mount /dev/sdb1 /media/b
XFS mounting filesystem sda1
# ls -l /media/b
-rwxr-xr-x 1 root 0       0 Jan  1 00:00 test_code
-rw-r--r-- 1 root 0 1044480 Jan  1 00:00 direct_io_file_0

Please let me know if these observations are wrong. Also, this seems like a debatable issue, but is there any fix for it?

Thanks & Regards,
Amit Sahrawat

On Fri, Jul 22, 2011 at 10:53 AM, Amit Sahrawat <amit.sahrawat83@xxxxxxxxx> wrote:
> More logs for xfs_logprint and xfs_repair.
>
> Thanks & Regards,
> Amit Sahrawat
>
> On Fri, Jul 22, 2011 at 10:22 AM, Amit Sahrawat
> <amit.sahrawat83@xxxxxxxxx> wrote:
>> Dear All,
>>
>> Target: ARM
>>
>> Recently I encountered a corruption on XFS for RC-3. While a
>> direct-I/O write to a file was in progress there was a power reset;
>> only one file at a time was being written to the disk using DIO.
>> After rebooting and mounting, I tried to remove the file and
>> encountered the corruption below. The hard disk could not be mounted
>> after this; only after clearing the log (xfs_repair -L) could the
>> disk be mounted.
>>
>> XFS mounting filesystem sda1
>> XFS internal error XFS_WANT_CORRUPTED_GOTO at line 1535 of file
>> fs/xfs/xfs_alloc.c.
>> Caller 0xc0152c04
>> Backtrace:
>> [<c0023000>] (dump_backtrace+0x0/0x110) from [<c02dd680>] (dump_stack+0x18/0x1c)
>>  r6:00000000 r5:c0152c04 r4:00000075 r3:e3ec1c88
>> [<c02dd668>] (dump_stack+0x0/0x1c) from [<c0176bd0>] (xfs_error_report+0x4c/0x5c)
>> [<c0176b84>] (xfs_error_report+0x0/0x5c) from [<c01510d4>] (xfs_free_ag_extent+0x400/0x600)
>> [<c0150cd4>] (xfs_free_ag_extent+0x0/0x600) from [<c0152c04>] (xfs_free_extent+0x8c/0xa4)
>> [<c0152b78>] (xfs_free_extent+0x0/0xa4) from [<c015ffa8>] (xfs_bmap_finish+0x108/0x194)
>>  r7:e3ec1e10 r6:00000000 r5:e3737870 r4:e373e000
>> [<c015fea0>] (xfs_bmap_finish+0x0/0x194) from [<c017e840>] (xfs_itruncate_finish+0x1dc/0x30c)
>> [<c017e664>] (xfs_itruncate_finish+0x0/0x30c) from [<c0197dc8>] (xfs_inactive+0x20c/0x40c)
>> [<c0197bbc>] (xfs_inactive+0x0/0x40c) from [<c01a3da0>] (xfs_fs_clear_inode+0x50/0x60)
>>  r9:e3ec0000 r8:c001f128 r7:00000000 r6:e4671a80 r5:c0312454
>>  r4:e4667300
>> [<c01a3d50>] (xfs_fs_clear_inode+0x0/0x60) from [<c00bdd84>] (clear_inode+0x8c/0xe8)
>>  r4:e4667420 r3:c01a3d50
>> [<c00bdcf8>] (clear_inode+0x0/0xe8) from [<c00be584>] (generic_delete_inode+0xdc/0x178)
>>  r4:e4667420 r3:ffffffff
>> [<c00be4a8>] (generic_delete_inode+0x0/0x178) from [<c00be640>] (generic_drop_inode+0x20/0x68)
>>  r5:00000000 r4:e4667420
>> [<c00be620>] (generic_drop_inode+0x0/0x68) from [<c00bd368>] (iput+0x6c/0x7c)
>>  r4:e4667420 r3:c00be620
>> [<c00bd2fc>] (iput+0x0/0x7c) from [<c00b4c40>] (do_unlinkat+0xfc/0x154)
>>  r4:e4667420 r3:00000000
>> [<c00b4b44>] (do_unlinkat+0x0/0x154) from [<c00b4cb0>] (sys_unlink+0x18/0x1c)
>>  r7:0000000a r6:00000000 r5:00000000 r4:be90299b
>> [<c00b4c98>] (sys_unlink+0x0/0x1c) from [<c001ef80>] (ret_fast_syscall+0x0/0x30)
>> xfs_force_shutdown(sda1,0x8) called from line 4047 of file
>> fs/xfs/xfs_bmap.c. Return address = 0xc015ffec
>> Filesystem "sda1": Corruption of in-memory data detected. Shutting
>> down filesystem: sda1
>> Please umount the filesystem, and rectify the problem(s)
>>
>> [root@localhost amits]# xfs_repair -n /dev/sdb1
>> Phase 1 - find and verify superblock...
>> Phase 2 - using internal log
>>         - scan filesystem freespace and inode maps...
>>         - found root inode chunk
>> Phase 3 - for each AG...
>>         - scan (but don't clear) agi unlinked lists...
>>         - process known inodes and perform inode discovery...
>>         - agno = 0
>> data fork in ino 132 claims free block 115
>> data fork in ino 132 claims free block 116
>>         - agno = 1
>>         - agno = 2
>>         - agno = 3
>>         - process newly discovered inodes...
>> Phase 4 - check for duplicate blocks...
>>         - setting up duplicate extent list...
>>         - check for inodes claiming duplicate blocks...
>>         - agno = 0
>>         - agno = 2
>>         - agno = 1
>>         - agno = 3
>> No modify flag set, skipping phase 5
>> Phase 6 - check inode connectivity...
>>         - traversing filesystem ...
>>         - traversal finished ...
>>         - moving disconnected inodes to lost+found ...
>> Phase 7 - verify link counts...
>> No modify flag set, skipping filesystem flush and exiting.
>> [root@localhost amits]#
>>
>> Please find the logs for xfs_logprint at the time of the issue attached.
>>
>> If there really was corruption, as reported at the time the file was
>> deleted, then why did the XFS filesystem mount? After checking the
>> block range being passed in the free request, it showed that at the
>> time of xfs_free_ag_extent() the values fetched from the tree were
>> not correct for the blocks to the right of the current file extent
>> (possibly due to corruption). Is anything related to this written to
>> the XFS log, so that it could be taken care of at mount time?
>>
>> Please let me know in case more information is required.
>>
>> Thanks & Regards,
>> Amit Sahrawat
>>
_______________________________________________
xfs mailing list
xfs@xxxxxxxxxxx
http://oss.sgi.com/mailman/listinfo/xfs
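P.S. The block accounting quoted at the top of this thread can be sanity-checked with a few lines of Python. This is only a cross-check of the numbers already shown, assuming a 4096-byte filesystem block size (implied by 1044480 bytes = 255 blocks); all figures are taken verbatim from the xfs_logprint and xfs_repair output above.

```python
# Cross-check the allocation-group accounting from the logged AGF
# buffer against the file that was being written via direct I/O.
# Assumption: 4096-byte filesystem blocks (1044480 / 255 = 4096).

BLOCK_SIZE = 4096

ag_length = 19492366    # "len" from the logged AGF buffer (Oper 478)
ag_free = 19492251      # "freeblks" from the same AGF record
file_bytes = 1044480    # size of direct_io_file_0 after repair

used_blocks = ag_length - ag_free          # blocks the journal thinks are used
file_blocks = file_bytes // BLOCK_SIZE     # blocks the file actually needs
inode_nblocks = 0x67                       # "nblocks" from the logged inode core

print(used_blocks)    # 115 - only 115 blocks used per the journaled AGF
print(file_blocks)    # 255 - blocks actually written to the file's data
print(inode_nblocks)  # 103 - blocks accounted in the journaled inode

# The journal's view (115 blocks used in AG 0, 103 in the inode) never
# caught up with the 255 data blocks on disk; blocks 115 and 116 sit
# just past the journaled used range, matching the repair messages
# "data fork in ino 132 claims free block 115/116".
```
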