Metadata corruption at xfs_attr3_leaf_write_verify()

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Dave,

>From time to time we hit an xfs corruption when xattr leaf is attempted to
be written when it has ichdr.count==0.

We are on an old kernel 3.18.19. We managed to capture xfs trace buffer
closer to the problem time.

This is the corruption report
	Jul 20 05:22:27.189727 node1 kernel: [466290.765299] XFS (dm-78): Metadata
corruption detected at xfs_attr3_leaf_write_verify+0xe5/0x100 [xfs], block
0x9698734d8
	Jul 20 05:22:27.189741 node1 kernel: [466290.767279] XFS (dm-78): Unmount
and run xfs_repair
	Jul 20 05:22:27.189743 node1 kernel: [466290.768022] XFS (dm-78): First 64
bytes of corrupted metadata buffer:
	Jul 20 05:22:27.191635 node1 kernel: [466290.768832] ffff88002b4bf000: 00
00 00 00 00 00 00 00 fb ee 00 00 00 00 00 00  ................
	Jul 20 05:22:27.192949 node1 kernel: [466290.770163] ffff88002b4bf010: 10
00 00 00 00 20 0f e0 00 00 00 00 00 00 00 00  ..... ..........
	Jul 20 05:22:27.192953 node1 kernel: [466290.771264] ffff88002b4bf020: 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
	Jul 20 05:22:27.195921 node1 kernel: [466290.772815] ffff88002b4bf030: 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................

>From xfs xattr debug trace, it appears that a file is created within a
directory having default ACL set. This causes file creation to insert xattr
leaf during file creation. This completes queuing an AIL. Immediately after
this an explicit xfs_setattr is being done on the same file which alters the
default ACL & this causes default ACL to be removed/added. It looks like
this 2nd set attr is somehow influencing the first AIL entry that it is
being seen with ichr.count==0 during the write.

I can pass the full xfs debug trace if required. Here is the snippet of
debug trace causing the issue

           <...>-18953 [004] ...1 466290.749358: xfs_attr_sf_create: dev
253:78 ino 0x1749833903 name SGI_ACL_FILE namelen 12 valuelen 100 hashval
0x275b19c4 op_flags ADDNAME|OKNOENT
           <...>-18953 [004] ...1 466290.749359: xfs_attr_sf_addname: dev
253:78 ino 0x1749833903 name SGI_ACL_FILE namelen 12 valuelen 100 hashval
0x275b19c4 op_flags ADDNAME|OKNOENT
           <...>-18953 [004] ...1 466290.749361: xfs_attr_sf_lookup: dev
253:78 ino 0x1749833903 name SGI_ACL_FILE namelen 12 valuelen 100 hashval
0x275b19c4 op_flags ADDNAME|OKNOENT
           <...>-18953 [004] ...1 466290.749362: xfs_attr_sf_to_leaf: dev
253:78 ino 0x1749833903 name SGI_ACL_FILE namelen 12 valuelen 100 hashval
0x275b19c4 op_flags ADDNAME|OKNOENT
           <...>-18953 [004] ...1 466290.749430: xfs_attr_leaf_create: dev
253:78 ino 0x1749833903 name SGI_ACL_FILE namelen 12 valuelen 100 hashval
0x275b19c4 op_flags ADDNAME|OKNOENT
           <...>-18953 [004] ...1 466290.749454: xfs_attr_leaf_addname: dev
253:78 ino 0x1749833903 name SGI_ACL_FILE namelen 12 valuelen 100 hashval
0x275b19c4 op_flags ADDNAME|OKNOENT
           <...>-18953 [004] ...1 466290.749456: xfs_attr_leaf_lookup: dev
253:78 ino 0x1749833903 name SGI_ACL_FILE namelen 12 valuelen 100 hashval
0x275b19c4 op_flags ADDNAME|OKNOENT
           <...>-18953 [004] ...1 466290.749457: xfs_attr_leaf_add: dev
253:78 ino 0x1749833903 name SGI_ACL_FILE namelen 12 valuelen 100 hashval
0x275b19c4 op_flags ADDNAME|OKNOENT
           <...>-18953 [004] ...1 466290.749458: xfs_attr_leaf_add_work: dev
253:78 ino 0x1749833903 name SGI_ACL_FILE namelen 12 valuelen 100 hashval
0x275b19c4 op_flags ADDNAME|OKNOENT
           <...>-18953 [004] ...1 466290.749649: xfs_getattr: dev 253:78 ino
0x1749833903
           <...>-18953 [004] ...1 466290.749908: xfs_attr_leaf_get: dev
253:78 ino 0x1749833903 name NTACL namelen 5 valuelen 1024 hashval
0xea9061c8 op_flags
           <...>-18953 [004] ...1 466290.749910: xfs_attr_leaf_lookup: dev
253:78 ino 0x1749833903 name NTACL namelen 5 valuelen 1024 hashval
0xea9061c8 op_flags
           <...>-18953 [004] ...1 466290.749918: xfs_getattr: dev 253:78 ino
0x1749833903
           <...>-18953 [004] ...1 466290.749931: xfs_attr_leaf_get: dev
253:78 ino 0x1749833903 name SAMBA_PAI namelen 9 valuelen 1024 hashval
0x487716dd op_flags
           <...>-18953 [004] ...1 466290.749932: xfs_attr_leaf_lookup: dev
253:78 ino 0x1749833903 name SAMBA_PAI namelen 9 valuelen 1024 hashval
0x487716dd op_flags
           <...>-18953 [004] ...1 466290.749984: xfs_getattr: dev 253:78 ino
0x1749833903
           <...>-18953 [004] ...1 466290.749990: xfs_getattr: dev 253:78 ino
0x1749833903
           <...>-18953 [004] ...1 466290.750063: xfs_setattr: dev 253:78 ino
0x1749833903
           <...>-18953 [004] ...1 466290.750257: xfs_attr_leaf_addname: dev
253:78 ino 0x1749833903 name SGI_ACL_FILE namelen 12 valuelen 100 hashval
0x275b19c4 op_flags ADDNAME|OKNOENT
           <...>-18953 [004] ...1 466290.750260: xfs_attr_leaf_lookup: dev
253:78 ino 0x1749833903 name SGI_ACL_FILE namelen 12 valuelen 100 hashval
0x275b19c4 op_flags ADDNAME|OKNOENT
           <...>-18953 [004] ...1 466290.750261: xfs_attr_leaf_replace: dev
253:78 ino 0x1749833903 name SGI_ACL_FILE namelen 12 valuelen 100 hashval
0x275b19c4 op_flags ADDNAME|OKNOENT
           <...>-18953 [004] ...1 466290.750262: xfs_attr_leaf_add: dev
253:78 ino 0x1749833903 name SGI_ACL_FILE namelen 12 valuelen 100 hashval
0x275b19c4 op_flags RENAME|ADDNAME|OKNOENT
           <...>-18953 [004] ...1 466290.750262: xfs_attr_leaf_add_work: dev
253:78 ino 0x1749833903 name SGI_ACL_FILE namelen 12 valuelen 100 hashval
0x275b19c4 op_flags RENAME|ADDNAME|OKNOENT
           <...>-18953 [004] ...1 466290.750270: xfs_attr_leaf_flipflags:
dev 253:78 ino 0x1749833903 name SGI_ACL_FILE namelen 12 valuelen 100
hashval 0x275b19c4 op_flags RENAME|ADDNAME|OKNOENT
           <...>-18953 [004] ...1 466290.750275: xfs_attr_leaf_remove: dev
253:78 ino 0x1749833903 name SGI_ACL_FILE namelen 12 valuelen 100 hashval
0x275b19c4 op_flags RENAME|ADDNAME|OKNOENT
           <...>-18953 [004] ...1 466290.750461: xfs_attr_leaf_removename:
dev 253:78 ino 0x1749833903 name NTACL namelen 5 valuelen 0 hashval
0xea9061c8 op_flags OKNOENT
           <...>-18953 [004] ...1 466290.750463: xfs_attr_leaf_lookup: dev
253:78 ino 0x1749833903 name NTACL namelen 5 valuelen 0 hashval 0xea9061c8
op_flags OKNOENT
           <...>-18953 [004] ...1 466290.750474: xfs_attr_leaf_addname: dev
253:78 ino 0x1749833903 name SAMBA_PAI namelen 9 valuelen 49 hashval
0x487716dd op_flags ADDNAME|OKNOENT
           <...>-18953 [004] ...1 466290.750475: xfs_attr_leaf_lookup: dev
253:78 ino 0x1749833903 name SAMBA_PAI namelen 9 valuelen 49 hashval
0x487716dd op_flags ADDNAME|OKNOENT
           <...>-18953 [004] ...1 466290.750476: xfs_attr_leaf_add: dev
253:78 ino 0x1749833903 name SAMBA_PAI namelen 9 valuelen 49 hashval
0x487716dd op_flags ADDNAME|OKNOENT
           <...>-18953 [004] ...1 466290.750476: xfs_attr_leaf_add_work: dev
253:78 ino 0x1749833903 name SAMBA_PAI namelen 9 valuelen 49 hashval
0x487716dd op_flags ADDNAME|OKNOENT
           <...>-18953 [004] ...1 466290.750645: xfs_getattr: dev 253:78 ino
0x1749833903
           <...>-18953 [004] ...1 466290.750648: xfs_getattr: dev 253:78 ino
0x1749833903
           <...>-18953 [004] ...1 466290.750645: xfs_getattr: dev 253:78 ino
0x1749833903
           <...>-18953 [004] ...1 466290.750648: xfs_getattr: dev 253:78 ino
0x1749833903
           <...>-18953 [004] ...1 466290.750659: xfs_attr_leaf_get: dev
253:78 ino 0x1749833903 name SAMBA_PAI namelen 9 valuelen 1024 hashval
0x487716dd op_flags
           <...>-18953 [004] ...1 466290.750661: xfs_attr_leaf_lookup: dev
253:78 ino 0x1749833903 name SAMBA_PAI namelen 9 valuelen 1024 hashval
0x487716dd op_flags
           <...>-18953 [004] ...1 466290.750744: xfs_getattr: dev 253:78 ino
0x1749833903
           <...>-18953 [004] ...1 466290.750791: xfs_attr_leaf_addname: dev
253:78 ino 0x1749833903 name NTACL namelen 5 valuelen 444 hashval 0xea9061c8
op_flags ADDNAME|OKNOENT
           <...>-18953 [004] ...1 466290.750793: xfs_attr_leaf_lookup: dev
253:78 ino 0x1749833903 name NTACL namelen 5 valuelen 444 hashval 0xea9061c8
op_flags ADDNAME|OKNOENT
           <...>-18953 [004] ...1 466290.750794: xfs_attr_leaf_add: dev
253:78 ino 0x1749833903 name NTACL namelen 5 valuelen 444 hashval 0xea9061c8
op_flags ADDNAME|OKNOENT
           <...>-18953 [004] ...1 466290.750794: xfs_attr_leaf_add_work: dev
253:78 ino 0x1749833903 name NTACL namelen 5 valuelen 444 hashval 0xea9061c8
op_flags ADDNAME|OKNOENT
           <...>-18953 [004] ...1 466290.751029: xfs_attr_leaf_get: dev
253:78 ino 0x1749833903 name NTACL namelen 5 valuelen 1024 hashval
0xea9061c8 op_flags
           <...>-18953 [004] ...1 466290.751031: xfs_attr_leaf_lookup: dev
253:78 ino 0x1749833903 name NTACL namelen 5 valuelen 1024 hashval
0xea9061c8 op_flags


Any pointers/input to check this further is greatly appreciated! Thanks.

--Shyam

On Tue, Apr 26, 2016 at 4:10 PM, Shyam Kaushik <shyam@xxxxxxxxxxxxxxxxx>
wrote:
Hi Eric,

Yes this is a non-CRC FS. The workload is we a custom application that
does several (16 threads) file create/read/writes/update xattrs on a XFS
mounted over raw disk.

There is no messages prior to this & we hit this once in a few days (like
every 4-5 days). I will try your suggestion of running with xfs_attr_*
tracepoints all the time so that when we hit this error we have better
info. Please let me know if you prefer to add further debug prints.
Thanks.

--Shyam

> Eric Sandeen sandeen at sandeen.net wrote:
> On Mon Apr 25 16:57:11 CDT 2016
>
> Is this a non-crc filesystem?
>
>
> XFS_ATTR_LEAF_MAGIC is ok (if it's a non-crc filesystem)
>
> Looks the same as the other report, tripping on:
>
>        if (ichdr.count == 0)
>                return false;
>
> A reproducer would be super here.  At least maybe a description of the
> workload that hits it?
>
> Count is manipulated in things like attr leaf compaction...
> Any other messages prior to this?
>
> How often do you hit it?
>
> You could also turn on xfs_attr_* tracepoints, maybe.
>
> -Eric

-----Original Message-----
From: Shyam Kaushik [mailto:shyam@xxxxxxxxxxxxxxxxx]
Sent: 25 April 2016 22:22
To: 'xfs@xxxxxxxxxxx'
Cc: Alex Lyakas
Subject: Metadata corruption detected at
xfs_attr3_leaf_write_verify+0xe5/0x100

Hi Dave et al,

We are periodically hitting the below metadata corruption with XFS over a
raw disk running several file copies with xattr operations on kernel
3.18.19. Unmounting & running xfs_repair doesn't report any corruption. I
see that this was last reported here
http://oss.sgi.com/archives/xfs/2015-12/msg00224.html

Unfortunately we dont have a reproducer, but this issue happens
periodically. We can add more debug prints & allow this issue to happen
again. Can you pls suggest any options to debug this further? Thanks

Apr 20 21:58:03 node1 kernel: [16736.286370] XFS (dm-26): Metadata
corruption detected at xfs_attr3_leaf_write_verify+0xe5/0x100 [xfs], block
0x19c5c728
Apr 20 21:58:03 node1 kernel: [16736.289084] XFS (dm-26): Unmount and run
xfs_repair
Apr 20 21:58:03 node1 kernel: [16736.290257] XFS (dm-26): First 64 bytes
of corrupted metadata buffer:
Apr 20 21:58:03 node1 kernel: [16736.291797] ffff880123668000: 00 00 00 00
00 00 00 00 fb ee 00 00 00 00 00 00  ................
Apr 20 21:58:03 node1 kernel: [16736.293823] ffff880123668010: 10 00 00 00
00 20 0f e0 00 00 00 00 00 00 00 00  ..... ..........
Apr 20 21:58:03 node1 kernel: [16736.297504] ffff880123668020: 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00  ................
Apr 20 21:58:03 node1 kernel: [16736.299343] ffff880123668030: 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00  ................
Apr 20 21:58:03 node1 kernel: [16736.301465] XFS (dm-26):
xfs_do_force_shutdown(0x8) called from line 1244 of file fs/xfs/xfs_buf.c.
Return address = 0xffffffffc095cee0
Apr 20 21:58:03 node1 kernel: [16736.301469] ------------[ cut here
]------------
Apr 20 21:58:03 node1 kernel: [16736.301551] XFS(dm-26): SHUTDOWN!!!
old_flags=0x0 new_flags=0x8
Apr 20 21:58:03 node1 kernel: [16736.301703] CPU: 1 PID: 7857 Comm:
xfsaild/dm-26 Tainted: G           OE  3.18.19 #1
Apr 20 21:58:03 node1 kernel: [16736.301705] Hardware name: Bochs Bochs,
BIOS Bochs 01/01/2011
Apr 20 21:58:03 node1 kernel: [16736.301707]  0000000000000009
ffff88020c5ffb38 ffffffff81710c85 0000000000000000
Apr 20 21:58:03 node1 kernel: [16736.301711]  ffff88020c5ffb88
ffff88020c5ffb78 ffffffff81072df1 2e2e202030302030
Apr 20 21:58:03 node1 kernel: [16736.301715]  0000000000000000
0000000000000008 ffff88020c127000 0000000000000000
Apr 20 21:58:03 node1 kernel: [16736.301718] Call Trace:
Apr 20 21:58:03 node1 kernel: [16736.301769]  [<ffffffff81710c85>]
dump_stack+0x4e/0x71
Apr 20 21:58:03 node1 kernel: [16736.301780]  [<ffffffff81072df1>]
warn_slowpath_common+0x81/0xa0
Apr 20 21:58:03 node1 kernel: [16736.301784]  [<ffffffff81072e56>]
warn_slowpath_fmt+0x46/0x50
Apr 20 21:58:03 node1 kernel: [16736.301860]  [<ffffffffc09693f3>]
xfs_do_force_shutdown+0x33/0x170 [xfs]
Apr 20 21:58:03 node1 kernel: [16736.301921]  [<ffffffffc095cee0>] ?
_xfs_buf_ioapply+0xa0/0x430 [xfs]
Apr 20 21:58:03 node1 kernel: [16736.301951]  [<ffffffffc095ee4b>] ?
__xfs_buf_delwri_submit+0x22b/0x290 [xfs]
Apr 20 21:58:03 node1 kernel: [16736.302066]  [<ffffffffc095cee0>]
_xfs_buf_ioapply+0xa0/0x430 [xfs]
Apr 20 21:58:03 node1 kernel: [16736.302137]  [<ffffffff8109e260>] ?
wake_up_state+0x20/0x20
Apr 20 21:58:03 node1 kernel: [16736.302162]  [<ffffffffc095ee4b>] ?
__xfs_buf_delwri_submit+0x22b/0x290 [xfs]
Apr 20 21:58:03 node1 kernel: [16736.302183]  [<ffffffffc095ea78>]
xfs_buf_submit+0x68/0x210 [xfs]
Apr 20 21:58:03 node1 kernel: [16736.302241]  [<ffffffffc095ee4b>]
__xfs_buf_delwri_submit+0x22b/0x290 [xfs]
Apr 20 21:58:03 node1 kernel: [16736.302267]  [<ffffffffc095fc60>] ?
xfs_buf_delwri_submit_nowait+0x20/0x30 [xfs]
Apr 20 21:58:03 node1 kernel: [16736.302291]  [<ffffffffc098f440>] ?
xfs_trans_ail_cursor_first+0x90/0x90 [xfs]
Apr 20 21:58:03 node1 kernel: [16736.302308]  [<ffffffffc095fc60>]
xfs_buf_delwri_submit_nowait+0x20/0x30 [xfs]
Apr 20 21:58:03 node1 kernel: [16736.302333]  [<ffffffffc098f66b>]
xfsaild+0x22b/0x630 [xfs]
Apr 20 21:58:03 node1 kernel: [16736.302513]  [<ffffffffc098f440>] ?
xfs_trans_ail_cursor_first+0x90/0x90 [xfs]
Apr 20 21:58:03 node1 kernel: [16736.302518]  [<ffffffff810911b9>]
kthread+0xc9/0xe0
Apr 20 21:58:03 node1 kernel: [16736.302522]  [<ffffffff810910f0>] ?
kthread_create_on_node+0x180/0x180
Apr 20 21:58:03 node1 kernel: [16736.302530]  [<ffffffff81717918>]
ret_from_fork+0x58/0x90
Apr 20 21:58:03 node1 kernel: [16736.302549]  [<ffffffff810910f0>] ?
kthread_create_on_node+0x180/0x180
Apr 20 21:58:03 node1 kernel: [16736.302551] ---[ end trace
0bb81b88fdd6a298 ]---

--Shyam
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [XFS Filesystem Development (older mail)]     [Linux Filesystem Development]     [Linux Audio Users]     [Yosemite Trails]     [Linux Kernel]     [Linux RAID]     [Linux SCSI]


  Powered by Linux