On Tue, Oct 31, 2017 at 8:05 PM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> On Tue, Oct 31, 2017 at 06:51:08PM -0700, Cong Wang wrote:
>> On Mon, Oct 30, 2017 at 5:33 PM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
>> > On Mon, Oct 30, 2017 at 02:55:43PM -0700, Cong Wang wrote:
>> >> Hello,
>> >>
>> >> We triggered a list corruption (double add) warning below on our 4.9
>> >> kernel (the 4.9 kernel we use is based on -stable release, with only a
>> >> few unrelated networking backports):
> ...
>> >> 4.9.34.el7.x86_64 #1
>> >> Hardware name: TYAN S5512/S5512, BIOS V8.B13 03/20/2014
>> >>  ffffb0d48a0abb30 ffffffff8e389f47 ffffb0d48a0abb80 0000000000000000
>> >>  ffffb0d48a0abb70 ffffffff8e08989b 0000002400000000 ffff8d9d691e0aa0
>> >>  ffff8d9d7a716608 ffff8d9d691e0aa0 0000000000004000 ffff8d9d7de6d800
>> >> Call Trace:
>> >>  [<ffffffff8e389f47>] dump_stack+0x4d/0x66
>> >>  [<ffffffff8e08989b>] __warn+0xcb/0xf0
>> >>  [<ffffffff8e08991f>] warn_slowpath_fmt+0x5f/0x80
>> >>  [<ffffffff8e3a979c>] __list_add+0xac/0xb0
>> >>  [<ffffffff8e2355bb>] inode_sb_list_add+0x3b/0x50
>> >>  [<ffffffffc040157c>] xfs_setup_inode+0x2c/0x170 [xfs]
>> >>  [<ffffffffc0402097>] xfs_ialloc+0x317/0x5c0 [xfs]
>> >>  [<ffffffffc0404347>] xfs_dir_ialloc+0x77/0x220 [xfs]
>> >
>> > Inode allocation, so should be a new inode straight from the slab
>> > cache. That implies memory corruption of some kind. Please turn on
>> > slab poisoning and try to reproduce.
>>
>> Are you sure? xfs_iget() seems to search a cache before allocating
>> a new one:
>
> /me sighs
>
> You started with "I don't know the XFS code very well", so I omitted
> the complexity of describing about 10 different corner cases where
> we /could/ find the unlinked inode still in the cache via the
> lookup. But they aren't common cases - the common case in the real
> world is allocation of cache cold inodes. IOWs: "so should be a new
> inode straight from the slab cache".
>
> So, yes, we could find the old unlinked inode still cached in the
> XFS inode cache, but I don't have the time to explain how RCU lookup
> code works to everyone who reports a bug.

Oh, sorry about that. I understand now.

>
> All you need to understand is that all of this happens below the VFS,
> and so an in-cache inode that is being reclaimed or newly allocated
> should never, ever be on the VFS sb inode list.
>

OK.

>> >>  [<ffffffff8e74cf32>] ? down_write+0x12/0x40
>> >>  [<ffffffffc0404972>] xfs_create+0x482/0x760 [xfs]
>> >>  [<ffffffffc04019ae>] xfs_generic_create+0x21e/0x2c0 [xfs]
>> >>  [<ffffffffc0401a84>] xfs_vn_mknod+0x14/0x20 [xfs]
>> >>  [<ffffffffc0401aa6>] xfs_vn_mkdir+0x16/0x20 [xfs]
>> >>  [<ffffffff8e226698>] vfs_mkdir+0xe8/0x140
>> >>  [<ffffffff8e22aa4a>] SyS_mkdir+0x7a/0xf0
>> >>  [<ffffffff8e74f8e0>] entry_SYSCALL_64_fastpath+0x13/0x94
>> >>
>> >> _Without_ looking deeper, it seems this warning could be shut up by:
>> >>
>> >> --- a/fs/xfs/xfs_icache.c
>> >> +++ b/fs/xfs/xfs_icache.c
>> >> @@ -1138,6 +1138,8 @@ xfs_reclaim_inode(
>> >>  	xfs_iunlock(ip, XFS_ILOCK_EXCL);
>> >>
>> >>  	XFS_STATS_INC(ip->i_mount, xs_ig_reclaims);
>> >> +
>> >> +	inode_sb_list_del(VFS_I(ip));
>> >>
>> >> together with properly exporting inode_sb_list_del(). Does this make
>> >> any sense?
>> >
>> > No, because by this stage the inode has already been removed from
>> > the superblock inode list. Doing this sort of thing here would just
>> > paper over whatever the underlying problem might be.
>>
>> For me, it looks like the inode in the cache pag->pag_ici_root
>> is not removed from the sb list before being removed from the cache.
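FWIW, for anyone following along: if I'm reading fs/inode.c (v4.9)
correctly, the removal Dave refers to happens in evict(), well before
XFS background reclaim ever sees the inode. A heavily simplified
sketch of the ordering (not the real function body; writeback
waiting, LRU handling and wakeups are all elided):

/*
 * Sketch of evict() from fs/inode.c, as of v4.9 (simplified).
 * The point is the ordering: the inode leaves sb->s_inodes here,
 * before the filesystem tears it down, so by the time XFS's
 * xfs_reclaim_inode() runs, the inode must already be off the
 * superblock list; deleting it again there would be redundant.
 */
static void evict(struct inode *inode)
{
	const struct super_operations *op = inode->i_sb->s_op;

	BUG_ON(!(inode->i_state & I_FREEING));

	inode_sb_list_del(inode);	/* off the VFS sb inode list */

	if (op->evict_inode)
		op->evict_inode(inode);
	else
		clear_inode(inode);
}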
>
> Sure, we have list corruption. Where we detect that corruption
> implies nothing about the cause of the list corruption. The two
> events are not connected in any way. Clearing that VFS list here
> does nothing to fix the problem causing the list corruption to
> occur.

OK.

>
>> >> Please let me know if I can provide any other information.
>> >
>> > How do you reproduce the problem?
>>
>> The warning was reported via ABRT email; we don't know what was
>> happening at the time of the crash.
>
> Which makes it even harder to track down. Perhaps you should
> configure the box to crashdump on such a failure and then we
> can do some post-failure forensic analysis...

Yeah. We are trying to get kdump working, but even if kdump works we
still can't turn on panic_on_warn since this is a production machine.

Thanks!
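P.S. In case it helps whoever digs into this from the archives: the
"double add" message comes from CONFIG_DEBUG_LIST. Roughly what 4.9's
lib/list_debug.c checks (lightly trimmed, quoting from memory, so
treat this as a sketch rather than the exact source):

/*
 * Roughly the CONFIG_DEBUG_LIST version of __list_add() in v4.9.
 * The "double add" WARN fires when the entry being linked in is
 * already one of its would-be neighbours, i.e. in our case the
 * inode was already on sb->s_inodes at inode_sb_list_add() time.
 */
void __list_add(struct list_head *new,
		struct list_head *prev,
		struct list_head *next)
{
	WARN(next->prev != prev,
	     "list_add corruption. next->prev should be prev (%p), but was %p. (next=%p).\n",
	     prev, next->prev, next);
	WARN(prev->next != next,
	     "list_add corruption. prev->next should be next (%p), but was %p. (prev=%p).\n",
	     next, prev->next, prev);
	WARN(new == prev || new == next,
	     "list_add double add: new=%p, prev=%p, next=%p.\n",
	     new, prev, next);

	next->prev = new;
	new->next = next;
	new->prev = prev;
	prev->next = new;
}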