Dear Eric and Dave,
The xfs shutdown seems to have gone away; however, one of our servers reports the following error, and it makes glusterfsd hang again. Is this just related to high load? Or is it the same issue showing different behavior after the VFS change?
Apr 24 12:35:07 10 kernel: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
Apr 24 12:37:07 10 kernel: INFO: task glusterfsd:5835 blocked for more than 120 seconds.
Apr 24 12:37:07 10 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 24 12:37:07 10 kernel: glusterfsd D 0000000000000003 0 5835 1 0x00000080
Apr 24 12:37:07 10 kernel: ffff88100ed77a28 0000000000000082 0000000000000000 ffff8818e843cdd8
Apr 24 12:37:07 10 kernel: ffff8810177c1bc0 ffff8818e8422ea0 0000000000004004 ffff882019453000
Apr 24 12:37:07 10 kernel: ffff88101609b098 ffff88100ed77fd8 000000000000fb88 ffff88101609b098
Apr 24 12:37:07 10 kernel: Call Trace:
Apr 24 12:37:07 10 kernel: [<ffffffff814eaad5>] schedule_timeout+0x215/0x2e0
Apr 24 12:37:07 10 kernel: [<ffffffffa02a4978>] ? xfs_da_do_buf+0x618/0x770 [xfs]
Apr 24 12:37:07 10 kernel: [<ffffffff814eb9f2>] __down+0x72/0xb0
Apr 24 12:37:07 10 kernel: [<ffffffffa02daae2>] ? _xfs_buf_find+0x102/0x280 [xfs]
Apr 24 12:37:07 10 kernel: [<ffffffff810967f1>] down+0x41/0x50
Apr 24 12:37:07 10 kernel: [<ffffffffa02da923>] xfs_buf_lock+0x53/0x110 [xfs]
Apr 24 12:37:07 10 kernel: [<ffffffffa02daae2>] _xfs_buf_find+0x102/0x280 [xfs]
Apr 24 12:37:07 10 kernel: [<ffffffffa02daccb>] xfs_buf_get+0x6b/0x1a0 [xfs]
Apr 24 12:37:07 10 kernel: [<ffffffffa02db33c>] xfs_buf_read+0x2c/0x100 [xfs]
Apr 24 12:37:07 10 kernel: [<ffffffffa02d0f88>] xfs_trans_read_buf+0x1f8/0x400 [xfs]
Apr 24 12:37:07 10 kernel: [<ffffffffa02b3774>] xfs_read_agi+0x74/0x100 [xfs]
Apr 24 12:37:07 10 kernel: [<ffffffffa02b999b>] xfs_iunlink+0x4b/0x170 [xfs]
Apr 24 12:37:07 10 kernel: [<ffffffff81070f97>] ? current_fs_time+0x27/0x30
Apr 24 12:37:07 10 kernel: [<ffffffffa02d1737>] ? xfs_trans_ichgtime+0x27/0xa0 [xfs]
Apr 24 12:37:07 10 kernel: [<ffffffffa02d1a8b>] xfs_droplink+0x5b/0x70 [xfs]
Apr 24 12:37:07 10 kernel: [<ffffffffa02d342e>] xfs_remove+0x27e/0x3a0 [xfs]
Apr 24 12:37:07 10 kernel: [<ffffffff8118215c>] ? generic_permission+0x5c/0xb0
Apr 24 12:37:07 10 kernel: [<ffffffffa02e0da8>] xfs_vn_unlink+0x48/0x90 [xfs]
Apr 24 12:37:07 10 kernel: [<ffffffff81183d6f>] vfs_unlink+0x9f/0xe0
Apr 24 12:37:07 10 kernel: [<ffffffff81182aaa>] ? lookup_hash+0x3a/0x50
Apr 24 12:37:07 10 kernel: [<ffffffff811862a3>] do_unlinkat+0x183/0x1c0
Apr 24 12:37:07 10 kernel: [<ffffffff8117b876>] ? sys_newstat+0x36/0x50
Apr 24 12:37:07 10 kernel: [<ffffffff811862f6>] sys_unlink+0x16/0x20
Apr 24 12:37:07 10 kernel: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
.
    mutex_lock(&inode->i_mutex);
    /* Make sure we don't allow creating hardlink to an unlinked file */
    if (inode->i_nlink == 0)
        error = -ENOENT;
    else
        vfs_dq_init(dir);
    error = dir->i_op->link(old_dentry, dir, new_dentry);
    mutex_unlock(&inode->i_mutex);
2013/4/24 Dave Chinner <david@xxxxxxxxxxxxx>
On Mon, Apr 22, 2013 at 07:52:51PM -0500, Eric Sandeen wrote:
> On 4/22/13 7:08 PM, Dave Chinner wrote:
> > On Mon, Apr 22, 2013 at 02:59:54PM -0500, Eric Sandeen wrote:
> >> On 4/15/13 6:14 PM, Brian Foster wrote:
> >>> Hi,
> >>>
> >>> Thanks for the data in the previous thread:
> >>>
> >>> http://oss.sgi.com/archives/xfs/2013-04/msg00327.html
> >>>
> >>> I'm spinning off a new thread specifically for this because the original
> >>> thread is already too large and scattered to track. As Eric stated,
> >>> please try to keep data contained in as few messages as possible.
> >>>
> >>
> >> Well, it's always simple in the end. It just took a lot of debugging
> >> to figure out what was happening - we do appreciate your help with that!
> >>
> >> We were able to create a local reproducer, and it looks like
> >> this patch fixes things:
> >>
> >> commit aae8a97d3ec30788790d1720b71d76fd8eb44b73
> >> Author: Aneesh Kumar K.V <aneesh.kumar@xxxxxxxxxxxxxxxxxx>
> >> Date: Sat Jan 29 18:43:27 2011 +0530
> >>
> >>     fs: Don't allow to create hardlink for deleted file
> >
> > Good find Eric - great work on the reproducer script.
> >
> > FWIW, can you confirm that a debug kernel assert fails
> > with a non-zero link count in xfs_bumplink() with your test case?
> >
> > int
> > xfs_bumplink(
> >     xfs_trans_t *tp,
> >     xfs_inode_t *ip)
> > {
> >     xfs_trans_ichgtime(tp, ip, XFS_ICHGTIME_CHG);
> >
> >>>>>> ASSERT(ip->i_d.di_nlink > 0);
>
> Yep, it does, I put a printk in there when I was testing
> and it fired.
>
> Guess we should have tested a debug xfs right off the bat ;)

Perhaps, but that may have changed the timing sufficiently to make
the race go away. What we really needed was a way to just turn the
assert into a WARN_ON() without all the other debug code like we've
previously talked about. So, rather than talk about it again, I
posted patches to do this....

> >     ip->i_d.di_nlink++;
> >     inc_nlink(VFS_I(ip));
> >
> > If it does, we should consider this a in-memory corruption case and
> > return and trigger a shutdown here....
>
> I suppose that makes sense, it'd be a much less cryptic failure for
> something that will fail soon anyway.

Exactly.
--
符永涛
_______________________________________________
xfs mailing list
xfs@xxxxxxxxxxx
http://oss.sgi.com/mailman/listinfo/xfs