Hi Brian and Eric,
I have applied your kernel path v2(add unlink trace) to kernel-2.6.32-279.19.1.el6.x86_64.rpm in out test cluster and start testing again.2013/4/19 Brian Foster <bfoster@xxxxxxxxxx>
On 04/18/2013 11:23 AM, 符永涛 wrote:
> Hi Brian and Eric,
> The shutdown is not easy to produce but finally right now 2 of ourUnderstood. We've been trying very hard to reproduce ourselves to make
> servers in our test cluster xfs was shutdown.
>
it easier to debug, but haven't been able to reproduce at all so far.
This process allows us to make _some_ progress on the issue, even if it
is slower going than we'd like... ;)
Thanks again for the data. The racing behavior Eric called out (nice
> the trace output as following
> https://docs.google.com/file/d/0B7n2C4T5tfNCLXRYUWJ0b19JcWc/edit?usp=sharing
>
catch!) in his last mail lit up some light bulbs internally with regard
to some old locking issues triggered by XFS in the 6.3 kernel. The
following bug serves as an example:
https://bugzilla.redhat.com/show_bug.cgi?id=852847
... the fix for which went into the 2.6.32-279.19.1 6.3.z release. Could
you move some or all of your servers to this kernel[1] and see how it
goes? The best case is it resolves the problem, worst case we carry on
debugging from there...
Brian
[1] -
http://mirror.linux.duke.edu/pub/centos/6.3/updates/x86_64/Packages/kernel-2.6.32-279.19.1.el6.x86_64.rpm
> 2013/4/18 Brian Foster <bfoster@xxxxxxxxxx <mailto:bfoster@xxxxxxxxxx>>
> Sorry but the systemtap is interrupt and I didn't noticed that so I
> didn't get systemtap logs.
>
> /var/log/message is same as before
> Apr 18 22:43:14 10 kernel: XFS (sdb): xfs_iunlink_remove: xfs_inotobp()
> returned error 22.
> Apr 18 22:43:14 10 kernel: XFS (sdb): xfs_inactive: xfs_ifree returned
> error 22
> Apr 18 22:43:14 10 kernel: XFS (sdb): xfs_do_force_shutdown(0x1) called
> from line 1184 of file fs/xfs/xfs_vnodeops.c. Return address =
> 0xffffffffa02d44aa
> Apr 18 22:43:14 10 kernel: XFS (sdb): I/O Error Detected. Shutting down
> filesystem
> Apr 18 22:43:14 10 kernel: XFS (sdb): Please umount the filesystem and
> rectify the problem(s)
> Apr 18 22:43:20 10 kernel: XFS (sdb): xfs_log_force: error 5 returned.
>
> The metadump file is large I'll share it to you soon.
>
>
> <mailto:yongtaofu@xxxxxxxxx <mailto:yongtaofu@xxxxxxxxx>>>>
> On 04/18/2013 04:25 AM, 符永涛 wrote:
> > Hi Brian and Eric,
> > Can I change as following to bypass it?
>
> This is probably not a wise thing to do. The problem we're seeing here
> is indicative of a potentially larger problem than this particular error
> path. An inode is being unlinked and inactivated, but we aren't finding
> on the list where we expect it to be. Killing the error return doesn't
> eliminate the larger problem.
>
> So while changes could end up being made in this area as part of a fix,
> I would not suggest making any changes beyond those designed to help
> debug until we have a better idea of root cause.
>
> Brian
>
> > --- a/xfs_inode.c
> > +++ b/xfs_inode.c
> > @@ -1764,7 +1764,7 @@ xfs_iunlink_remove(
> > */
> > next_agino =
> be32_to_cpu(agi->agi_unlinked[bucket_index]);
> > last_ibp = NULL;
> > - while (next_agino != agino) {
> > + while (next_agino != agino && next_agino !=
> NULLAGINO) {
> > /*
> > * If the last inode wasn't the one
> pointing to
> > * us, then release its buffer since we're not
> > @@ -1786,6 +1786,14 @@ xfs_iunlink_remove(
> > ASSERT(next_agino != NULLAGINO);
> > ASSERT(next_agino != 0);
> > }
> > + if (next_agino == NULLAGINO) {
> > + /*
> > + *After search the list for the inode
> being free
> > + *we still can't find it.
> > + */
> > + xfs_err(mp, "%s ino %lld not found in unlinked
> > list.",
> > + __func__, (unsigned long
> > long)ip->i_ino);
> > + }
> > /*
> > * Now last_ibp points to the buffer previous to us on
> > * the unlinked list. Pull us from the list.
> > @@ -1810,16 +1818,20 @@ xfs_iunlink_remove(
> > } else {
> > xfs_trans_brelse(tp, ibp);
> > }
> > - /*
> > - * Point the previous inode on the list to the
> next inode.
> > - */
> > - last_dip->di_next_unlinked = cpu_to_be32(next_agino);
> > - ASSERT(next_agino != 0);
> > - offset = last_offset + offsetof(xfs_dinode_t,
> > di_next_unlinked);
> > - xfs_trans_inode_buf(tp, last_ibp);
> > - xfs_trans_log_buf(tp, last_ibp, offset,
> > - (offset + sizeof(xfs_agino_t) - 1));
> > - xfs_inobp_check(mp, last_ibp);
> > + if (next_agino != NULLAGINO) {
> > + /*
> > + * If only find the inode being free then
> we modify
> > + * the unlinked list.
> > + * Point the previous inode on the list to the
> > next inode.
> > + */
> > + last_dip->di_next_unlinked =
> > cpu_to_be32(next_agino);
> > + ASSERT(next_agino != 0);
> > + offset = last_offset + offsetof(xfs_dinode_t,
> > di_next_unlinked);
> > + xfs_trans_inode_buf(tp, last_ibp);
> > + xfs_trans_log_buf(tp, last_ibp, offset,
> > + (offset +
> sizeof(xfs_agino_t)
> > - 1));
> > + xfs_inobp_check(mp, last_ibp);
> > + }
> > }
> > return 0;
> > }
> >
> > Thank you.
> >
> >
> > 2013/4/18 符永涛 <yongtaofu@xxxxxxxxx <mailto:yongtaofu@xxxxxxxxx>
> >> <mailto:yongtaofu@xxxxxxxxx> <mailto:yongtaofu@xxxxxxxxx
> > Hi Brain and Eric,
> > If the problem is the agno can't be found in the unlinked
> list. Can
> > we just bypass it instead of passing ino=0xffffffff to
> xfs_inotobp?
> > Thank you.
> >
> >
> > 2013/4/18 符永涛 <yongtaofu@xxxxxxxxx
> > <mailto:sandeen@xxxxxxxxxxx <mailto:sandeen@xxxxxxxxxxx>>>> <mailto:yongtaofu@xxxxxxxxx>>>
> >
> > Hi Eric,
> > The shutdown issue is still not reproduced yet. But I get the
> > following error today during test.
> >
> > Apr 18 07:42:51 10 kernel: Call Trace:
> > Apr 18 07:42:51 10 kernel: [<ffffffffa02d91ef>] ?
> > xfs_buf_cond_lock+0x2f/0xc0 [xfs]
> > Apr 18 07:42:51 10 kernel: [<ffffffff814fe6a5>]
> > schedule_timeout+0x215/0x2e0
> > Apr 18 07:42:51 10 kernel: [<ffffffffa02d5f07>] ?
> > kmem_zone_alloc+0x77/0xf0 [xfs]
> > Apr 18 07:42:51 10 kernel: [<ffffffff814ff5c2>]
> __down+0x72/0xb0
> > Apr 18 07:42:51 10 kernel: [<ffffffffa02da652>] ?
> > _xfs_buf_find+0x102/0x280 [xfs]
> > Apr 18 07:42:51 10 kernel: "echo 0 >
> > /proc/sys/kernel/hung_task_timeout_secs" disables this
> message.
> > Apr 18 07:42:51 10 kernel: glusterfsd D ffffffff8160b3c0
> > 0 14522 1 0x00000083
> > Apr 18 07:42:51 10 kernel: ffff882015a63a28 0000000000000082
> > 0000000000000000 0000000000000000
> > Apr 18 07:42:51 10 kernel: ffff882015a639b8 ffffffffa02d91ef
> > ffff882015a639d8 0000000000000246
> > Apr 18 07:42:51 10 kernel: ffff880e70491af8 ffff882015a63fd8
> > 000000000000fb88 ffff880e70491af8
> > Apr 18 07:42:51 10 kernel: Call Trace:
> > Apr 18 07:42:51 10 kernel: [<ffffffffa02d91ef>] ?
> > xfs_buf_cond_lock+0x2f/0xc0 [xfs]
> > Apr 18 07:42:51 10 kernel: [<ffffffff814fe6a5>]
> > schedule_timeout+0x215/0x2e0
> > Apr 18 07:42:51 10 kernel: [<ffffffffa02d5f07>] ?
> > kmem_zone_alloc+0x77/0xf0 [xfs]
> > Apr 18 07:42:51 10 kernel: [<ffffffff814ff5c2>]
> __down+0x72/0xb0
> > Apr 18 07:42:51 10 kernel: [<ffffffffa02da652>] ?
> > _xfs_buf_find+0x102/0x280 [xfs]
> > Apr 18 07:42:51 10 kernel: [<ffffffff81097ef1>] down+0x41/0x50
> > Apr 18 07:42:51 10 kernel: [<ffffffffa02da493>]
> > xfs_buf_lock+0x53/0x110 [xfs]
> > Apr 18 07:42:51 10 kernel: [<ffffffffa02da652>]
> > _xfs_buf_find+0x102/0x280 [xfs]
> > Apr 18 07:42:51 10 kernel: [<ffffffffa02da83b>]
> > xfs_buf_get+0x6b/0x1a0 [xfs]
> > Apr 18 07:42:51 10 kernel: [<ffffffffa02daeac>]
> > xfs_buf_read+0x2c/0x100 [xfs]
> > Apr 18 07:42:51 10 kernel: [<ffffffffa02d0af8>]
> > xfs_trans_read_buf+0x1f8/0x400 [xfs]
> > Apr 18 07:42:51 10 kernel: [<ffffffffa02b3444>]
> > xfs_read_agi+0x74/0x100 [xfs]
> > Apr 18 07:42:51 10 kernel: [<ffffffffa02b967b>]
> > xfs_iunlink+0x5b/0x180 [xfs]
> > Apr 18 07:42:51 10 kernel: [<ffffffff810724c7>] ?
> > current_fs_time+0x27/0x30
> > Apr 18 07:42:51 10 kernel: [<ffffffffa02d12a7>] ?
> > xfs_trans_ichgtime+0x27/0xa0 [xfs]
> > Apr 18 07:42:51 10 kernel: [<ffffffffa02d15fb>]
> > xfs_droplink+0x5b/0x70 [xfs]
> > Apr 18 07:42:51 10 kernel: [<ffffffffa02d2f9e>]
> > xfs_remove+0x27e/0x3a0 [xfs]
> > Apr 18 07:42:51 10 kernel: [<ffffffff81186fd3>] ?
> > generic_permission+0x23/0xb0
> > Apr 18 07:42:51 10 kernel: [<ffffffffa02e0968>]
> > xfs_vn_unlink+0x48/0x90 [xfs]
> > Apr 18 07:42:51 10 kernel: [<ffffffff81188c0f>]
> vfs_unlink+0x9f/0xe0
> > Apr 18 07:42:51 10 kernel: [<ffffffff8118795a>] ?
> > lookup_hash+0x3a/0x50
> > Apr 18 07:42:51 10 kernel: [<ffffffff8118b143>]
> > do_unlinkat+0x183/0x1c0
> > Apr 18 07:42:51 10 kernel: [<ffffffff81017938>] ?
> > syscall_trace_enter+0x1d8/0x1e0
> > Apr 18 07:42:51 10 kernel: [<ffffffff8118b196>]
> sys_unlink+0x16/0x20
> > Apr 18 07:42:51 10 kernel: [<ffffffff8100b308>]
> tracesys+0xd9/0xde
> >
> > Thank you.
> >
> >
> > 2013/4/17 Eric Sandeen <sandeen@xxxxxxxxxxx
> <mailto:sandeen@xxxxxxxxxxx>
> > <mailto:yongtaofu@xxxxxxxxx
> >> <mailto:yongtaofu@xxxxxxxxx> <mailto:yongtaofu@xxxxxxxxx>>> wrote:
> >
> >> Hi Brain,
> >> Can I change as following?
> >
> > ASSERTS are no-ops in a non-debug kernel, so this won't
> > change any behavior. I hope we'll know more if we get new
> > traces from your patched kernel....
> >
> > Eric
> >
> >> --- a/xfs_inode.c
> >> +++ b/xfs_inode.c
> >> @@ -1773,6 +1773,8 @@ xfs_iunlink_remove(
> >> if (last_ibp != NULL) {
> >> xfs_trans_brelse(tp,
> >> last_ibp);
> >> }
> >> + ASSERT(next_agino != NULLAGINO);
> >> + ASSERT(next_agino != 0);
> >> next_ino = XFS_AGINO_TO_INO(mp,
> >> agno, next_agino);
> >> error = xfs_inotobp(mp, tp,
> >> next_ino, &last_dip,
> >> &last_ibp,
> >> &last_offset, 0);
> >> @@ -1783,8 +1785,6 @@ xfs_iunlink_remove(
> >> return error;
> >> }
> >> next_agino =
> >> be32_to_cpu(last_dip->di_next_unlinked);
> >> - ASSERT(next_agino != NULLAGINO);
> >> - ASSERT(next_agino != 0);
> >> }
> >> /*
> >> * Now last_ibp points to the buffer
> >> previous to us on
> >>
> >> Thank you.
> >>
> >>
> >> 2013/4/17 符永涛 <yongtaofu@xxxxxxxxx
> <mailto:yongtaofu@xxxxxxxxx>
> >> <mailto:yongtaofu@xxxxxxxxx> <mailto:yongtaofu@xxxxxxxxx>>>
> >>
> >> Hi Brain,
> >> If it is because NULLAGINO is passed in to
> xfs_inotobp().
> >> Can I move the following two lines before
> xfs_inotobp?
> >>
> >> For example:
> >>
> >> 1767 while (next_agino != agino) {
> >> 1768 /*
> >> 1769 * If the last inode
> >> wasn't the one pointing to
> >> 1770 * us, then release its
> >> buffer since we're not
> >> 1771 * going to do anything
> >> with it.
> >> 1772 */
> >> 1773 if (last_ibp != NULL) {
> >> 1774
> >> xfs_trans_brelse(tp, last_ibp);
> >> 1775 }
> >> 1776 next_ino =
> >> XFS_AGINO_TO_INO(mp, agno, next_agino);
> >> + ASSERT(next_agino !=
> >> NULLAGINO);
> >> + ASSERT(next_agino
> != 0);
> >> 1777 error = xfs_inotobp(mp,
> >> tp, next_ino, &last_dip,
> >> 1778
> >> &last_ibp, &last_offset, 0);
> >> 1779 if (error) {
> >> 1780 xfs_warn(mp,
> >> 1781 "%s:
> >> xfs_inotobp() returned error %d.",
> >> 1782
> __func__,
> >> error);
> >> 1783 return error;
> >> 1784 }
> >> 1785 next_agino =
> >> be32_to_cpu(last_dip->di_next_unlinked);
> >> -
> //ASSERT(next_agino !=
> >> NULLAGINO);
> >> -
> //ASSERT(next_agino != 0);
> >> 1788 }
> >> I don't understand xfs well and correct me if I'm
> >> totally wrong.
> >> Thank you very much.
> >>
> >>
> >> 2013/4/17 符永涛 <yongtaofu@xxxxxxxxx
> <mailto:yongtaofu@xxxxxxxxx>
> <mailto:yongtaofu@xxxxxxxxx>>>> >> <mailto:yongtaofu@xxxxxxxxx
> >>
> >> Hi Brain,
> >> I want to ask a question, according to the
> >> shutdown trace. The ino in xfs_iunlink_remove
> >> is 0x113, why xfs_imap got ino=0xffffffff ?
> >>
> >> --- xfs_imap --
> >>
> module("xfs").function("xfs_imap@fs/xfs/xfs_ialloc.c:1257").return
> >> -- return=0x16
> >> vars: mp=0xffff882017a50800 tp=0xffff881c81797c70
> >> ino=0xffffffff
> >>
> >> --- xfs_iunlink_remove --
> >>
> module("xfs").function("xfs_iunlink_remove@fs/xfs/xfs_inode.c:1680").return
> >> -- return=0x16
> >> vars: tp=0xffff881c81797c70 ip=0xffff881003c13c00
> >> next_ino=? mp=? agi=? dip=?
> >> agibp=0xffff880109b47e20 ibp=? agno=? agino=?
> >> next_agino=? last_ibp=?
> >> last_dip=0xffff882000000000 bucket_index=?
> >> offset=? last_offset=0xffffffffffff8810 error=?
> >> __func__=[...]
> >> ip: i_ino = 0x113, i_flags = 0x0
> >>
> >> Thank you.
> >>
> >>
> >>
> >> 2013/4/17 符永涛 <yongtaofu@xxxxxxxxx
> <mailto:yongtaofu@xxxxxxxxx>
> <mailto:yongtaofu@xxxxxxxxx>>>> >> <mailto:bfoster@xxxxxxxxxx
> >>
> >> Hi Brain,
> >> Thank you for your update, and I have applied
> >> your last kernel patch. However it is not
> easy
> >> to reproduce especially in out test
> >> environment. Till now is not happens again.
> >> I'll update the kernel patch now. BTW is
> there
> >> any findings in the logs of previous thread?
> >>
> http://oss.sgi.com/archives/xfs/2013-04/msg00327.html
> >> I guess it tend to happen during glusterfs
> >> rebalance because glusterfs moves a lot of
> >> file from one server to another and then
> >> unlink it.
> >>
> >> Thank you.
> >>
> >>
> >> 2013/4/17 Brian Foster
> <bfoster@xxxxxxxxxx <mailto:bfoster@xxxxxxxxxx>
> >> <mailto:bfoster@xxxxxxxxxx> <mailto:bfoster@xxxxxxxxxx>>>
> >>
> >> On 04/16/2013 12:24 PM, Dave Chinner
> wrote:
> >> > On Mon, Apr 15, 2013 at 07:14:39PM
> >> -0400, Brian Foster wrote:
> >> >> Hi,
> >> >>
> >> >> Thanks for the data in the
> previous thread:
> >> >>
> >> >>
> >>
> http://oss.sgi.com/archives/xfs/2013-04/msg00327.html
> >> >>
> >> ...
> >> >>
> >> >> echo 1 >
> >>
> /sys/kernel/debug/tracing/events/xfs/xfs_iunlink/enable
> >> >> echo 1 >
> >>
> /sys/kernel/debug/tracing/events/xfs/xfs_iunlink_remove/enable
> >> >> ... reproduce ...
> >> >> cat
> >> /sys/kernel/debug/tracing/trace >
> trace.output
> >> >
> >> > It's better to use trace-cmd for this.
> >> it will result in less
> >> > dropped events. i.e.:
> >> >
> >> > $ trace-cmd record -e
> xfs_iunlink\*
> >> > ... reproduce ...
> >> > ^C
> >> > $ trace-cmd report > trace.output
> >> >
> >> >> --- a/fs/xfs/linux-2.6/xfs_trace.h
> >> >> +++ b/fs/xfs/linux-2.6/xfs_trace.h
> >> >> @@ -581,6 +581,8 @@
> >> DEFINE_INODE_EVENT(xfs_file_fsync);
> >> ...
> >> >
> >> > I would suggest that the the tracing
> >> shoul dbe at entry of the
> >> > function, otherwise we won't get a
> >> tracepoint for the operation that
> >> > triggers the shutdown. (That's the
> >> reason most tracepoints in XFS
> >> > are at function entry...)
> >> >
> >>
> >> Good points, thanks Dave. A v2 that pulls
> >> up the tracepoints towards
> >> function entry is appended.
> >>
> >> Brian
> >>
> >> From
> >> 280943e78ebe0b97a774cba51e7815c42f044b55
> >> Mon Sep 17 00:00:00 2001
> >> From: Brian Foster
> <bfoster@xxxxxxxxxx <mailto:bfoster@xxxxxxxxxx>
> <mailto:xfs@xxxxxxxxxxx <mailto:xfs@xxxxxxxxxxx>>> <mailto:bfoster@xxxxxxxxxx>>>
> >> Date: Mon, 15 Apr 2013 18:16:24 -0400
> >> Subject: [PATCH v2] xfs: add tracepoints
> >> for xfs_iunlink and
> >> xfs_iunlink_remove
> >>
> >> ---
> >> fs/xfs/linux-2.6/xfs_trace.h | 2 ++
> >> fs/xfs/xfs_inode.c | 4 ++++
> >> 2 files changed, 6 insertions(+), 0
> >> deletions(-)
> >>
> >> diff --git a/fs/xfs/linux-2.6/xfs_trace.h
> >> b/fs/xfs/linux-2.6/xfs_trace.h
> >> index adc6ec4..338a0f9 100644
> >> --- a/fs/xfs/linux-2.6/xfs_trace.h
> >> +++ b/fs/xfs/linux-2.6/xfs_trace.h
> >> @@ -583,6 +583,8 @@
> >> DEFINE_INODE_EVENT(xfs_file_fsync);
> >> DEFINE_INODE_EVENT(xfs_destroy_inode);
> >> DEFINE_INODE_EVENT(xfs_dirty_inode);
> >> DEFINE_INODE_EVENT(xfs_clear_inode);
> >> +DEFINE_INODE_EVENT(xfs_iunlink);
> >> +DEFINE_INODE_EVENT(xfs_iunlink_remove);
> >>
> >> DEFINE_INODE_EVENT(xfs_dquot_dqalloc);
> >> DEFINE_INODE_EVENT(xfs_dquot_dqdetach);
> >> diff --git a/fs/xfs/xfs_inode.c
> >> b/fs/xfs/xfs_inode.c
> >> index 19900f0..d705c77 100644
> >> --- a/fs/xfs/xfs_inode.c
> >> +++ b/fs/xfs/xfs_inode.c
> >> @@ -1615,6 +1615,8 @@ xfs_iunlink(
> >>
> >> mp = tp->t_mountp;
> >>
> >> + trace_xfs_iunlink(ip);
> >> +
> >> /*
> >> * Get the agi buffer first. It
> >> ensures lock ordering
> >> * on the list.
> >> @@ -1694,6 +1696,8 @@ xfs_iunlink_remove(
> >> mp = tp->t_mountp;
> >> agno = XFS_INO_TO_AGNO(mp,
> ip->i_ino);
> >>
> >> + trace_xfs_iunlink_remove(ip);
> >> +
> >> /*
> >> * Get the agi buffer first. It
> >> ensures lock ordering
> >> * on the list.
> >> --
> >> 1.7.7.6
> >>
> >>
> >>
> >>
> >> --
> >> 符永涛
> >>
> >>
> >>
> >>
> >> --
> >> 符永涛
> >>
> >>
> >>
> >>
> >> --
> >> 符永涛
> >>
> >>
> >>
> >>
> >> --
> >> 符永涛
> >> _______________________________________________
> >> xfs mailing list
> >> xfs@xxxxxxxxxxx <mailto:xfs@xxxxxxxxxxx>
> >> http://oss.sgi.com/mailman/listinfo/xfs
> >
> >
> >
> >
> > --
> > 符永涛
> >
> >
> >
> >
> > --
> > 符永涛
> >
> >
> >
> >
> > --
> > 符永涛
>
>
>
>
> --
> 符永涛
--
符永涛
_______________________________________________ xfs mailing list xfs@xxxxxxxxxxx http://oss.sgi.com/mailman/listinfo/xfs