Re: gfs2 blocking tasks

Steven Whitehouse <swhiteho@xxxxxxxxxx> · Wed, 22 Aug 2012 15:04:56 +0100

Hi,

On Wed, 2012-08-22 at 14:34 +0200, Bart Verwilst wrote:
> Hi Steven,
> 
> I have just tested this kernel:
> 
> http://people.canonical.com/~apw/lp1020207-precise/
> 
> Which contains 2 patches, 
> 0001-GFS2-More-automated-code-analysis-fixes.patch and 
> 0002-GFS2-Read-in-rindex-if-necessary-during-unlink.patch. This will 
> probably be included into the latest stable kernel of Ubuntu 12.04.
> 
> Works perfectly now, can't reproduce this bug anymore.
> 
> What's next to push this to -stable?
> 
> Bart
> 

Looking at kernel.org, 3.2.28 is the one to aim at, since that is the
current stable tree for 3.2

http://www.kernel.org/

So it should just be a case of posting a patch against that tree to
stable@xxxxxxxxxx along with a detailed explanation (i.e. mention that
this is a targeted back port of an upstream patch and also the bug which
it fixes). If you copy me in, then I can ACK it,

Steve.

> Steven Whitehouse schreef op 22.08.2012 10:44:
> > Hi,
> >
> > On Wed, 2012-08-22 at 09:35 +0200, Bart Verwilst wrote:
> >> Hi Steven,
> >>
> >> I'm not sure if this is enough to fix it in 3.2:
> >>
> >> --- inode.c.orig	2012-08-22 07:28:15.675859475 +0000
> >> +++ inode.c	2012-08-22 07:33:05.895865014 +0000
> >> @@ -1039,6 +1039,10 @@
> >>   	struct gfs2_rgrpd *rgd;
> >>   	int error;
> >>
> >> +	error = gfs2_rindex_update(sdp);
> >> +	if (error)
> >> +		return error;
> >> +
> >>   	gfs2_holder_init(dip->i_gl, LM_ST_EXCLUSIVE, 0, ghs);
> >>   	gfs2_holder_init(ip->i_gl,  LM_ST_EXCLUSIVE, 0, ghs + 1);
> >>
> >> I've left off the "if (!rgd) { ... }" part since out_inodes doesn't
> >> exist yet there, so it seemed unneeded. I'm far from a kernel 
> >> developer
> >> though, so please give your OK for this, so i can try to push it 
> >> into
> >> Ubuntu and/or upstream.
> >>
> >> Bart
> >>
> > It may well be enough. If we can verify that, then I'm happy to ACK 
> > this
> > for -stable,
> >
> > Steve.
> >
> >> Bart Verwilst schreef op 21.08.2012 22:35:
> >> > Hi Steven,
> >> >
> >> > I've tested with kernel 3.3 (
> >> > http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.3-precise/ ), bug
> >> > isnt present there. Tried with 3.2.28 ( also from the kernel-ppa 
> >> ),
> >> > and bug happens there. In the end i was idd able to trace it back 
> >> to
> >> > 3.3-rc6, where you pushed a couple of GFS2 patches upstream.
> >> >
> >> > inode.c for example is quite different between 3.2.28 and 3.3-rc1,
> >> > and i do not dare to hack myself a diff file that incorporates 
> >> your
> >> > change, fearing it will probably be less stable than it is 
> >> already.
> >> > :)
> >> >
> >> > Would it be too much to ask to backport your change to 3.2.x? I 
> >> will
> >> > then test this, and try to push it upstream to -stable and/or 
> >> ubuntu
> >> > LTS..
> >> >
> >> > Thanks a lot in advance!
> >> >
> >> > Kind regards,
> >> >
> >> > Bart
> >> >
> >> > Steven Whitehouse schreef op 21.08.2012 14:37:
> >> >> Hi,
> >> >>
> >> >> On Tue, 2012-08-21 at 14:23 +0200, Bart Verwilst wrote:
> >> >>> Hi Steven,
> >> >>>
> >> >>> The kernel of Ubuntu 12.04 LTS is based on 3.2.0, while the 
> >> patch
> >> >>> you
> >> >>> mentioned seems to be for a newer(?) version.
> >> >>> What should I do, offer an altered version of this patch for
> >> >>> inclusion
> >> >>> into Ubuntu's 3.2.0 kernel, or is it a little less 
> >> straightforward
> >> >>> than
> >> >>> this? :)
> >> >>>
> >> >>> Kind regards,
> >> >>>
> >> >>> Bart
> >> >>>
> >> >> Well, since you've not got that patch in your existing kernel, 
> >> then
> >> >> there are really two issues here. Firstly to try and verify that
> >> >> this
> >> >> patch really is a fix for the problem, and then to figure out 
> >> what
> >> >> needs
> >> >> to be done wrt Ubuntu distro kernels. One solution may be to post 
> >> it
> >> >> for
> >> >> the upstream -stable kernel as I think most distros will then 
> >> pick
> >> >> this
> >> >> up.
> >> >>
> >> >> Are you able to build a new Ubuntu kernel with that patch in it?
> >> >> That
> >> >> would be one way to test it. Another way which doesn't require
> >> >> building
> >> >> kernels is this:
> >> >>
> >> >> The problem occurs when the resource groups are not uptodate, and
> >> >> various actions taken on the filesystem will ensure that they are
> >> >> uptodate. Mounting a filesystem and immediately running an unlink 
> >> of
> >> >> a
> >> >> file which is known to exist on the filesystem (before performing
> >> >> any
> >> >> other action) should trigger that bug if it is present.
> >> >>
> >> >> It may not be that particular bug, but that looks the most likely 
> >> of
> >> >> any
> >> >> recent patch to that bit of code. Are you able to try a more 
> >> recent
> >> >> Ubuntu kernel? If you could try one based on a more recent 
> >> upstream
> >> >> which has that patch in it, then that might also help narrow down
> >> >> the
> >> >> problem,
> >> >>
> >> >> Steve.
> >> >>
> >> >>> Steven Whitehouse schreef op 21.08.2012 13:27:
> >> >>> > Hi,
> >> >>> >
> >> >>> > On Tue, 2012-08-21 at 13:03 +0200, Bart Verwilst wrote:
> >> >>> >> Hi Steven,
> >> >>> >>
> >> >>> >> There is no drbd in the mix ( which is why i changed the 
> >> title
> >> >>> of
> >> >>> >> the
> >> >>> >> bugreport now ). I'm only using plain iSCSI. The original 
> >> posted
> >> >>> had
> >> >>> >> it
> >> >>> >> with drbd :)
> >> >>> >>
> >> >>> >> Kind regards,
> >> >>> >>
> >> >>> >> Bart
> >> >>> >>
> >> >>> > Ah, I see sorry.. I misunderstood the report. I wonder whether
> >> >>> your
> >> >>> > distro kernel has this patch:
> >> >>> >
> >> >>> >
> >> >>> 
> >> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=718b97bd6b03445be53098e3c8f896aeebc304aa
> >> >>> >
> >> >>> > Thats the most likely thing that I can see that has been fixed
> >> >>> > recently,
> >> >>> >
> >> >>> > Steve.
> >> >>> >
> >> >>> >
> >> >>> >> Steven Whitehouse schreef op 21.08.2012 12:59:
> >> >>> >> > Hi,
> >> >>> >> >
> >> >>> >> > On Tue, 2012-08-21 at 12:39 +0200, Bart Verwilst wrote:
> >> >>> >> >> Hi Steven
> >> >>> >> >>
> >> >>> >> >> Shared storage is iSCSI,
> >> >>> >> >>
> >> >>> >> >> <totem rrp_mode="none" secauth="off" token="20000"/>
> >> >>> >> >> <quorumd tko="4" interval="2"
> >> >>> >> >> device="/dev/mapper/iscsi_cluster_quorum"></quorumd>
> >> >>> >> >>
> >> >>> >> >> Actually i know why this is happening now, and can 
> >> reproduce
> >> >>> 100%
> >> >>> >> of
> >> >>> >> >> the time, i've added my findings as a comment to this bug
> >> >>> from
> >> >>> >> >> somebody
> >> >>> >> >> having the same problem:
> >> >>> >> >>
> >> >>> >> >>
> >> >>> 
> >> https://bugs.launchpad.net/ubuntu/+source/gfs2-utils/+bug/1020207
> >> >>> >> >>
> >> >>> >> >> Create file on one node's gfs2 mount, rm on the other ->
> >> >>> hanging
> >> >>> >> >> mountpoint + kernel OOPS.
> >> >>> >> >>
> >> >>> >> >> Happy that i'm finally getting somewhere with this :P
> >> >>> >> >>
> >> >>> >> >> Anything i can do to help Steven?
> >> >>> >> >>
> >> >>> >> >> Kind regards,
> >> >>> >> >>
> >> >>> >> >> Bart Verwilst
> >> >>> >> >>
> >> >>> >> > Can you reproduce this without drbd in the mix? That should
> >> >>> remove
> >> >>> >> > one
> >> >>> >> > complication and make this easier to track down.
> >> >>> >> >
> >> >>> >> > I'll take a look at see what that dereference is likely to 
> >> be
> >> >>> in
> >> >>> >> the
> >> >>> >> > mean time,
> >> >>> >> >
> >> >>> >> > Steve.
> >> >>> >> >
> >> >>> >> >> Steven Whitehouse schreef op 21.08.2012 12:17:
> >> >>> >> >> > Hi,
> >> >>> >> >> >
> >> >>> >> >> > On Tue, 2012-08-21 at 12:08 +0200, Bart Verwilst wrote:
> >> >>> >> >> >> As yet another reply to my own post, i found this on 
> >> the
> >> >>> node
> >> >>> >> >> where
> >> >>> >> >> >> it
> >> >>> >> >> >> hangs ( this time it's vm01, and 
> >> /var/lib/libvirt/sanlock
> >> >>> >> that's
> >> >>> >> >> >> hanging
> >> >>> >> >> >> ):
> >> >>> >> >> >>
> >> >>> >> >> >>
> >> >>> >> >> >> [ 1219.640653] GFS2: fsid=: Trying to join cluster
> >> >>> "lock_dlm",
> >> >>> >> >> >> "kvm:sanlock"
> >> >>> >> >> >> [ 1219.660035] GFS2: fsid=kvm:sanlock.0: Joined 
> >> cluster.
> >> >>> Now
> >> >>> >> >> >> mounting
> >> >>> >> >> >> FS...
> >> >>> >> >> >> [ 1219.720108] GFS2: fsid=kvm:sanlock.0: jid=0, already
> >> >>> locked
> >> >>> >> >> for
> >> >>> >> >> >> use
> >> >>> >> >> >> [ 1219.720113] GFS2: fsid=kvm:sanlock.0: jid=0: Looking 
> >> at
> >> >>> >> >> >> journal...
> >> >>> >> >> >> [ 1219.772606] GFS2: fsid=kvm:sanlock.0: jid=0: 
> >> Acquiring
> >> >>> the
> >> >>> >> >> >> transaction lock...
> >> >>> >> >> >> [ 1219.772659] GFS2: fsid=kvm:sanlock.0: jid=0: 
> >> Replaying
> >> >>> >> >> journal...
> >> >>> >> >> >> [ 1219.772675] GFS2: fsid=kvm:sanlock.0: jid=0: 
> >> Replayed 0
> >> >>> of
> >> >>> >> 0
> >> >>> >> >> >> blocks
> >> >>> >> >> >> [ 1219.772679] GFS2: fsid=kvm:sanlock.0: jid=0: Found 1
> >> >>> revoke
> >> >>> >> >> tags
> >> >>> >> >> >> [ 1219.782611] init: libvirt-bin main process ended,
> >> >>> >> respawning
> >> >>> >> >> >> [ 1219.784161] GFS2: fsid=kvm:sanlock.0: jid=0: Journal
> >> >>> >> replayed
> >> >>> >> >> in
> >> >>> >> >> >> 1s
> >> >>> >> >> >> [ 1219.784268] GFS2: fsid=kvm:sanlock.0: jid=0: Done
> >> >>> >> >> >> [ 1219.784329] GFS2: fsid=kvm:sanlock.0: jid=1: Trying 
> >> to
> >> >>> >> acquire
> >> >>> >> >> >> journal lock...
> >> >>> >> >> >> [ 1219.788349] GFS2: fsid=kvm:sanlock.0: jid=1: Looking 
> >> at
> >> >>> >> >> >> journal...
> >> >>> >> >> >> [ 1219.886047] GFS2: fsid=kvm:sanlock.0: jid=1: Done
> >> >>> >> >> >> [ 1219.886110] GFS2: fsid=kvm:sanlock.0: jid=2: Trying 
> >> to
> >> >>> >> acquire
> >> >>> >> >> >> journal lock...
> >> >>> >> >> >> [ 1219.891121] GFS2: fsid=kvm:sanlock.0: jid=2: Looking 
> >> at
> >> >>> >> >> >> journal...
> >> >>> >> >> >> [ 1219.943994] init: ttyS1 main process (20318) 
> >> terminated
> >> >>> >> with
> >> >>> >> >> >> status
> >> >>> >> >> >> 1
> >> >>> >> >> >> [ 1219.944037] init: ttyS1 main process ended, 
> >> respawning
> >> >>> >> >> >> [ 1219.967054] init: ttyS0 main process (20320) 
> >> terminated
> >> >>> >> with
> >> >>> >> >> >> status
> >> >>> >> >> >> 1
> >> >>> >> >> >> [ 1219.967100] init: ttyS0 main process ended, 
> >> respawning
> >> >>> >> >> >> [ 1219.972037] ttyS0: LSR safety check engaged!
> >> >>> >> >> >> [ 1220.226027] GFS2: fsid=kvm:sanlock.0: jid=2: 
> >> Acquiring
> >> >>> the
> >> >>> >> >> >> transaction lock...
> >> >>> >> >> >> [ 1220.226160] GFS2: fsid=kvm:sanlock.0: jid=2: 
> >> Replaying
> >> >>> >> >> journal...
> >> >>> >> >> >> [ 1220.311801] GFS2: fsid=kvm:sanlock.0: jid=2: 
> >> Replayed 3
> >> >>> of
> >> >>> >> 106
> >> >>> >> >> >> blocks
> >> >>> >> >> >> [ 1220.311805] GFS2: fsid=kvm:sanlock.0: jid=2: Found 
> >> 4487
> >> >>> >> revoke
> >> >>> >> >> >> tags
> >> >>> >> >> >> [ 1220.322148] GFS2: fsid=kvm:sanlock.0: jid=2: Journal
> >> >>> >> replayed
> >> >>> >> >> in
> >> >>> >> >> >> 1s
> >> >>> >> >> >> [ 1220.322253] GFS2: fsid=kvm:sanlock.0: jid=2: Done
> >> >>> >> >> >
> >> >>> >> >> > So that looks like it successfully recovered the 
> >> journals
> >> >>> for
> >> >>> >> >> nodes
> >> >>> >> >> > one
> >> >>> >> >> > and two. How many nodes are in the cluster? What is the
> >> >>> fencing
> >> >>> >> >> > quorum
> >> >>> >> >> > set up being used?
> >> >>> >> >> >
> >> >>> >> >> >> [ 1221.457120] BUG: unable to handle kernel NULL 
> >> pointer
> >> >>> >> >> dereference
> >> >>> >> >> >> at
> >> >>> >> >> >> 0000000000000018
> >> >>> >> >> >
> >> >>> >> >> > So this is a dereference of something which is 24 bytes
> >> >>> into
> >> >>> >> some
> >> >>> >> >> > structure or other. Certainly something which should not
> >> >>> happen
> >> >>> >> so
> >> >>> >> >> we
> >> >>> >> >> > need to take a look at that.
> >> >>> >> >> >
> >> >>> >> >> > Was this a one off, or something that you can reproduce?
> >> >>> >> >> >
> >> >>> >> >> > Steve.
> >> >>> >> >> >
> >> >>> >> >> >
> >> >>> >> >> >> [ 1221.457508] IP: [<ffffffffa04f800a>]
> >> >>> gfs2_unlink+0x8a/0x220
> >> >>> >> >> >> [gfs2]
> >> >>> >> >> >> [ 1221.457958] PGD 2170f2067 PUD 20c0ff067 PMD 0
> >> >>> >> >> >> [ 1221.458197] Oops: 0000 [#1] SMP
> >> >>> >> >> >> [ 1221.458374] CPU 0
> >> >>> >> >> >> [ 1221.458460] Modules linked in: gfs2 dlm ipmi_si 
> >> mptctl
> >> >>> >> mptbase
> >> >>> >> >> >> ipmi_devintf ipmi_msghandler dell_rbu ip6table_filter
> >> >>> >> ip6_tables
> >> >>> >> >> >> iptable_filter ip_tables ebtable_nat ebtables x_tables
> >> >>> >> kvm_intel
> >> >>> >> >> kvm
> >> >>> >> >> >> ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm 
> >> ocfs2_nodemanager
> >> >>> >> >> >> ocfs2_stackglue
> >> >>> >> >> >> configfs dm_round_robin ib_iser bridge rdma_cm ib_cm 
> >> iw_cm
> >> >>> >> ib_sa
> >> >>> >> >> >> ib_mad
> >> >>> >> >> >> ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi
> >> >>> >> >> scsi_transport_iscsi
> >> >>> >> >> >> bonding nfsd nfs lockd fscache auth_rpcgss nfs_acl 
> >> sunrpc
> >> >>> >> 8021q
> >> >>> >> >> garp
> >> >>> >> >> >> stp
> >> >>> >> >> >> joydev dm_multipath dcdbas mac_hid i7core_edac 
> >> edac_core
> >> >>> >> >> >> acpi_power_meter lp parport usbhid hid bnx2 mpt2sas
> >> >>> >> >> >> scsi_transport_sas
> >> >>> >> >> >> e1000e raid_class scsi_dh_rdac usb_storage [last 
> >> unloaded:
> >> >>> >> >> ipmi_si]
> >> >>> >> >> >> [ 1221.463058]
> >> >>> >> >> >> [ 1221.463146] Pid: 20611, comm: libvirtd Not tainted
> >> >>> >> >> >> 3.2.0-26-generic
> >> >>> >> >> >> #41-Ubuntu Dell Inc. PowerEdge R310/05XKKK
> >> >>> >> >> >> [ 1221.463626] RIP: 0010:[<ffffffffa04f800a>]
> >> >>> >> >> [<ffffffffa04f800a>]
> >> >>> >> >> >> gfs2_unlink+0x8a/0x220 [gfs2]
> >> >>> >> >> >> [ 1221.464100] RSP: 0018:ffff88020cfe1d28  EFLAGS:
> >> >>> 00010296
> >> >>> >> >> >> [ 1221.464344] RAX: 0000000000000000 RBX: 
> >> ffff88021ef58000
> >> >>> >> RCX:
> >> >>> >> >> >> ffff88020cfe1d40
> >> >>> >> >> >> [ 1221.464662] RDX: 0000000000000000 RSI: 
> >> 00000000000183f3
> >> >>> >> RDI:
> >> >>> >> >> >> ffff88022efa2440
> >> >>> >> >> >> [ 1221.464979] RBP: ffff88020cfe1e38 R08: 
> >> 4000000000000000
> >> >>> >> R09:
> >> >>> >> >> >> 0000000000000000
> >> >>> >> >> >> [ 1221.465297] R10: fdd50265775f720a R11: 
> >> 0000000000000003
> >> >>> >> R12:
> >> >>> >> >> >> ffff88021ef50000
> >> >>> >> >> >> [ 1221.465615] R13: ffff88020cfe1d80 R14: 
> >> ffff8802178980c0
> >> >>> >> R15:
> >> >>> >> >> >> ffff88022efa2000
> >> >>> >> >> >> [ 1221.466115] FS:  00007f4d2c0e7700(0000)
> >> >>> >> >> GS:ffff880237200000(0000)
> >> >>> >> >> >> knlGS:0000000000000000
> >> >>> >> >> >> [ 1221.466487] CS:  0010 DS: 0000 ES: 0000 CR0:
> >> >>> >> 0000000080050033
> >> >>> >> >> >> [ 1221.466748] CR2: 0000000000000018 CR3: 
> >> 00000002175e4000
> >> >>> >> CR4:
> >> >>> >> >> >> 00000000000006f0
> >> >>> >> >> >> [ 1221.467066] DR0: 0000000000000000 DR1: 
> >> 0000000000000000
> >> >>> >> DR2:
> >> >>> >> >> >> 0000000000000000
> >> >>> >> >> >> [ 1221.467384] DR3: 0000000000000000 DR6: 
> >> 00000000ffff0ff0
> >> >>> >> DR7:
> >> >>> >> >> >> 0000000000000400
> >> >>> >> >> >> [ 1221.467702] Process libvirtd (pid: 20611, threadinfo
> >> >>> >> >> >> ffff88020cfe0000, task ffff8802241cdbc0)
> >> >>> >> >> >> [ 1221.468091] Stack:
> >> >>> >> >> >> [ 1221.468199]  0000000000000003 ffff88022f108048
> >> >>> >> >> ffff88020cfe1d58
> >> >>> >> >> >> ffff88020cfe1d40
> >> >>> >> >> >> [ 1221.468581]  ffff88020cfe1d40 ffff88022f108000
> >> >>> >> >> ffff88022354aa00
> >> >>> >> >> >> 0000000000000001
> >> >>> >> >> >> [ 1221.468963]  0000000000000000 0000000000000000
> >> >>> >> >> ffffffffa04f7fda
> >> >>> >> >> >> ffff88020cfe1d80
> >> >>> >> >> >> [ 1221.469346] Call Trace:
> >> >>> >> >> >> [ 1221.469486]  [<ffffffffa04f7fda>] ?
> >> >>> gfs2_unlink+0x5a/0x220
> >> >>> >> >> [gfs2]
> >> >>> >> >> >> [ 1221.469955]  [<ffffffffa04f7ff4>] ?
> >> >>> gfs2_unlink+0x74/0x220
> >> >>> >> >> [gfs2]
> >> >>> >> >> >> [ 1221.470236]  [<ffffffff8129cb2c>] ?
> >> >>> >> >> >> security_inode_permission+0x1c/0x30
> >> >>> >> >> >> [ 1221.470536]  [<ffffffff81184e70>]
> >> >>> >> vfs_unlink.part.26+0x80/0xf0
> >> >>> >> >> >> [ 1221.470802]  [<ffffffff81184f1c>] 
> >> vfs_unlink+0x3c/0x60
> >> >>> >> >> >> [ 1221.471040]  [<ffffffff8118758a>]
> >> >>> do_unlinkat+0x1aa/0x1d0
> >> >>> >> >> >> [ 1221.471290]  [<ffffffff81177fc0>] ?
> >> >>> vfs_write+0x110/0x180
> >> >>> >> >> >> [ 1221.471538]  [<ffffffff811782a7>] ? 
> >> sys_write+0x67/0x90
> >> >>> >> >> >> [ 1221.471780]  [<ffffffff81188106>] 
> >> sys_unlink+0x16/0x20
> >> >>> >> >> >> [ 1221.472019]  [<ffffffff81661fc2>]
> >> >>> >> >> system_call_fastpath+0x16/0x1b
> >> >>> >> >> >> [ 1221.472290] Code: 00 00 49 83 c5 40 31 d2 4c 89 e9 
> >> be
> >> >>> 01 00
> >> >>> >> 00
> >> >>> >> >> 00
> >> >>> >> >> >> e8
> >> >>> >> >> >> fc 1e ff ff 48 8b b3 28 02 00 00 4c 89 ff e8 ad 7e 00 
> >> 00
> >> >>> 48 8d
> >> >>> >> 8d
> >> >>> >> >> 08
> >> >>> >> >> >> ff
> >> >>> >> >> >> ff ff <48> 8b 78 18 31 d2 be 01 00 00 00 48 83 e9 80 e8 
> >> d2
> >> >>> 1e
> >> >>> >> ff
> >> >>> >> >> ff
> >> >>> >> >> >> 48
> >> >>> >> >> >> [ 1221.473936] RIP  [<ffffffffa04f800a>]
> >> >>> >> gfs2_unlink+0x8a/0x220
> >> >>> >> >> >> [gfs2]
> >> >>> >> >> >> [ 1221.474240]  RSP <ffff88020cfe1d28>
> >> >>> >> >> >> [ 1221.474408] CR2: 0000000000000018
> >> >>> >> >> >> [ 1221.474959] ---[ end trace f7df780fd45600a8 ]---
> >> >>> >> >> >>
> >> >>> >> >> >>
> >> >>> >> >> >> Bart Verwilst schreef op 20.08.2012 09:50:
> >> >>> >> >> >> > Nothing out of the ordinary, should have mentioned 
> >> that!
> >> >>> >> >> >> >
> >> >>> >> >> >> > <snip>
> >> >>> >> >> >> > Aug 19 00:08:00 vm02-test corosync[7394]:   [CMAN  ]
> >> >>> daemon:
> >> >>> >> >> read
> >> >>> >> >> >> 20
> >> >>> >> >> >> > bytes from fd 17
> >> >>> >> >> >> > Aug 19 00:08:00 vm02-test corosync[7394]:   [CMAN  ]
> >> >>> daemon:
> >> >>> >> >> >> client
> >> >>> >> >> >> > command is 7
> >> >>> >> >> >> > Aug 19 00:08:00 vm02-test corosync[7394]:   [CMAN  ]
> >> >>> daemon:
> >> >>> >> >> About
> >> >>> >> >> >> to
> >> >>> >> >> >> > process command
> >> >>> >> >> >> > Aug 19 00:08:00 vm02-test corosync[7394]:   [CMAN  ]
> >> >>> memb:
> >> >>> >> >> command
> >> >>> >> >> >> to
> >> >>> >> >> >> > process is 7
> >> >>> >> >> >> > Aug 19 00:08:00 vm02-test corosync[7394]:   [CMAN  ]
> >> >>> memb:
> >> >>> >> >> >> > get_all_members: allocated new buffer (retsize=1024)
> >> >>> >> >> >> > Aug 19 00:08:00 vm02-test corosync[7394]:   [CMAN  ]
> >> >>> memb:
> >> >>> >> >> >> > get_all_members: retlen = 1760
> >> >>> >> >> >> > Aug 19 00:08:00 vm02-test corosync[7394]:   [CMAN  ]
> >> >>> memb:
> >> >>> >> >> command
> >> >>> >> >> >> > return code is 4
> >> >>> >> >> >> > Aug 19 00:08:00 vm02-test corosync[7394]:   [CMAN  ]
> >> >>> daemon:
> >> >>> >> >> >> > Returning command data. length = 1760
> >> >>> >> >> >> > Aug 19 00:08:00 vm02-test corosync[7394]:   [CMAN  ]
> >> >>> daemon:
> >> >>> >> >> >> sending
> >> >>> >> >> >> > reply 40000007 to fd 17
> >> >>> >> >> >> > Aug 19 00:08:00 vm02-test corosync[7394]:   [CMAN  ]
> >> >>> daemon:
> >> >>> >> >> read
> >> >>> >> >> >> 20
> >> >>> >> >> >> > bytes from fd 17
> >> >>> >> >> >> > Aug 19 00:08:00 vm02-test corosync[7394]:   [CMAN  ]
> >> >>> daemon:
> >> >>> >> >> >> client
> >> >>> >> >> >> > command is 800000b7
> >> >>> >> >> >> > Aug 19 00:08:00 vm02-test corosync[7394]:   [CMAN  ]
> >> >>> daemon:
> >> >>> >> >> About
> >> >>> >> >> >> to
> >> >>> >> >> >> > process command
> >> >>> >> >> >> > Aug 19 00:08:00 vm02-test corosync[7394]:   [CMAN  ]
> >> >>> memb:
> >> >>> >> >> command
> >> >>> >> >> >> to
> >> >>> >> >> >> > process is 800000b7
> >> >>> >> >> >> > Aug 19 00:08:00 vm02-test corosync[7394]:   [CMAN  ]
> >> >>> memb:
> >> >>> >> >> command
> >> >>> >> >> >> > return code is 0
> >> >>> >> >> >> > Aug 19 00:08:00 vm02-test corosync[7394]:   [CMAN  ]
> >> >>> daemon:
> >> >>> >> >> >> > Returning command data. length = 0
> >> >>> >> >> >> > Aug 19 00:08:00 vm02-test corosync[7394]:   [CMAN  ]
> >> >>> daemon:
> >> >>> >> >> >> sending
> >> >>> >> >> >> > reply c00000b7 to fd 17
> >> >>> >> >> >> > Aug 19 00:08:01 vm02-test corosync[7394]:   [CMAN  ]
> >> >>> daemon:
> >> >>> >> >> read
> >> >>> >> >> >> 20
> >> >>> >> >> >> > bytes from fd 17
> >> >>> >> >> >> > Aug 19 00:08:01 vm02-test corosync[7394]:   [CMAN  ]
> >> >>> daemon:
> >> >>> >> >> >> client
> >> >>> >> >> >> > command is 7
> >> >>> >> >> >> > Aug 19 00:08:01 vm02-test corosync[7394]:   [CMAN  ]
> >> >>> daemon:
> >> >>> >> >> About
> >> >>> >> >> >> to
> >> >>> >> >> >> > process command
> >> >>> >> >> >> > </snip>
> >> >>> >> >> >> >
> >> >>> >> >> >> > Digimer schreef op 20.08.2012 00:01:
> >> >>> >> >> >> >> On 08/19/2012 05:34 PM, Bart Verwilst wrote:
> >> >>> >> >> >> >>> Aug 19 00:10:01 vm02-test kernel: [282120.240067]
> >> >>> INFO:
> >> >>> >> task
> >> >>> >> >> >> >>> kworker/1:0:3117 blocked for more than 120 seconds.
> >> >>> >> >> >> >>> Aug 19 00:10:01 vm02-test kernel: [282120.240182]
> >> >>> "echo
> >> >>> 0
> >> >>> >> >
> >> >>> >> >> >> >>> /proc/sys/kernel/hung_task_timeout_secs" disables 
> >> this
> >> >>> >> >> message.
> >> >>> >> >> >> >>
> >> >>> >> >> >> >> Nothing around Aug 19 00:08:00 ?
> >> >>> >> >> >>
> >> >>> >> >> >> --
> >> >>> >> >> >> Linux-cluster mailing list
> >> >>> >> >> >> Linux-cluster@xxxxxxxxxx
> >> >>> >> >> >> https://www.redhat.com/mailman/listinfo/linux-cluster
> >> >>> >> >>
> >> >>> >>
> >> >>>
> >>
> 

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster