David Teigland wrote:
On Wed, Jun 21, 2006 at 03:41:58PM -0300, German Staltari wrote:
Jun 21 14:59:17 qmail-be-04 kernel: CMAN: removing node qmail-be-02 from the cluster : Missed too many heartbeats
Jun 21 14:59:23 qmail-be-04 kernel: CMAN: removing node qmail-be-01 from the cluster : No response to messages
Jun 21 14:59:29 qmail-be-04 kernel: CMAN: removing node qmail-be-06 from the cluster : No response to messages
Jun 21 14:59:39 qmail-be-04 kernel: CMAN: removing node qmail-be-03 from the cluster : No response to messages
Jun 21 14:59:46 qmail-be-04 kernel: CMAN: removing node qmail-be-05 from the cluster : No response to messages
Jun 21 14:59:52 qmail-be-04 kernel: CMAN: quorum lost, blocking activity
Jun 21 14:59:52 qmail-be-04 kernel: CMAN: node qmail-be-04 has been removed from the cluster : No response to messages
Jun 21 14:59:52 qmail-be-04 kernel: CMAN: killed by NODEDOWN message
Jun 21 14:59:52 qmail-be-04 kernel: CMAN: we are leaving the cluster. No response to messages
This is what led to the gfs panic: the cluster shut down when it lost
contact with all the other nodes.
Dave
OK, but this node lost contact with the cluster because all the other
nodes hit the same panic at the same time.
We had another panic a few minutes ago, the third one today, with the
same log output:
Jun 21 16:13:55 qmail-be-01 kernel: lock_dlm: Assertion failed on line 357 of file /soft/kernel/cluster-1.02.00/gfs-kernel/src/dlm/lock.c
Jun 21 16:13:55 qmail-be-01 kernel: lock_dlm: assertion: "!error"
Jun 21 16:13:55 qmail-be-01 kernel: lock_dlm: time = 951351
Jun 21 16:13:55 qmail-be-01 kernel: mstore008-004: error=-22 num=2,75c6db lkf=10000 flags=84
Jun 21 16:13:55 qmail-be-01 kernel:
Jun 21 16:13:55 qmail-be-01 kernel: ------------[ cut here ]------------
Jun 21 16:13:55 qmail-be-01 kernel: kernel BUG at /soft/kernel/cluster-1.02.00/gfs-kernel/src/dlm/lock.c:357!
Jun 21 16:13:55 qmail-be-01 kernel: invalid opcode: 0000 [#1]
Jun 21 16:13:55 qmail-be-01 kernel: SMP
Jun 21 16:13:55 qmail-be-01 kernel: Modules linked in: nfsd exportfs lockd nfs_acl sunrpc gfs lock_dlm lock_harness dlm cman dm_round_robin dm_multipath ipv6 ohci_hcd i2c_piix4 i2c_core e1000 sg ext3 jbd dm_mod qla2300 qla2xxx scsi_transport_fc mptspi mptscsih mptbase sd_mod scsi_mod
Jun 21 16:13:55 qmail-be-01 kernel: CPU: 6
Jun 21 16:13:55 qmail-be-01 kernel: EIP: 0060:[<f90254d8>] Tainted: GF VLI
Jun 21 16:13:55 qmail-be-01 kernel: EFLAGS: 00010296 (2.6.16.11-gds #1)
Jun 21 16:13:55 qmail-be-01 kernel: EIP is at do_dlm_unlock+0xd1/0xe5 [lock_dlm]
Jun 21 16:13:55 qmail-be-01 kernel: eax: 00000004 ebx: 00000084 ecx: ffffebd8 edx: 00000000
Jun 21 16:13:55 qmail-be-01 kernel: esi: 00010000 edi: ffffffea ebp: ca4265c0 esp: d741eef4
Jun 21 16:13:56 qmail-be-01 kernel: ds: 007b es: 007b ss: 0068
Jun 21 16:13:56 qmail-be-01 kernel: Process gfs_glockd (pid: 1061, threadinfo=d741e000 task=d6b40550)
Jun 21 16:13:56 qmail-be-01 kernel: Stack: <0>f902b673 f53267e0 ffffffea 00000002 0075c6db 00000000 00010000 00000084
Jun 21 16:13:56 qmail-be-01 kernel: 00000002 f9732000 00000003 ca4265c0 cec8a4ac f902552e f905e6b5 cec8a4dc
Jun 21 16:13:56 qmail-be-01 kernel: cec8a4c8 cec8a4dc f9055f02 00000296 000000d0 f9732000 f9089ee0 c539f9c0
Jun 21 16:13:56 qmail-be-01 kernel: Call Trace:
Jun 21 16:13:56 qmail-be-01 kernel: [<f902552e>] lm_dlm_unlock+0x14/0x1c [lock_dlm]
Jun 21 16:13:56 qmail-be-01 kernel: [<f905e6b5>] gfs_lm_unlock+0x2c/0x47 [gfs]
Jun 21 16:13:56 qmail-be-01 kernel: [<f9055f02>] gfs_glock_drop_th+0x84/0x182 [gfs]
Jun 21 16:13:56 qmail-be-01 kernel: [<f9054817>] run_queue+0x348/0x374 [gfs]
Jun 21 16:13:56 qmail-be-01 kernel: [<f90541a4>] handle_callback+0xe6/0x120 [gfs]
Jun 21 16:13:56 qmail-be-01 kernel: [<f905485e>] unlock_on_glock+0x1b/0x24 [gfs]
Jun 21 16:13:56 qmail-be-01 kernel: [<f905441b>] gfs_reclaim_glock+0xbc/0x170 [gfs]
Jun 21 16:13:56 qmail-be-01 kernel: [<c031db3e>] _spin_lock_irqsave+0x9/0xd
Jun 21 16:13:56 qmail-be-01 kernel: [<f9047bca>] gfs_glockd+0xda/0xff [gfs]
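
A note for anyone reading the trace: the error=-22 in the lock_dlm output above is a negative errno value, and 22 is EINVAL on Linux. The edi register (ffffffea) appears to hold the same number when read as a signed 32-bit integer. The small Python sketch below only decodes those two values; it is not part of the original report, the helper names decode_error and to_signed32 are made up for illustration, and it says nothing about why the unlock call returned that error.

```python
import errno
import os
import re

# Line copied from the lock_dlm output above (rejoined after mail wrapping).
LOG_LINE = "mstore008-004: error=-22 num=2,75c6db lkf=10000 flags=84"


def decode_error(line):
    """Return (code, symbolic name, description) for the error= field."""
    match = re.search(r"error=(-?\d+)", line)
    if match is None:
        raise ValueError("no error= field found")
    code = int(match.group(1))
    # Kernel code reports failures as negative errno values.
    name = errno.errorcode.get(abs(code), "UNKNOWN")
    return code, name, os.strerror(abs(code))


def to_signed32(value):
    """Interpret a 32-bit register value as a signed integer."""
    return value - (1 << 32) if value & 0x80000000 else value


code, name, text = decode_error(LOG_LINE)
print(code, name, text)
print(to_signed32(0xFFFFFFEA))  # edi value from the register dump
```

The description string comes from the local C library, so its exact wording can differ between systems.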