cl030:

CMAN: node cl031a is not responding - removing from the cluster
dlm: closing connection to node 1
dlm: closing connection to node 2
SM: 00000001 sm_stop: SG still joined
SM: 01000932 sm_stop: SG still joined
SM: 02000933 sm_stop: SG still joined
Unable to handle kernel NULL pointer dereference at virtual address 00000004
 printing eip:
c0119677
*pde = 00000000
Oops: 0000 [#1]
PREEMPT SMP
Modules linked in: lock_nolock lock_dlm dlm qla2200 qla2xxx gfs lock_harness cman dm_mod
CPU:    1
EIP:    0060:[<c0119677>]    Not tainted VLI
EFLAGS: 00010096   (2.6.9)
EIP is at task_rq_lock+0x27/0x70
eax: ea000f2c   ebx: c052e000   ecx: 00000000   edx: 00000000
esi: c0533020   edi: c052e000   ebp: ea000ef4   esp: ea000ee8
ds: 007b   es: 007b   ss: 0068
Process cman_comms (pid: 3739, threadinfo=ea000000 task=e9de98f0)
Stack: c1b037b0 e617daf4 f781243c ea000f3c c0119d92 00000000 ea000f2c c014a82f
       c181f040 00000020 f7810750 f8d04755 ffffff95 02000933 00000000 ea000f50
       f8d047b5 00000296 c1b037b0 e617daf4 f781243c ea000f50 c011a02e 00000000
Call Trace:
 [<c010626f>] show_stack+0x7f/0xa0
 [<c010641e>] show_registers+0x15e/0x1d0
 [<c010663e>] die+0xfe/0x190
 [<c0118683>] do_page_fault+0x293/0x7c1
 [<c0105e59>] error_code+0x2d/0x38
 [<c0119d92>] try_to_wake_up+0x22/0x2a0
 [<c011a02e>] wake_up_process+0x1e/0x30
 [<f8d048b0>] callback_startdone_barrier+0x20/0x30 [cman]
 [<f8cfc641>] node_shutdown+0x291/0x3c0 [cman]
 [<f8cf847a>] cluster_kthread+0x2aa/0x350 [cman]
 [<c0103325>] kernel_thread_helper+0x5/0x10
SM: 00000001 sm_stop: SG still joined
SM: 01000932 sm_stop: SG still joined
SM: 02000933 sm_stop: SG still joined
Unable to handle kernel NULL pointer dereference at virtual address 00000004
 printing eip:
c0119677
*pde = 00000000
Oops: 0000 [#1]
PREEMPT SMP

cl031:

dlm: closing connection to node 1
dlm: closing connection to node 2
SM: 00000001 sm_stop: SG still joined
SM: 01000932 sm_stop: SG still joined
SM: 02000933 sm_stop: SG still joined
Unable to handle kernel NULL pointer dereference at virtual address 00000004
 printing eip:
c0119677
*pde = 00000000
Oops: 0000 [#1]
PREEMPT SMP
Modules linked in: lock_dlm dlm qla2200 qla2xxx gfs lock_harness cman dm_mod
CPU:    1
EIP:    0060:[<c0119677>]    Not tainted VLI
EFLAGS: 00010096   (2.6.9)
EIP is at task_rq_lock+0x27/0x70
eax: eacc0f2c   ebx: c052e000   ecx: 00000000   edx: 00000000
esi: c0533020   edi: c052e000   ebp: eacc0ef4   esp: eacc0ee8
ds: 007b   es: 007b   ss: 0068
Process cman_comms (pid: 2876, threadinfo=eacc0000 task=eae75a30)
Stack: ea3a285c caeb7da4 f502e1a8 eacc0f3c c0119d92 00000000 eacc0f2c c014a82f
       c181f040 00000020 f7d35f38 f8d04755 ffffff95 02000933 00000000 eacc0f50
       f8d047b5 00000296 ea3a285c caeb7da4 f502e1a8 eacc0f50 c011a02e 00000000
Call Trace:
 [<c010626f>] show_stack+0x7f/0xa0
 [<c010641e>] show_registers+0x15e/0x1d0
 [<c010663e>] die+0xfe/0x190
 [<c0118683>] do_page_fault+0x293/0x7c1
 [<c0105e59>] error_code+0x2d/0x38
 [<c0119d92>] try_to_wake_up+0x22/0x2a0
 [<c011a02e>] wake_up_process+0x1e/0x30
 [<f8d04880>] callback_startdone_barrier_new+0x20/0x30 [cman]
 [<f8cfc641>] node_shutdown+0x291/0x3c0 [cman]
 [<f8cf847a>] cluster_kthread+0x2aa/0x350 [cman]
 [<c0103325>] kernel_thread_helper+0x5/0x10
Code: 00 00 00 00 55 89 e5 83 ec 0c 89 1c 24 89 74 24 04 89 7c 24 08 8b 45 0c 9c 8f 00 fa be 20 30 53 c0 bb 00 e0 52 c0 8b 55 08 89 df <8b> 42 04 8b 40 10 8b 0c

cl032:

Dec 22 05:29:19 cl032 sshd(pam_unix)[18296]: session closed for user root
Dec 22 05:42:14 cl032 kernel: CMAN: bad generation number 15 in HELLO message, expected 14
Dec 22 05:42:17 cl032 kernel: CMAN: Node cl030a is leaving the cluster, Shutdown
Dec 22 05:42:17 cl032 kernel: CMAN: quorum lost, blocking activity

My test does a lot of mounting and unmounting. Wouldn't that stress
the SM code a lot? Is SM causing the problem?

http://developer.osdl.org/daniel/GFS/cman.21dec2004/

Daniel