Someone accidentally messed up iptables on a node so that it could no longer communicate with the cluster. That should have been the end of the problem, with one node down, but instead all nodes died with a kernel crash.

Here is a paste from one of the logs. I think this is the right section, showing the dying nodes.

Mike

Apr 22 11:55:26 qm250 kernel: qm move flags 0,1,0 ids 0,3,0
Apr 22 11:55:26 qm250 kernel: qm move use event 3
Apr 22 11:55:26 qm250 kernel: qm recover event 3 (first)
Apr 22 11:55:26 qm250 kernel: qm add nodes
Apr 22 11:55:26 qm250 kernel: qm total nodes 2
Apr 22 11:55:26 qm250 kernel: qm rebuild resource directory
Apr 22 11:55:26 qm250 kernel: qm rebuilt 8 resources
Apr 22 11:55:26 qm250 kernel: qm recover event 3 done
Apr 22 11:55:26 qm250 kernel: qm move flags 0,0,1 ids 0,3,3
Apr 22 11:55:26 qm250 kernel: qm process held requests
Apr 22 11:55:26 qm250 kernel: qm processed 0 requests
Apr 22 11:55:26 qm250 kernel: qm recover event 3 finished
Apr 22 11:55:26 qm250 kernel: clvmd move flags 1,0,0 ids 2,2,2
Apr 22 11:55:26 qm250 kernel: qm move flags 1,0,0 ids 3,3,3
Apr 22 11:55:26 qm250 kernel: 2640 pr_start last_stop 0 last_start 4 last_finish 0
Apr 22 11:55:26 qm250 kernel: 2640 pr_start count 2 type 2 event 4 flags 250
Apr 22 11:55:26 qm250 kernel: 2640 claim_jid 1
Apr 22 11:55:26 qm250 kernel: 2640 pr_start 4 done 1
Apr 22 11:55:26 qm250 kernel: 2640 pr_finish flags 5a
Apr 22 11:55:27 qm250 kernel: 2566 recovery_done jid 1 msg 309 a
Apr 22 11:55:27 qm250 kernel: 2566 recovery_done nodeid 250 flg 18
Apr 22 11:55:27 qm250 kernel:
Apr 22 11:55:27 qm250 kernel: lock_dlm: Assertion failed on line 357 of file /home/buildcentos/rpmbuild/BUILD/gfs-kernel-2.6.9-60/up/src/dlm/lock.c
Apr 22 11:55:27 qm250 kernel: lock_dlm: assertion: "!error"
Apr 22 11:55:27 qm250 kernel: lock_dlm: time = 14525882
Apr 22 11:55:27 qm250 kernel: qm: error=-22 num=2,1a lkf=10000 flags=84
Apr 22 11:55:27 qm250 kernel:
Apr 22 11:55:27 qm250 kernel: ------------[ cut here ]------------
Apr 22 11:55:27 qm250 kernel: kernel BUG at /home/buildcentos/rpmbuild/BUILD/gfs-kernel-2.6.9-60/up/src/dlm/lock.c:357!
Apr 22 11:55:27 qm250 kernel: invalid operand: 0000 [#1]
Apr 22 11:55:27 qm250 kernel: Modules linked in: lock_dlm(U) gfs(U) lock_harness(U) parport_pc lp parport autofs4 dlm(U) cman(U) md5 ipv6 sunrpc dm_mirror dm_mod uhci_hcd e100 mii floppy ext3 jbd qla2200 qla2xxx scsi_transport_fc sd_mod scsi_mod
Apr 22 11:55:27 qm250 kernel: CPU: 0
Apr 22 11:55:27 qm250 kernel: EIP: 0060:[<e09aacfe>] Not tainted VLI
Apr 22 11:55:27 qm250 kernel: EFLAGS: 00010246 (2.6.9-42.0.3.EL)
Apr 22 11:55:27 qm250 kernel: EIP is at do_dlm_unlock+0x89/0x9e [lock_dlm]
Apr 22 11:55:27 qm250 kernel: eax: 00000001 ebx: dfd552e0 ecx: e09b089f edx: dafe9f44
Apr 22 11:55:27 qm250 kernel: esi: ffffffea edi: dfd552e0 ebp: e0a62000 esp: dafe9f40
Apr 22 11:55:27 qm250 kernel: ds: 007b es: 007b ss: 0068
Apr 22 11:55:27 qm250 kernel: Process gfs_glockd (pid: 2647, threadinfo=dafe9000 task=de442c50)
Apr 22 11:55:27 qm250 kernel: Stack: e09b089f e0a62000 00000003 e09aafff e0ae3e51 dfd7d4ac e0a62000 e0b156c0
Apr 22 11:55:27 qm250 kernel: e0ad6bd4 dfd7d4ac e0b156c0 dafe9fb4 e0ad5683 dfd7d4ac 00000001 e0ad5840
Apr 22 11:55:27 qm250 kernel: dfd7d4ac dfd7d4ac e0ad5af9 dfd7d550 e0ad9182 dafe9000 dafe9fc0 e0ac8e9a
Apr 22 11:55:27 qm250 kernel: Call Trace:
Apr 22 11:55:27 qm250 kernel: [<e09aafff>] lm_dlm_unlock+0x13/0x1b [lock_dlm]
Apr 22 11:55:27 qm250 kernel: [<e0ae3e51>] gfs_lm_unlock+0x2b/0x40 [gfs]
Apr 22 11:55:27 qm250 kernel: [<e0ad6bd4>] gfs_glock_drop_th+0x17a/0x1b0 [gfs]
Apr 22 11:55:27 qm250 kernel: [<e0ad5683>] rq_demote+0x15c/0x1da [gfs]
Apr 22 11:55:27 qm250 kernel: [<e0ad5840>] run_queue+0x5a/0xc1 [gfs]
Apr 22 11:55:27 qm250 kernel: [<e0ad5af9>] unlock_on_glock+0x6e/0xc8 [gfs]
Apr 22 11:55:27 qm250 kernel: [<e0ad9182>] gfs_reclaim_glock+0x257/0x2ae [gfs]
Apr 22 11:55:27 qm250 kernel: [<e0ac8e9a>] gfs_glockd+0x38/0xde [gfs]
Apr 22 11:55:27 qm250 kernel: [<c0120049>] default_wake_function+0x0/0xc
Apr 22 11:55:27 qm250 kernel: [<c0318d7e>] ret_from_fork+0x6/0x14
Apr 22 11:55:27 qm250 kernel: [<c0120049>] default_wake_function+0x0/0xc
Apr 22 11:55:28 qm250 kernel: [<e0ac8e62>] gfs_glockd+0x0/0xde [gfs]
Apr 22 11:55:28 qm250 kernel: [<c01041dd>] kernel_thread_helper+0x5/0xb
Apr 22 11:55:28 qm250 kernel: Code: 73 34 8b 03 ff 73 2c ff 73 08 ff 73 04 ff 73 0c 56 ff 70 18 68 ac 09 9b e0 e8 10 9c 77 df 83 c4 34 68 9f 08 9b e0 e8 03 9c 77 df <0f> 0b 65 01 2e 07 9b e0 68 a1 08 9b e0 e8 5b 90 77 df 5b 5e c3
Apr 22 11:55:28 qm250 kernel: <0>Fatal exception: panic in 5 seconds