I am using Scientific Linux 4.8 (32-bit), with these versions of the various components:
kernel-smp-2.6.9-89.0.3.EL
GFS-6.1.19-1.el4
GFS-kernel-smp-2.6.9-85.2.1
ccs-1.0.12-1
cman-1.0.27-1.el4
cman-kernel-smp-2.6.9-56.7.4
dlm-kernel-smp-2.6.9-58.6.1
fence-1.32.67-1.el4
I have an 18-node cluster used for I/O-intensive computation. The I/O-intensive work is done on an FC-connected RAID array attached to all nodes. We use GFS for the clustered filesystem, and no other cluster resources at all.
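For completeness, the filesystem was created with lock_dlm and one journal per potential mounter, roughly along these lines (the device path and journal count here are illustrative, but the lock table name matches the fsid in the log below):

    gfs_mkfs -p lock_dlm -t HPC_-cluster:lv_fastfs -j 32 /dev/vg_fast/lv_fastfs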
Each node has two network connections: a private, cluster-only link and a public link. All cluster communication, including DLM traffic, goes over the private network.
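The binding to the private network is done the usual CMAN way: each node name used in cluster.conf resolves to that node's private address. Illustrative /etc/hosts entries (the addresses are made up):

    # private, cluster-only interfaces
    10.0.0.1     HPC_01
    10.0.0.7     HPC_07
    10.0.0.18    HPC_18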
We have the following problem: when a node loses its private network connection, for whatever reason, several other nodes in the cluster quickly crash. This takes the cluster out of quorum, which locks up the filesystem, and basically means a fair bit of cleanup work, with users' jobs lost.
Fencing (via the fence_sanbox2 agent) is working fine, but that doesn't help when the whole cluster dies.
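For reference, the fencing setup is the stock SANBox2 arrangement, roughly like this minimal cluster.conf fragment (switch address, credentials and port numbers are placeholders):

    <fencedevices>
        <fencedevice agent="fence_sanbox2" name="sanbox" ipaddr="10.0.0.250" login="admin" passwd="xxxx"/>
    </fencedevices>

    <clusternode name="HPC_01" votes="1">
        <fence>
            <method name="1">
                <device name="sanbox" port="1"/>
            </method>
        </fence>
    </clusternode>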
The syslog output from one node that died is shown below. IMO it's a bug that the cluster software/kernel crashes at all, but aside from that, are there any GFS, CMAN or kernel timeouts or tunables I can change to ride out short outages more smoothly? This morning's outage, for example, was caused merely by me swapping out a defective cable; the private network should have been unavailable for no more than 30 seconds or so.
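From what I can find, the relevant CMAN knobs on RHEL4 are the heartbeat interval and the dead-node timeout, which are exposed through /proc; the defaults (5-second heartbeats, 21 seconds to declare a node dead) would explain why even a 30-second cable swap is fatal. A sketch of what I'm considering, assuming I've read the cluster FAQ correctly:

    # current values, in seconds (defaults shown in comments)
    cat /proc/cluster/config/cman/hello_timer        # 5  - interval between heartbeats
    cat /proc/cluster/config/cman/deadnode_timeout   # 21 - node declared dead after this

    # raise the timeout at runtime on every node, enough to ride out a cable swap
    echo 70 > /proc/cluster/config/cman/deadnode_timeout

and, if I understand the cluster.conf syntax right, persistently via something like <cman deadnode_timeout="70"/>. Is that sane, or does a larger timeout just delay the fencing of genuinely dead nodes?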
A search of bugzilla.redhat.com turned up a few similar bugs, but most of them appear to have been fixed ages ago.
Thanks in advance for any advice or suggestions,
Kevin
Nov 30 07:03:01 HPC_01 kernel: CMAN: node HPC_14 has been removed from the cluster : Missed too many heartbeats
Nov 30 07:03:01 HPC_01 kernel: CMAN: Started transition, generation 44
Nov 30 07:03:02 HPC_01 kernel: CMAN: Finished transition, generation 44
Nov 30 07:03:02 HPC_01 fenced[5393]: fencing deferred to HPC_07
Nov 30 07:03:05 HPC_01 kernel: GFS: fsid=HPC_-cluster:lv_fastfs.0: jid=13: Trying to acquire journal lock...
Nov 30 07:03:05 HPC_01 kernel: GFS: fsid=HPC_-cluster:lv_fastfs.0: jid=13: Busy
Nov 30 07:03:48 HPC_01 kernel: CMAN: node HPC_10 has been removed from the cluster : No response to messages
Nov 30 07:03:54 HPC_01 kernel: CMAN: node HPC_06 has been removed from the cluster : No response to messages
Nov 30 07:04:01 HPC_01 kernel: CMAN: node HPC_17 has been removed from the cluster : No response to messages
Nov 30 07:04:08 HPC_01 kernel: CMAN: node HPC_18 has been removed from the cluster : No response to messages
Nov 30 07:04:15 HPC_01 kernel: CMAN: node HPC_02 has been removed from the cluster : No response to messages
Nov 30 07:04:22 HPC_01 kernel: CMAN: node HPC_05 has been removed from the cluster : No response to messages
Nov 30 07:04:29 HPC_01 kernel: CMAN: node HPC_11 has been removed from the cluster : No response to messages
Nov 30 07:04:36 HPC_01 kernel: CMAN: node HPC_09 has been removed from the cluster : No response to messages
Nov 30 07:04:43 HPC_01 kernel: CMAN: node HPC_12 has been removed from the cluster : No response to messages
Nov 30 07:04:50 HPC_01 kernel: CMAN: node HPC_03 has been removed from the cluster : No response to messages
Nov 30 07:04:57 HPC_01 kernel: CMAN: node HPC_01 has been removed from the cluster : No response to messages
Nov 30 07:04:57 HPC_01 kernel: CMAN: killed by NODEDOWN message
Nov 30 07:04:57 HPC_01 kernel: CMAN: we are leaving the cluster. No response to messages
Nov 30 07:04:57 HPC_01 kernel: WARNING: dlm_emergency_shutdown
Nov 30 07:04:58 HPC_01 kernel: WARNING: dlm_emergency_shutdown finished 2
Nov 30 07:04:58 HPC_01 kernel: SM: 00000003 sm_stop: SG still joined
Nov 30 07:04:58 HPC_01 kernel: SM: 01000005 sm_stop: SG still joined
Nov 30 07:04:58 HPC_01 kernel: SM: 02000009 sm_stop: SG still joined
Nov 30 07:04:58 HPC_01 ccsd[5212]: Cluster manager shutdown. Attemping to reconnect...
Nov 30 07:05:09 HPC_01 kernel: dlm: dlm_lock: no lockspace
Nov 30 07:05:09 HPC_01 kernel: d 0 requests
Nov 30 07:05:09 HPC_01 kernel: clvmd purge locks of departed nodes
Nov 30 07:05:09 HPC_01 kernel: clvmd purged 1 locks
Nov 30 07:05:09 HPC_01 kernel: clvmd update remastered resources
Nov 30 07:05:09 HPC_01 kernel: clvmd updated 0 resources
Nov 30 07:05:09 HPC_01 kernel: clvmd rebuild locks
Nov 30 07:05:09 HPC_01 kernel: clvmd rebuilt 0 locks
Nov 30 07:05:09 HPC_01 kernel: clvmd recover event 86 done
Nov 30 07:05:09 HPC_01 kernel: clvmd move flags 0,0,1 ids 83,86,86
Nov 30 07:05:09 HPC_01 kernel: clvmd process held requests
Nov 30 07:05:09 HPC_01 kernel: clvmd processed 0 requests
Nov 30 07:05:09 HPC_01 kernel: clvmd resend marked requests
Nov 30 07:05:09 HPC_01 kernel: clvmd resent 0 requests
Nov 30 07:05:09 HPC_01 kernel: clvmd recover event 86 finished
Nov 30 07:05:09 HPC_01 kernel: lv_fastfs mark waiting requests
Nov 30 07:05:09 HPC_01 kernel: lv_fastfs marked 0 requests
Nov 30 07:05:09 HPC_01 kernel: lv_fastfs purge locks of departed nodes
Nov 30 07:05:09 HPC_01 kernel: lv_fastfs purged 31363 locks
Nov 30 07:05:09 HPC_01 kernel: lv_fastfs update remastered resources
Nov 30 07:05:09 HPC_01 kernel: lv_fastfs updated 1 resources
Nov 30 07:05:09 HPC_01 kernel: lv_fastfs rebuild locks
Nov 30 07:05:09 HPC_01 kernel: lv_fastfs rebuilt 1 locks
Nov 30 07:05:09 HPC_01 kernel: lv_fastfs recover event 86 done
Nov 30 07:05:09 HPC_01 kernel: lv_fastfs move flags 0,0,1 ids 84,86,86
Nov 30 07:05:09 HPC_01 kernel: lv_fastfs process held requests
Nov 30 07:05:09 HPC_01 kernel: lv_fastfs processed 0 requests
Nov 30 07:05:09 HPC_01 kernel: lv_fastfs resend marked requests
Nov 30 07:05:09 HPC_01 kernel: lv_fastfs resent 0 requests
Nov 30 07:05:09 HPC_01 kernel: lv_fastfs recover event 86 finished
Nov 30 07:05:09 HPC_01 kernel: lv_fastfs (6290) req reply einval 4640242 fr 18 r 18
Nov 30 07:05:09 HPC_01 kernel: lv_fastfs send einval to 8
Nov 30 07:05:09 HPC_01 kernel: lv_fastfs send einval to 6
Nov 30 07:05:09 HPC_01 kernel: overy_done jid 24 msg 309 b
Nov 30 07:05:09 HPC_01 kernel: 6267 recovery_done jid 25 msg 309 b
Nov 30 07:05:09 HPC_01 kernel: 6267 recovery_done jid 26 msg 309 b
Nov 30 07:05:09 HPC_01 kernel: 6267 recovery_done jid 27 msg 309 b
Nov 30 07:05:09 HPC_01 kernel: 6267 recovery_done jid 28 msg 309 b
Nov 30 07:05:09 HPC_01 kernel: 6267 recovery_done jid 29 msg 309 b
Nov 30 07:05:09 HPC_01 kernel: 6267 recovery_done jid 30 msg 309 b
Nov 30 07:05:09 HPC_01 kernel: 6267 recovery_done jid 31 msg 309 b
Nov 30 07:05:09 HPC_01 kernel: 6267 others_may_mount b
Nov 30 07:05:09 HPC_01 kernel: 6284 pr_start last_stop 38 last_start 40 last_finish 38
Nov 30 07:05:09 HPC_01 kernel: 6284 pr_start count 2 type 2 event 40 flags a1b
Nov 30 07:05:09 HPC_01 kernel: 6284 pr_start 40 done 1
Nov 30 07:05:09 HPC_01 kernel: 6283 pr_finish flags 81b
Nov 30 07:05:09 HPC_01 kernel: 6284 pr_start last_stop 40 last_start 42 last_finish 40
Nov 30 07:05:09 HPC_01 kernel: 6284 pr_start count 3 type 2 event 42 flags a1b
Nov 30 07:05:09 HPC_01 kernel: 6284 pr_start 42 done 1
Nov 30 07:05:09 HPC_01 kernel: 6284 pr_finish flags 81b
Nov 30 07:05:09 HPC_01 kernel: 6284 pr_start last_stop 42 last_start 44 last_finish 42
Nov 30 07:05:09 HPC_01 kernel: 6284 pr_start count 4 type 2 event 44 flags a1b
Nov 30 07:05:09 HPC_01 kernel: 6284 pr_start 44 done 1
Nov 30 07:05:09 HPC_01 kernel: 6284 pr_finish flags 81b
Nov 30 07:05:09 HPC_01 kernel: 6284 pr_start last_stop 44 last_start 46 last_finish 44
Nov 30 07:05:09 HPC_01 kernel: 6284 pr_start count 5 type 2 event 46 flags a1b
Nov 30 07:05:09 HPC_01 kernel: 6284 pr_start 46 done 1
Nov 30 07:05:09 HPC_01 kernel: 6284 pr_finish flags 81b
Nov 30 07:05:09 HPC_01 kernel: 6284 pr_start last_stop 46 last_start 48 last_finish 46
Nov 30 07:05:09 HPC_01 kernel: 6284 pr_start count 6 type 2 event 48 flags a1b
Nov 30 07:05:09 HPC_01 kernel: 6284 pr_start 48 done 1
Nov 30 07:05:09 HPC_01 kernel: 6284 pr_finish flags 81b
Nov 30 07:05:09 HPC_01 kernel: 6284 pr_start last_stop 48 last_start 50 last_finish 48
Nov 30 07:05:09 HPC_01 kernel: 6284 pr_start count 7 type 2 event 50 flags a1b
Nov 30 07:05:09 HPC_01 kernel: 6284 pr_start 50 done 1
Nov 30 07:05:09 HPC_01 kernel: 6284 pr_finish flags 81b
Nov 30 07:05:09 HPC_01 kernel: 6283 pr_start last_stop 50 last_start 52 last_finish 50
Nov 30 07:05:09 HPC_01 kernel: 6283 pr_start count 8 type 2 event 52 flags a1b
Nov 30 07:05:09 HPC_01 kernel: 6283 pr_start 52 done 1
Nov 30 07:05:09 HPC_01 kernel: 6283 pr_finish flags 81b
Nov 30 07:05:09 HPC_01 kernel: 6284 pr_start last_stop 52 last_start 54 last_finish 52
Nov 30 07:05:09 HPC_01 kernel: 6284 pr_start count 9 type 2 event 54 flags a1b
Nov 30 07:05:09 HPC_01 kernel: 6284 pr_start 54 done 1
Nov 30 07:05:09 HPC_01 kernel: 6284 pr_finish flags 81b
Nov 30 07:05:09 HPC_01 kernel: 6284 pr_start last_stop 54 last_start 56 last_finish 54
Nov 30 07:05:09 HPC_01 kernel: 6284 pr_start count 10 type 2 event 56 flags a1b
Nov 30 07:05:09 HPC_01 kernel: 6284 pr_start 56 done 1
Nov 30 07:05:09 HPC_01 kernel: 6284 pr_finish flags 81b
Nov 30 07:05:09 HPC_01 kernel: 6284 pr_start last_stop 56 last_start 58 last_finish 56
Nov 30 07:05:09 HPC_01 kernel: 6284 pr_start count 11 type 2 event 58 flags a1b
Nov 30 07:05:09 HPC_01 kernel: 6284 pr_start 58 done 1
Nov 30 07:05:09 HPC_01 kernel: 6284 pr_finish flags 81b
Nov 30 07:05:09 HPC_01 kernel: 6284 pr_start last_stop 58 last_start 60 last_finish 58
Nov 30 07:05:09 HPC_01 kernel: 6284 pr_start count 12 type 2 event 60 flags a1b
Nov 30 07:05:09 HPC_01 kernel: 6284 pr_start 60 done 1
Nov 30 07:05:09 HPC_01 kernel: 6284 pr_finish flags 81b
Nov 30 07:05:09 HPC_01 kernel: 6284 pr_start last_stop 60 last_start 62 last_finish 60
Nov 30 07:05:09 HPC_01 kernel: 6284 pr_start count 13 type 2 event 62 flags a1b
Nov 30 07:05:09 HPC_01 kernel: 6284 pr_start 62 done 1
Nov 30 07:05:09 HPC_01 kernel: 6284 pr_finish flags 81b
Nov 30 07:05:09 HPC_01 kernel: 6284 pr_start last_stop 62 last_start 64 last_finish 62
Nov 30 07:05:09 HPC_01 kernel: 6284 pr_start count 14 type 2 event 64 flags a1b
Nov 30 07:05:09 HPC_01 kernel: 6284 pr_start 64 done 1
Nov 30 07:05:09 HPC_01 kernel: 6284 pr_finish flags 81b
Nov 30 07:05:09 HPC_01 kernel: 6284 pr_start last_stop 64 last_start 66 last_finish 64
Nov 30 07:05:09 HPC_01 kernel: 6284 pr_start count 15 type 2 event 66 flags a1b
Nov 30 07:05:09 HPC_01 kernel: 6284 pr_start 66 done 1
Nov 30 07:05:09 HPC_01 kernel: 6284 pr_finish flags 81b
Nov 30 07:05:09 HPC_01 kernel: 6284 pr_start last_stop 66 last_start 68 last_finish 66
Nov 30 07:05:09 HPC_01 kernel: 6284 pr_start count 16 type 2 event 68 flags a1b
Nov 30 07:05:09 HPC_01 kernel: 6284 pr_start 68 done 1
Nov 30 07:05:09 HPC_01 kernel: 6284 pr_finish flags 81b
Nov 30 07:05:09 HPC_01 kernel: 6284 pr_start last_stop 68 last_start 70 last_finish 68
Nov 30 07:05:09 HPC_01 kernel: 6284 pr_start count 17 type 2 event 70 flags a1b
Nov 30 07:05:09 HPC_01 kernel: 6284 pr_start 70 done 1
Nov 30 07:05:09 HPC_01 kernel: 6284 pr_finish flags 81b
Nov 30 07:05:09 HPC_01 kernel: 6284 pr_start last_stop 70 last_start 72 last_finish 70
Nov 30 07:05:09 HPC_01 kernel: 6284 pr_start count 18 type 2 event 72 flags a1b
Nov 30 07:05:09 HPC_01 kernel: 6284 pr_start 72 done 1
Nov 30 07:05:09 HPC_01 kernel: 6284 pr_finish flags 81b
Nov 30 07:05:09 HPC_01 kernel: 6284 rereq 5,3f3a6c7c id 13be024a 5,0
Nov 30 07:05:09 HPC_01 kernel: 6283 pr_start last_stop 72 last_start 73 last_finish 72
Nov 30 07:05:09 HPC_01 kernel: 6283 pr_start count 17 type 1 event 73 flags a1b
Nov 30 07:05:09 HPC_01 kernel: 6283 pr_start cb jid 12 id 18
Nov 30 07:05:09 HPC_01 kernel: 6283 pr_start 73 done 0
Nov 30 07:05:09 HPC_01 kernel: 6288 recovery_done jid 12 msg 308 91b
Nov 30 07:05:09 HPC_01 kernel: 6288 recovery_done nodeid 18 flg 1b
Nov 30 07:05:09 HPC_01 kernel: 6288 recovery_done start_done 73
Nov 30 07:05:09 HPC_01 kernel: 6284 pr_finish flags 81b
Nov 30 07:05:09 HPC_01 kernel: 6284 pr_start last_stop 73 last_start 77 last_finish 73
Nov 30 07:05:09 HPC_01 kernel: 6284 pr_start count 18 type 2 event 77 flags a1b
Nov 30 07:05:09 HPC_01 kernel: 6283 rereq 2,19 id 132c02a2 5,0
Nov 30 07:05:09 HPC_01 kernel: 6284 pr_start 77 done 1
Nov 30 07:05:09 HPC_01 kernel: 6284 pr_finish flags 81b
Nov 30 07:05:09 HPC_01 kernel: 6283 pr_start last_stop 77 last_start 78 last_finish 77
Nov 30 07:05:09 HPC_01 kernel: 6283 pr_start count 17 type 3 event 78 flags a1b
Nov 30 07:05:09 HPC_01 kernel: 6283 pr_start 78 done 1
Nov 30 07:05:09 HPC_01 kernel: 6283 pr_finish flags 81b
Nov 30 07:05:09 HPC_01 kernel: 6283 rereq 5,38445187 id d0100322 3,0
Nov 30 07:05:09 HPC_01 kernel: 6284 pr_start last_stop 78 last_start 85 last_finish 78
Nov 30 07:05:09 HPC_01 kernel: 6284 pr_start count 18 type 2 event 85 flags a1b
Nov 30 07:05:09 HPC_01 kernel: 6283 rereq 2,19 id d0550264 5,0
Nov 30 07:05:09 HPC_01 kernel: 6284 pr_start 85 done 1
Nov 30 07:05:09 HPC_01 kernel: 6284 pr_finish flags 81b
Nov 30 07:05:09 HPC_01 kernel: 6284 pr_start last_stop 85 last_start 86 last_finish 85
Nov 30 07:05:09 HPC_01 kernel: 6284 pr_start count 17 type 1 event 86 flags a1b
Nov 30 07:05:09 HPC_01 kernel: 6284 pr_start cb jid 13 id 5
Nov 30 07:05:09 HPC_01 kernel: 6284 pr_start 86 done 0
Nov 30 07:05:09 HPC_01 kernel: 6288 recovery_done jid 13 msg 308 91b
Nov 30 07:05:09 HPC_01 kernel: 6288 recovery_done nodeid 5 flg 1b
Nov 30 07:05:09 HPC_01 kernel: 6288 recovery_done start_done 86
Nov 30 07:05:09 HPC_01 kernel: 6284 pr_finish flags 81b
Nov 30 07:05:09 HPC_01 kernel:
Nov 30 07:05:09 HPC_01 kernel: lock_dlm: Assertion failed on line 440 of file /mnt/src/4/BUILD/gfs-kernel-2.6.9-85/smp/src/dlm/lock.c
Nov 30 07:05:09 HPC_01 kernel: lock_dlm: assertion: "!error"
Nov 30 07:05:09 HPC_01 kernel: lock_dlm: time = 1199646181
Nov 30 07:05:09 HPC_01 kernel: lv_fastfs: num=2,1a err=-22 cur=-1 req=3 lkf=10000
Nov 30 07:05:09 HPC_01 kernel:
Nov 30 07:05:09 HPC_01 kernel: ------------[ cut here ]------------
Nov 30 07:05:09 HPC_01 kernel: kernel BUG at /mnt/src/4/BUILD/gfs-kernel-2.6.9-85/smp/src/dlm/lock.c:440!
Nov 30 07:05:09 HPC_01 kernel: invalid operand: 0000 [#1]
Nov 30 07:05:09 HPC_01 kernel: SMP
Nov 30 07:05:09 HPC_01 kernel: Modules linked in: lock_dlm(U) gfs(U) lock_harness(U) sg lquota(U) ksocklnd(U) ptlrpc(U) obdclass(U) lnet(U) lvfs(U) libcfs(U) dlm(U) cman(U) nfsd exportfs md5 ipv6 parport_pc lp parport nfs lockd nfs_acl sunrpc dm_mirror dm_mod button battery ac ohci_hcd hw_random k8_edac edac_mc e1000 floppy ext3 jbd qla2300 qla2xxx scsi_transport_fc sata_sil libata sd_mod scsi_mod
Nov 30 07:05:09 HPC_01 kernel: CPU: 1
Nov 30 07:05:09 HPC_01 kernel: EIP: 0060:[<f8e9e820>] Not tainted VLI
Nov 30 07:05:09 HPC_01 kernel: EFLAGS: 00010246 (2.6.9-89.0.3.ELsmp)
Nov 30 07:05:09 HPC_01 kernel: EIP is at do_dlm_lock+0x134/0x14e [lock_dlm]
Nov 30 07:05:09 HPC_01 kernel: eax: 00000001 ebx: ffffffea ecx: c88e5da8 edx: f8ea3409
Nov 30 07:05:09 HPC_01 kernel: esi: f8e9e83f edi: f7dbda00 ebp: f6c7b280 esp: c88e5da4
Nov 30 07:05:09 HPC_01 kernel: ds: 007b es: 007b ss: 0068
Nov 30 07:05:09 HPC_01 kernel: Process rm (pid: 15538, threadinfo=c88e5000 task=f65780b0)
Nov 30 07:05:09 HPC_01 kernel: Stack: f8ea3409 20202020 32202020 20202020 20202020 20202020 61312020 ffff0018
Nov 30 07:05:09 HPC_01 kernel: ffffffff f6c7b280 00000003 00000000 f6c7b280 f8e9e8cf 00000003 f8ea6dc0
Nov 30 07:05:09 HPC_01 kernel: f8e77000 f8fead96 00000008 00000001 f655ba78 f655ba5c f8e77000 f8fe09d2
Nov 30 07:05:09 HPC_01 kernel: Call Trace:
Nov 30 07:05:09 HPC_01 kernel: [<f8e9e8cf>] lm_dlm_lock+0x49/0x52 [lock_dlm]
Nov 30 07:05:09 HPC_01 kernel: [<f8fead96>] gfs_lm_lock+0x35/0x4d [gfs]
Nov 30 07:05:09 HPC_01 kernel: [<f8fe09d2>] gfs_glock_xmote_th+0x130/0x172 [gfs]
Nov 30 07:05:09 HPC_01 kernel: [<f8fe0091>] rq_promote+0xc8/0x147 [gfs]
Nov 30 07:05:09 HPC_01 kernel: [<f8fe027d>] run_queue+0x91/0xc1 [gfs]
Nov 30 07:05:09 HPC_01 kernel: [<f8fe1293>] gfs_glock_nq+0xcf/0x116 [gfs]
Nov 30 07:05:09 HPC_01 kernel: [<f8fe182d>] gfs_glock_nq_init+0x13/0x26 [gfs]
Nov 30 07:05:09 HPC_01 kernel: [<f8ff9ed9>] gfs_permission+0x0/0x61 [gfs]
Nov 30 07:05:09 HPC_01 kernel: [<f8ff9f13>] gfs_permission+0x3a/0x61 [gfs]
Nov 30 07:05:09 HPC_01 kernel: [<f8ff9ed9>] gfs_permission+0x0/0x61 [gfs]
Nov 30 07:05:09 HPC_01 kernel: [<c0169037>] permission+0x4a/0x6e
Nov 30 07:05:09 HPC_01 kernel: [<c01695eb>] __link_path_walk+0x14a/0xc25
Nov 30 07:05:09 HPC_01 kernel: [<c011b897>] do_page_fault+0x1ae/0x5c6
Nov 30 07:05:09 HPC_01 kernel: [<c016a0fc>] link_path_walk+0x36/0xa1
Nov 30 07:05:09 HPC_01 kernel: [<c016a481>] path_lookup+0x14b/0x17f
Nov 30 07:05:09 HPC_01 kernel: [<c016bb1d>] sys_unlink+0x2c/0x132
Nov 30 07:05:09 HPC_01 kernel: [<c02d8231>] unix_ioctl+0xd1/0xda
Nov 30 07:05:09 HPC_01 kernel: [<c016db6a>] sys_ioctl+0x227/0x269
Nov 30 07:05:09 HPC_01 kernel: [<c016dba0>] sys_ioctl+0x25d/0x269
Nov 30 07:05:09 HPC_01 kernel: [<c02ddb2b>] syscall_call+0x7/0xb
Nov 30 07:05:09 HPC_01 kernel: Code: 26 50 0f bf 45 24 50 53 ff 75 08 ff 75 04 ff 75 0c ff 77 18 68 8f 35 ea f8 e8 03 4d 28 c7 83 c4 38 68 09 34 ea f8 e8 f6 4c 28 c7 <0f> 0b b8 01 56 33 ea f8 68 0b 34 ea f8 e8 91 44 28 c7 83 c4 20
Nov 30 07:05:09 HPC_01 kernel: <0>Fatal exception: panic in 5 seconds