2008/8/18 Shawn Hood <shawnlhood@xxxxxxxxx>:
> Could you post the errors from syslog/dmesg?

As I was finishing off this email, I just noticed this near the end of blade2's logs:

Aug 17 19:50:24 blade2 gfs_controld[2839]: retrieve_plocks: ckpt open error 12 cache1

That happens after blade2 has been fenced, has successfully rejoined the fence and cman domains, and is now trying to mount its GFS filesystems. The first GFS filesystem it tries to mount causes a lockup. :)

Well, one node (blade2) lost connectivity to the cluster and was fenced (fence_ilo). The environment is an HP BladeSystem with x86_64 blades (2 Intel, 1 AMD).

Here are the logs from blade3:

Aug 17 19:48:55 blade3 openais[2696]: [TOTEM] The token was lost in the OPERATIONAL state.
Aug 17 19:48:55 blade3 openais[2696]: [TOTEM] Receive multicast socket recv buffer size (288000 bytes).
Aug 17 19:48:55 blade3 openais[2696]: [TOTEM] Transmit multicast socket send buffer size (262142 bytes).
Aug 17 19:48:55 blade3 openais[2696]: [TOTEM] entering GATHER state from 2.
Aug 17 19:49:00 blade3 openais[2696]: [TOTEM] entering GATHER state from 11.
Aug 17 19:49:00 blade3 openais[2696]: [TOTEM] Creating commit token because I am the rep.
Aug 17 19:49:00 blade3 openais[2696]: [TOTEM] Saving state aru 1fd high seq received 1fd
Aug 17 19:49:00 blade3 openais[2696]: [TOTEM] Storing new sequence id for ring 25c
Aug 17 19:49:00 blade3 openais[2696]: [TOTEM] entering COMMIT state.
Aug 17 19:49:00 blade3 openais[2696]: [TOTEM] entering RECOVERY state.
Aug 17 19:49:00 blade3 openais[2696]: [TOTEM] position [0] member 192.168.70.103:
Aug 17 19:49:00 blade3 openais[2696]: [TOTEM] previous ring seq 600 rep 192.168.70.102
Aug 17 19:49:00 blade3 openais[2696]: [TOTEM] aru 1fd high delivered 1fd received flag 1
Aug 17 19:49:00 blade3 openais[2696]: [TOTEM] position [1] member 192.168.70.104:
Aug 17 19:49:00 blade3 openais[2696]: [TOTEM] previous ring seq 600 rep 192.168.70.102
Aug 17 19:49:00 blade3 openais[2696]: [TOTEM] aru 1fd high delivered 1fd received flag 1
Aug 17 19:49:00 blade3 openais[2696]: [TOTEM] Did not need to originate any messages in recovery.
Aug 17 19:49:00 blade3 openais[2696]: [TOTEM] Sending initial ORF token
Aug 17 19:49:00 blade3 kernel: dlm: closing connection to node 2
Aug 17 19:49:00 blade3 openais[2696]: [CLM ] CLM CONFIGURATION CHANGE
Aug 17 19:49:00 blade3 openais[2696]: [CLM ] New Configuration:
Aug 17 19:49:00 blade3 openais[2696]: [CLM ] r(0) ip(192.168.70.103)
Aug 17 19:49:00 blade3 openais[2696]: [CLM ] r(0) ip(192.168.70.104)
Aug 17 19:49:00 blade3 openais[2696]: [CLM ] Members Left:
Aug 17 19:49:00 blade3 openais[2696]: [CLM ] r(0) ip(192.168.70.102)
Aug 17 19:49:00 blade3 openais[2696]: [CLM ] Members Joined:
Aug 17 19:49:00 blade3 openais[2696]: [CLM ] CLM CONFIGURATION CHANGE
Aug 17 19:49:00 blade3 openais[2696]: [CLM ] New Configuration:
Aug 17 19:49:00 blade3 openais[2696]: [CLM ] r(0) ip(192.168.70.103)
Aug 17 19:49:00 blade3 openais[2696]: [CLM ] r(0) ip(192.168.70.104)
Aug 17 19:49:00 blade3 openais[2696]: [CLM ] Members Left:
Aug 17 19:49:00 blade3 openais[2696]: [CLM ] Members Joined:
Aug 17 19:49:00 blade3 openais[2696]: [SYNC ] This node is within the primary component and will provide service.
Aug 17 19:49:01 blade3 openais[2696]: [TOTEM] entering OPERATIONAL state.
Aug 17 19:49:01 blade3 openais[2696]: [CLM ] got nodejoin message 192.168.70.103
Aug 17 19:49:01 blade3 openais[2696]: [CLM ] got nodejoin message 192.168.70.104
Aug 17 19:49:01 blade3 openais[2696]: [CPG ] got joinlist message from node 4
Aug 17 19:49:01 blade3 openais[2696]: [CPG ] got joinlist message from node 3
Aug 17 19:49:03 blade3 fenced[2712]: blade2 not a cluster member after 3 sec post_fail_delay
Aug 17 19:49:03 blade3 fenced[2712]: fencing node "blade2"
Aug 17 19:49:16 blade3 fenced[2712]: fence "blade2" success
Aug 17 19:49:21 blade3 kernel: GFS: fsid=jemdevcluster:cache1.1: jid=2: Trying to acquire journal lock...
Aug 17 19:49:21 blade3 kernel: GFS: fsid=jemdevcluster:storage.1: jid=2: Trying to acquire journal lock...
Aug 17 19:49:21 blade3 kernel: GFS: fsid=jemdevcluster:storage.1: jid=2: Looking at journal...
Aug 17 19:49:21 blade3 kernel: GFS: fsid=jemdevcluster:cache1.1: jid=2: Looking at journal...
Aug 17 19:49:21 blade3 kernel: GFS: fsid=jemdevcluster:cache1.1: jid=2: Acquiring the transaction lock...
Aug 17 19:49:21 blade3 kernel: GFS: fsid=jemdevcluster:storage.1: jid=2: Acquiring the transaction lock...
Aug 17 19:49:21 blade3 kernel: GFS: fsid=jemdevcluster:cache1.1: jid=2: Replaying journal...
Aug 17 19:49:21 blade3 kernel: GFS: fsid=jemdevcluster:cache1.1: jid=2: Replayed 0 of 1 blocks
Aug 17 19:49:21 blade3 kernel: GFS: fsid=jemdevcluster:cache1.1: jid=2: replays = 0, skips = 0, sames = 1
Aug 17 19:49:21 blade3 kernel: GFS: fsid=jemdevcluster:storage.1: jid=2: Replaying journal...
Aug 17 19:49:21 blade3 kernel: GFS: fsid=jemdevcluster:cache1.1: jid=2: Journal replayed in 1s
Aug 17 19:49:21 blade3 kernel: GFS: fsid=jemdevcluster:cache1.1: jid=2: Done
Aug 17 19:49:21 blade3 kernel: GFS: fsid=jemdevcluster:storage.1: jid=2: Replayed 0 of 9 blocks
Aug 17 19:49:21 blade3 kernel: GFS: fsid=jemdevcluster:storage.1: jid=2: replays = 0, skips = 0, sames = 9
Aug 17 19:49:21 blade3 kernel: GFS: fsid=jemdevcluster:storage.1: jid=2: Journal replayed in 1s
Aug 17 19:49:21 blade3 kernel: GFS: fsid=jemdevcluster:storage.1: jid=2: Done
Aug 17 19:49:30 blade3 openais[2696]: [CMAN ] lost contact with quorum device
Aug 17 19:49:30 blade3 openais[2696]: [CMAN ] quorum lost, blocking activity
Aug 17 19:49:30 blade3 kernel: dlm: closing connection to node 0
Aug 17 19:49:30 blade3 qdiskd[2765]: <info> Assuming master role
Aug 17 19:49:33 blade3 qdiskd[2765]: <notice> Writing eviction notice for node 2
Aug 17 19:49:33 blade3 openais[2696]: [CMAN ] quorum regained, resuming activity
Aug 17 19:49:36 blade3 qdiskd[2765]: <notice> Node 2 evicted
Aug 17 19:51:25 blade3 openais[2696]: [TOTEM] entering GATHER state from 11.
Aug 17 19:51:26 blade3 openais[2696]: [TOTEM] Saving state aru 28 high seq received 28
Aug 17 19:51:26 blade3 openais[2696]: [TOTEM] Storing new sequence id for ring 260
Aug 17 19:51:26 blade3 openais[2696]: [TOTEM] entering COMMIT state.
Aug 17 19:51:26 blade3 openais[2696]: [TOTEM] entering RECOVERY state.
Aug 17 19:51:26 blade3 openais[2696]: [TOTEM] position [0] member 192.168.70.102:
Aug 17 19:51:26 blade3 openais[2696]: [TOTEM] previous ring seq 604 rep 192.168.70.102
Aug 17 19:51:26 blade3 openais[2696]: [TOTEM] aru 9 high delivered 9 received flag 1
Aug 17 19:51:26 blade3 openais[2696]: [TOTEM] position [1] member 192.168.70.103:
Aug 17 19:51:26 blade3 openais[2696]: [TOTEM] previous ring seq 604 rep 192.168.70.103
Aug 17 19:51:26 blade3 openais[2696]: [TOTEM] aru 28 high delivered 28 received flag 1
Aug 17 19:51:26 blade3 openais[2696]: [TOTEM] position [2] member 192.168.70.104:
Aug 17 19:51:26 blade3 openais[2696]: [TOTEM] previous ring seq 604 rep 192.168.70.103
Aug 17 19:51:26 blade3 openais[2696]: [TOTEM] aru 28 high delivered 28 received flag 1
Aug 17 19:51:26 blade3 openais[2696]: [TOTEM] Did not need to originate any messages in recovery.
Aug 17 19:51:26 blade3 openais[2696]: [CLM ] CLM CONFIGURATION CHANGE
Aug 17 19:51:26 blade3 openais[2696]: [CLM ] New Configuration:
Aug 17 19:51:26 blade3 openais[2696]: [CLM ] r(0) ip(192.168.70.103)
Aug 17 19:51:26 blade3 openais[2696]: [CLM ] r(0) ip(192.168.70.104)
Aug 17 19:51:26 blade3 openais[2696]: [CLM ] Members Left:
Aug 17 19:51:26 blade3 openais[2696]: [CLM ] Members Joined:
Aug 17 19:51:26 blade3 openais[2696]: [CLM ] CLM CONFIGURATION CHANGE
Aug 17 19:51:26 blade3 openais[2696]: [CLM ] New Configuration:
Aug 17 19:51:26 blade3 openais[2696]: [CLM ] r(0) ip(192.168.70.102)
Aug 17 19:51:26 blade3 openais[2696]: [CLM ] r(0) ip(192.168.70.103)
Aug 17 19:51:26 blade3 openais[2696]: [CLM ] r(0) ip(192.168.70.104)
Aug 17 19:51:26 blade3 openais[2696]: [CLM ] Members Left:
Aug 17 19:51:26 blade3 openais[2696]: [CLM ] Members Joined:
Aug 17 19:51:26 blade3 openais[2696]: [CLM ] r(0) ip(192.168.70.102)
Aug 17 19:51:26 blade3 openais[2696]: [SYNC ] This node is within the primary component and will provide service.
Aug 17 19:51:26 blade3 openais[2696]: [TOTEM] entering OPERATIONAL state.
Aug 17 19:51:26 blade3 openais[2696]: [CLM ] got nodejoin message 192.168.70.102
Aug 17 19:51:26 blade3 openais[2696]: [CLM ] got nodejoin message 192.168.70.103
Aug 17 19:51:26 blade3 openais[2696]: [CLM ] got nodejoin message 192.168.70.104
Aug 17 19:51:26 blade3 openais[2696]: [CPG ] got joinlist message from node 3
Aug 17 19:51:26 blade3 openais[2696]: [CPG ] got joinlist message from node 4
Aug 17 19:51:44 blade3 kernel: dlm: connecting to 2
Aug 18 17:16:03 blade3 openais[2696]: [TOTEM] The token was lost in the OPERATIONAL state.
Aug 18 17:16:03 blade3 openais[2696]: [TOTEM] Receive multicast socket recv buffer size (288000 bytes).
Aug 18 17:16:03 blade3 openais[2696]: [TOTEM] Transmit multicast socket send buffer size (262142 bytes).
Aug 18 17:16:03 blade3 openais[2696]: [TOTEM] entering GATHER state from 2.

blade2 logs are as follows. The last entry before the freeze, at 19:46, is unrelated to GFS; the system then freezes. The entries from 19:50 onward are boot messages: the system has been fenced and is starting up again.

Aug 17 19:50:03 blade2 ccsd[2804]: Starting ccsd 2.0.73:
Aug 17 19:50:03 blade2 ccsd[2804]: Built: Nov 12 2007 13:07:35
Aug 17 19:50:03 blade2 ccsd[2804]: Copyright (C) Red Hat, Inc. 2004 All rights reserved.
Aug 17 19:50:03 blade2 ccsd[2804]: cluster.conf (cluster name = jemdevcluster, version = 5) found.
Aug 17 19:50:03 blade2 ccsd[2804]: Remote copy of cluster.conf is from quorate node.
Aug 17 19:50:03 blade2 ccsd[2804]: Local version # : 5
Aug 17 19:50:03 blade2 ccsd[2804]: Remote version #: 5
Aug 17 19:50:04 blade2 ccsd[2804]: Remote copy of cluster.conf is from quorate node.
Aug 17 19:50:04 blade2 ccsd[2804]: Local version # : 5
Aug 17 19:50:04 blade2 ccsd[2804]: Remote version #: 5
Aug 17 19:50:04 blade2 ccsd[2804]: Remote copy of cluster.conf is from quorate node.
Aug 17 19:50:04 blade2 ccsd[2804]: Local version # : 5
Aug 17 19:50:04 blade2 ccsd[2804]: Remote version #: 5
Aug 17 19:50:04 blade2 ccsd[2804]: Remote copy of cluster.conf is from quorate node.
Aug 17 19:50:04 blade2 ccsd[2804]: Local version # : 5
Aug 17 19:50:04 blade2 ccsd[2804]: Remote version #: 5
Aug 17 19:50:04 blade2 openais[2811]: [MAIN ] AIS Executive Service RELEASE 'subrev 1358 version 0.80.3'
Aug 17 19:50:04 blade2 openais[2811]: [MAIN ] Copyright (C) 2002-2006 MontaVista Software, Inc and contributors.
Aug 17 19:50:04 blade2 openais[2811]: [MAIN ] Copyright (C) 2006 Red Hat, Inc.
Aug 17 19:50:04 blade2 openais[2811]: [MAIN ] AIS Executive Service: started and ready to provide service.
Aug 17 19:50:04 blade2 openais[2811]: [MAIN ] Using default multicast address of 239.192.24.76
Aug 17 19:50:04 blade2 openais[2811]: [MAIN ] openais component openais_cpg loaded.
Aug 17 19:50:04 blade2 openais[2811]: [MAIN ] Registering service handler 'openais cluster closed process group service v1.01'
Aug 17 19:50:04 blade2 openais[2811]: [MAIN ] openais component openais_cfg loaded.
Aug 17 19:50:05 blade2 openais[2811]: [MAIN ] Registering service handler 'openais configuration service'
Aug 17 19:50:05 blade2 openais[2811]: [MAIN ] openais component openais_msg loaded.
Aug 17 19:50:05 blade2 openais[2811]: [MAIN ] Registering service handler 'openais message service B.01.01'
Aug 17 19:50:05 blade2 openais[2811]: [MAIN ] openais component openais_lck loaded.
Aug 17 19:50:05 blade2 openais[2811]: [MAIN ] Registering service handler 'openais distributed locking service B.01.01'
Aug 17 19:50:05 blade2 openais[2811]: [MAIN ] openais component openais_evt loaded.
Aug 17 19:50:05 blade2 openais[2811]: [MAIN ] Registering service handler 'openais event service B.01.01'
Aug 17 19:50:05 blade2 openais[2811]: [MAIN ] openais component openais_ckpt loaded.
Aug 17 19:50:05 blade2 openais[2811]: [MAIN ] Registering service handler 'openais checkpoint service B.01.01'
Aug 17 19:50:05 blade2 openais[2811]: [MAIN ] openais component openais_amf loaded.
Aug 17 19:50:05 blade2 openais[2811]: [MAIN ] Registering service handler 'openais availability management framework B.01.01'
Aug 17 19:50:05 blade2 openais[2811]: [MAIN ] openais component openais_clm loaded.
Aug 17 19:50:05 blade2 openais[2811]: [MAIN ] Registering service handler 'openais cluster membership service B.01.01'
Aug 17 19:50:05 blade2 openais[2811]: [MAIN ] openais component openais_evs loaded.
Aug 17 19:50:05 blade2 openais[2811]: [MAIN ] Registering service handler 'openais extended virtual synchrony service'
Aug 17 19:50:05 blade2 openais[2811]: [MAIN ] openais component openais_cman loaded.
Aug 17 19:50:05 blade2 openais[2811]: [MAIN ] Registering service handler 'openais CMAN membership service 2.01'
Aug 17 19:50:05 blade2 openais[2811]: [TOTEM] Token Timeout (10000 ms) retransmit timeout (495 ms)
Aug 17 19:50:05 blade2 openais[2811]: [TOTEM] token hold (386 ms) retransmits before loss (20 retrans)
Aug 17 19:50:05 blade2 openais[2811]: [TOTEM] join (60 ms) send_join (0 ms) consensus (4800 ms) merge (200 ms)
Aug 17 19:50:05 blade2 openais[2811]: [TOTEM] downcheck (1000 ms) fail to recv const (50 msgs)
Aug 17 19:50:05 blade2 openais[2811]: [TOTEM] seqno unchanged const (30 rotations) Maximum network MTU 1500
Aug 17 19:50:05 blade2 openais[2811]: [TOTEM] window size per rotation (50 messages) maximum messages per rotation (17 messages)
Aug 17 19:50:06 blade2 openais[2811]: [TOTEM] send threads (0 threads)
Aug 17 19:50:06 blade2 openais[2811]: [TOTEM] RRP token expired timeout (495 ms)
Aug 17 19:50:06 blade2 openais[2811]: [TOTEM] RRP token problem counter (2000 ms)
Aug 17 19:50:06 blade2 openais[2811]: [TOTEM] RRP threshold (10 problem count)
Aug 17 19:50:06 blade2 openais[2811]: [TOTEM] RRP mode set to none.
Aug 17 19:50:06 blade2 openais[2811]: [TOTEM] heartbeat_failures_allowed (0)
Aug 17 19:50:06 blade2 openais[2811]: [TOTEM] max_network_delay (50 ms)
Aug 17 19:50:06 blade2 openais[2811]: [TOTEM] HeartBeat is Disabled. To enable set heartbeat_failures_allowed > 0
Aug 17 19:50:06 blade2 openais[2811]: [TOTEM] Receive multicast socket recv buffer size (262142 bytes).
Aug 17 19:50:06 blade2 openais[2811]: [TOTEM] Transmit multicast socket send buffer size (262142 bytes).
Aug 17 19:50:06 blade2 openais[2811]: [TOTEM] The network interface [192.168.70.102] is now up.
Aug 17 19:50:06 blade2 openais[2811]: [TOTEM] Created or loaded sequence id 600.192.168.70.102 for this ring.
Aug 17 19:50:06 blade2 openais[2811]: [TOTEM] entering GATHER state from 15.
Aug 17 19:50:06 blade2 openais[2811]: [SERV ] Initialising service handler 'openais extended virtual synchrony service'
Aug 17 19:50:06 blade2 openais[2811]: [SERV ] Initialising service handler 'openais cluster membership service B.01.01'
Aug 17 19:50:06 blade2 openais[2811]: [SERV ] Initialising service handler 'openais availability management framework B.01.01'
Aug 17 19:50:06 blade2 openais[2811]: [SERV ] Initialising service handler 'openais checkpoint service B.01.01'
Aug 17 19:50:06 blade2 openais[2811]: [SERV ] Initialising service handler 'openais event service B.01.01'
Aug 17 19:50:06 blade2 openais[2811]: [SERV ] Initialising service handler 'openais distributed locking service B.01.01'
Aug 17 19:50:06 blade2 openais[2811]: [SERV ] Initialising service handler 'openais message service B.01.01'
Aug 17 19:50:06 blade2 openais[2811]: [SERV ] Initialising service handler 'openais configuration service'
Aug 17 19:50:06 blade2 openais[2811]: [SERV ] Initialising service handler 'openais cluster closed process group service v1.01'
Aug 17 19:50:06 blade2 openais[2811]: [SERV ] Initialising service handler 'openais CMAN membership service 2.01'
Aug 17 19:50:06 blade2 openais[2811]: [CMAN ] CMAN 2.0.73 (built Nov 12 2007 13:07:39) started
Aug 17 19:50:06 blade2 openais[2811]: [SYNC ] Not using a virtual synchrony filter.
Aug 17 19:50:06 blade2 openais[2811]: [TOTEM] Creating commit token because I am the rep.
Aug 17 19:50:06 blade2 openais[2811]: [TOTEM] Saving state aru 0 high seq received 0
Aug 17 19:50:06 blade2 openais[2811]: [TOTEM] Storing new sequence id for ring 25c
Aug 17 19:50:06 blade2 ccsd[2804]: Initial status:: Quorate
Aug 17 19:50:06 blade2 openais[2811]: [TOTEM] entering COMMIT state.
Aug 17 19:50:06 blade2 openais[2811]: [TOTEM] entering RECOVERY state.
Aug 17 19:50:06 blade2 openais[2811]: [TOTEM] position [0] member 192.168.70.102:
Aug 17 19:50:06 blade2 openais[2811]: [TOTEM] previous ring seq 600 rep 192.168.70.102
Aug 17 19:50:06 blade2 openais[2811]: [TOTEM] aru 0 high delivered 0 received flag 1
Aug 17 19:50:06 blade2 openais[2811]: [TOTEM] Did not need to originate any messages in recovery.
Aug 17 19:50:06 blade2 openais[2811]: [TOTEM] Sending initial ORF token
Aug 17 19:50:06 blade2 openais[2811]: [CLM ] CLM CONFIGURATION CHANGE
Aug 17 19:50:07 blade2 openais[2811]: [CLM ] New Configuration:
Aug 17 19:50:07 blade2 openais[2811]: [CLM ] Members Left:
Aug 17 19:50:07 blade2 openais[2811]: [CLM ] Members Joined:
Aug 17 19:50:07 blade2 openais[2811]: [CLM ] CLM CONFIGURATION CHANGE
Aug 17 19:50:07 blade2 openais[2811]: [CLM ] New Configuration:
Aug 17 19:50:07 blade2 openais[2811]: [CLM ] r(0) ip(192.168.70.102)
Aug 17 19:50:07 blade2 openais[2811]: [CLM ] Members Left:
Aug 17 19:50:07 blade2 openais[2811]: [CLM ] Members Joined:
Aug 17 19:50:07 blade2 openais[2811]: [CLM ] r(0) ip(192.168.70.102)
Aug 17 19:50:07 blade2 openais[2811]: [SYNC ] This node is within the primary component and will provide service.
Aug 17 19:50:07 blade2 openais[2811]: [TOTEM] entering OPERATIONAL state.
Aug 17 19:50:07 blade2 openais[2811]: [CLM ] got nodejoin message 192.168.70.102
Aug 17 19:50:07 blade2 openais[2811]: [TOTEM] entering GATHER state from 11.
Aug 17 19:50:07 blade2 openais[2811]: [TOTEM] Creating commit token because I am the rep.
Aug 17 19:50:07 blade2 openais[2811]: [TOTEM] Saving state aru 9 high seq received 9
Aug 17 19:50:07 blade2 openais[2811]: [TOTEM] Storing new sequence id for ring 260
Aug 17 19:50:07 blade2 openais[2811]: [TOTEM] entering COMMIT state.
Aug 17 19:50:07 blade2 openais[2811]: [TOTEM] entering RECOVERY state.
Aug 17 19:50:07 blade2 openais[2811]: [TOTEM] position [0] member 192.168.70.102:
Aug 17 19:50:07 blade2 openais[2811]: [TOTEM] previous ring seq 604 rep 192.168.70.102
Aug 17 19:50:07 blade2 openais[2811]: [TOTEM] aru 9 high delivered 9 received flag 1
Aug 17 19:50:08 blade2 openais[2811]: [TOTEM] position [1] member 192.168.70.103:
Aug 17 19:50:08 blade2 openais[2811]: [TOTEM] previous ring seq 604 rep 192.168.70.103
Aug 17 19:50:08 blade2 openais[2811]: [TOTEM] aru 28 high delivered 28 received flag 1
Aug 17 19:50:08 blade2 openais[2811]: [TOTEM] position [2] member 192.168.70.104:
Aug 17 19:50:08 blade2 openais[2811]: [TOTEM] previous ring seq 604 rep 192.168.70.103
Aug 17 19:50:08 blade2 openais[2811]: [TOTEM] aru 28 high delivered 28 received flag 1
Aug 17 19:50:08 blade2 openais[2811]: [TOTEM] Did not need to originate any messages in recovery.
Aug 17 19:50:08 blade2 openais[2811]: [TOTEM] Sending initial ORF token
Aug 17 19:50:08 blade2 openais[2811]: [CLM ] CLM CONFIGURATION CHANGE
Aug 17 19:50:08 blade2 openais[2811]: [CLM ] New Configuration:
Aug 17 19:50:08 blade2 openais[2811]: [CLM ] r(0) ip(192.168.70.102)
Aug 17 19:50:08 blade2 openais[2811]: [CLM ] Members Left:
Aug 17 19:50:08 blade2 openais[2811]: [CLM ] Members Joined:
Aug 17 19:50:08 blade2 openais[2811]: [CLM ] CLM CONFIGURATION CHANGE
Aug 17 19:50:08 blade2 openais[2811]: [CLM ] New Configuration:
Aug 17 19:50:08 blade2 openais[2811]: [CLM ] r(0) ip(192.168.70.102)
Aug 17 19:50:08 blade2 openais[2811]: [CLM ] r(0) ip(192.168.70.103)
Aug 17 19:50:08 blade2 openais[2811]: [CLM ] r(0) ip(192.168.70.104)
Aug 17 19:50:08 blade2 openais[2811]: [CLM ] Members Left:
Aug 17 19:50:08 blade2 openais[2811]: [CLM ] Members Joined:
Aug 17 19:50:08 blade2 openais[2811]: [CLM ] r(0) ip(192.168.70.103)
Aug 17 19:50:08 blade2 openais[2811]: [CLM ] r(0) ip(192.168.70.104)
Aug 17 19:50:08 blade2 openais[2811]: [SYNC ] This node is within the primary component and will provide service.
Aug 17 19:50:08 blade2 openais[2811]: [TOTEM] entering OPERATIONAL state.
Aug 17 19:50:09 blade2 openais[2811]: [CMAN ] quorum regained, resuming activity
Aug 17 19:50:09 blade2 openais[2811]: [CLM ] got nodejoin message 192.168.70.102
Aug 17 19:50:09 blade2 openais[2811]: [CLM ] got nodejoin message 192.168.70.103
Aug 17 19:50:09 blade2 openais[2811]: [CLM ] got nodejoin message 192.168.70.104
Aug 17 19:50:09 blade2 openais[2811]: [CPG ] got joinlist message from node 3
Aug 17 19:50:09 blade2 openais[2811]: [CPG ] got joinlist message from node 4
Aug 17 19:50:09 blade2 qdiskd[2877]: <info> Quorum Partition: /dev/sda5 Label: jemqdisk
Aug 17 19:50:09 blade2 qdiskd[2878]: <info> Quorum Daemon Initializing
Aug 17 19:50:22 blade2 qdiskd[2878]: <info> Node 3 is the master
Aug 17 19:50:24 blade2 gfs_controld[2839]: retrieve_plocks: ckpt open error 12 cache1
Aug 17 19:50:40 blade2 qdiskd[2878]: <info> Initial score 1/1
Aug 17 19:50:40 blade2 qdiskd[2878]: <info> Initialization complete
Aug 17 19:50:40 blade2 openais[2811]: [CMAN ] quorum device registered
Aug 17 19:50:40 blade2 qdiskd[2878]: <notice> Score sufficient for master operation (1/1; required=1); upgrading

>
> Shawn
>
> On Mon, Aug 18, 2008 at 11:35 AM, Brett Cave <brettcave@xxxxxxxxx> wrote:
>>
>> GFS has frozen again. After reconfiguring and running GFS for almost
>> a month now, I have not been able to get GFS running stably.
>>
>> [root@blade2 ~]# cat /etc/issue
>> CentOS release 5 (Final)
>> Kernel \r on an \m
>>
>> [root@blade2 ~]# uname -a
>> Linux blade2 2.6.18-53.el5 #1 SMP Mon Nov 12 02:14:55 EST 2007 x86_64 x86_64 x86_64 GNU/Linux
>>
>> [root@blade2 ~]# rpm -qa | grep gfs
>> gfs2-utils-0.1.38-1.el5
>> kmod-gfs-0.1.19-7.el5
>> gfs-utils-0.1.12-1.el5
>>
>> [root@blade2 ~]# modinfo gfs
>> filename: /lib/modules/2.6.18-53.el5/extra/gfs/gfs.ko
>> license: GPL
>> author: Red Hat, Inc.
>> description: Global File System 0.1.19-7.el5
>> srcversion: 18B81D3FD6ECDCCFA53D745
>> depends: gfs2
>> vermagic: 2.6.18-53.el5 SMP mod_unload gcc-4.1
>>
>>
>> Is anyone actually running GFS on CentOS 5 stably? I was running GFS2,
>> but it was also unstable, hence the move back to GFS.
>>
>> Setup: 3-node cluster with 1 vote each and 1 quorum disk.
>> Each node has 1 x dual-port HBA connected to a fibre SAN (no
>> multipath; only a single port on each card is connected to the SAN).
>> The SAN is an MSA1500, with 2 GFS partitions and 1 qdisk partition.
>>
>> The system runs fine for a few days, and then I notice that some
>> mountpoints become unavailable. The entire system locks up when this
>> happens, and the only option I have is to reset all nodes in the
>> cluster to start the cluster up again. There are no errors in the
>> logs, and nothing out of the ordinary that I can see.
>>
>> --
>> Linux-cluster mailing list
>> Linux-cluster@xxxxxxxxxx
>> https://www.redhat.com/mailman/listinfo/linux-cluster
>
> --
> Shawn Hood
> 910.670.1819 m
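The `retrieve_plocks: ckpt open error 12 cache1` line from gfs_controld appears to report the return code of an openais checkpoint open. Assuming that number is a raw SA Forum AIS `SaAisErrorT` value (the error type used by the openais checkpoint service), 12 would be `SA_AIS_ERR_NOT_EXIST`, which would suggest that the checkpoint holding the plock state for `cache1` was not found when blade2 tried to open it. The sketch below decodes such codes; `ais_error_name` is a hypothetical helper, and its table covers only the first fifteen `SaAisErrorT` values.

```shell
#!/bin/sh
# ais_error_name CODE
# Hypothetical helper: prints the SA Forum AIS SaAisErrorT name for CODE.
# Assumption: the number logged by gfs_controld is a raw SaAisErrorT value.
ais_error_name() {
    case "$1" in
        1)  echo SA_AIS_OK ;;
        2)  echo SA_AIS_ERR_LIBRARY ;;
        3)  echo SA_AIS_ERR_VERSION ;;
        4)  echo SA_AIS_ERR_INIT ;;
        5)  echo SA_AIS_ERR_TIMEOUT ;;
        6)  echo SA_AIS_ERR_TRY_AGAIN ;;
        7)  echo SA_AIS_ERR_INVALID_PARAM ;;
        8)  echo SA_AIS_ERR_NO_MEMORY ;;
        9)  echo SA_AIS_ERR_BAD_HANDLE ;;
        10) echo SA_AIS_ERR_BUSY ;;
        11) echo SA_AIS_ERR_ACCESS ;;
        12) echo SA_AIS_ERR_NOT_EXIST ;;
        13) echo SA_AIS_ERR_NAME_TOO_LONG ;;
        14) echo SA_AIS_ERR_EXIST ;;
        15) echo SA_AIS_ERR_NO_SPACE ;;
        *)  echo "unknown ($1)" ;;
    esac
}

# Extract the numeric code from the blade2 log line and decode it.
line='Aug 17 19:50:24 blade2 gfs_controld[2839]: retrieve_plocks: ckpt open error 12 cache1'
code=$(printf '%s\n' "$line" | sed -n 's/.*ckpt open error \([0-9][0-9]*\) .*/\1/p')
echo "error $code = $(ais_error_name "$code")"
```

Run as-is, this prints `error 12 = SA_AIS_ERR_NOT_EXIST`.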
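With the setup described above (three nodes with one vote each, plus a quorum disk), some simple arithmetic shows why blade3 logged `quorum lost, blocking activity` when it lost contact with the quorum device while blade2 was down. This sketch assumes CMAN's quorum threshold is `expected_votes / 2 + 1` (integer division) and that `expected_votes` is the node votes plus the quorum disk's votes. The qdisk vote count is not stated in this thread, so both 1 and 2 votes are tried. `quorum_needed` and `is_quorate` are hypothetical helper names.

```shell
#!/bin/sh
# quorum_needed EXPECTED_VOTES
# Assumed CMAN rule: quorate means holding more than half the expected votes.
quorum_needed() {
    echo $(( $1 / 2 + 1 ))
}

# is_quorate PRESENT_VOTES EXPECTED_VOTES -> exit status 0 if quorate
is_quorate() {
    [ "$1" -ge "$(quorum_needed "$2")" ]
}

nodes=3                     # three blades with 1 vote each (from the setup)
for qdisk_votes in 1 2; do  # assumption: qdisk vote count is not given
    expected=$(( nodes + qdisk_votes ))
    need=$(quorum_needed "$expected")
    # blade3 and the node at 192.168.70.104 remain once blade2 is fenced.
    without_qdisk=2
    with_qdisk=$(( 2 + qdisk_votes ))
    if is_quorate "$without_qdisk" "$expected"; then a=yes; else a=no; fi
    if is_quorate "$with_qdisk" "$expected"; then b=yes; else b=no; fi
    printf 'qdisk=%d expected=%d quorum=%d: ' "$qdisk_votes" "$expected" "$need"
    echo "2 nodes alone quorate=$a, 2 nodes + qdisk quorate=$b"
done
```

Under either assumption the two surviving nodes are not quorate on their own and become quorate again once the quorum disk's votes count. This is consistent with the `lost contact with quorum device` and `quorum regained, resuming activity` pair that blade3 logged at 19:49:30 and 19:49:33.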
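The TOTEM messages in both sets of logs appear to mix number bases. `Storing new sequence id for ring 25c` looks like a hexadecimal ring sequence number, while `previous ring seq 600` is decimal. Assuming that reading, converting the hex values shows a consistent sequence. `ring_to_dec` is a hypothetical helper that uses only POSIX shell arithmetic expansion.

```shell
#!/bin/sh
# ring_to_dec HEX
# Hypothetical helper: converts a ring id, assumed to be printed in hex by
# "Storing new sequence id for ring ...", to decimal.
ring_to_dec() {
    echo $(( 0x$1 ))
}

for ring in 25c 260; do
    echo "ring $ring = $(ring_to_dec "$ring")"
done
```

This prints `ring 25c = 604` and `ring 260 = 608`. Read this way, blade3 formed ring 604 with the node at 192.168.70.104 after blade2 dropped out, blade2 formed its own one-node ring 604 from its stored sequence id 600 right after booting, and the three nodes then merged into ring 608, matching the `previous ring seq 604` entries.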