Alessandro Bono wrote:
>
> On 12/02/14 14:15, Jan Friesse wrote:
>> Hi Alessandro,
>> I was looking at the log file and it looks like it starts right after
>> the token was lost. Do you have the log from BEFORE that happened?
>
> Hi Honza
>
> I have the previous log file, but something did not work correctly:
>
> [root@ga1-ext ~]# ls -alF /var/log/cluster/corosync.log-2014021*
> -rw-rw---- 1 hacluster haclient  28963 Feb 10 03:32 /var/log/cluster/corosync.log-20140210.gz
> -rw-rw---- 1 hacluster haclient 169899 Feb 11 03:43 /var/log/cluster/corosync.log-20140211.gz
> -rw-rw---- 1 hacluster haclient  35449 Feb 12 03:24 /var/log/cluster/corosync.log-20140212.gz
>
> [root@ga1-ext ~]# zcat /var/log/cluster/corosync.log-20140211.gz |tail
> Feb 10 21:47:02 [2248] ga1-ext cib: info: crm_client_new: Connecting 0x280a140 for uid=0 gid=0 pid=2169 id=308fdcf4-f47b-4afb-96f9-126601ff2573
> Feb 10 21:47:02 [2248] ga1-ext cib: info: cib_process_request: Completed cib_query operation for section 'all': OK (rc=0, origin=local/crm_resource/2, version=0.309.18)
> Feb 10 21:47:02 [2248] ga1-ext cib: info: cib_process_request: Forwarding cib_delete operation for section constraints to master (origin=local/crm_resource/3)
> Feb 10 21:47:02 [2248] ga1-ext cib: info: cib_process_request: Completed cib_apply_diff operation for section constraints: OK (rc=0, origin=ga2-ext/crm_resource/3, version=0.310.1)
> Feb 10 21:47:02 [2249] ga1-ext stonith-ng: info: update_cib_stonith_devices: Updating device list from the cib: new location constraint
> Feb 10 21:47:02 [2249] ga1-ext stonith-ng: notice: unpack_config: On loss of CCM Quorum: Ignore
> Feb 10 21:47:02 [2248] ga1-ext cib: info: crm_client_destroy: Destroying 0 events
> Feb 10 21:47:02 [2248] ga1-ext cib: info: write_cib_contents: Archived previous version as /var/lib/heartbeat/crm/cib-45.raw
> Feb 10 21:47:02 [2248] ga1-ext cib: info: write_cib_contents: Wrote version 0.310.0 of the CIB to disk (digest: 7d257caa36077afded22a7b5b47e27e5)
> Feb 10 21:47:02 [2248] ga1-ext cib: info: retrieveCib: Reading cluster configuration from: /var/lib/heartbeat/crm/cib.nAfCBw (digest: /var/lib/heartbeat/crm/cib.Twax6l)
>
> [root@ga1-ext ~]# zcat /var/log/cluster/corosync.log-20140212.gz |head
> Feb 11 23:26:01 corosync [TOTEM ] The token was lost in the OPERATIONAL state.
> Feb 11 23:26:01 corosync [TOTEM ] A processor failed, forming new configuration.
> Feb 11 23:26:01 corosync [TOTEM ] Receive multicast socket recv buffer size (249856 bytes).
> Feb 11 23:26:01 corosync [TOTEM ] Transmit multicast socket send buffer size (249856 bytes).
> Feb 11 23:26:01 corosync [TOTEM ] Local receive multicast loop socket recv buffer size (249856 bytes).
> Feb 11 23:26:01 corosync [TOTEM ] Local transmit multicast loop socket send buffer size (249856 bytes).
> Feb 11 23:26:01 corosync [TOTEM ] entering GATHER state from 2.
> Feb 11 23:26:03 corosync [TOTEM ] entering GATHER state from 0.
> Feb 11 23:26:03 corosync [TOTEM ] Creating commit token because I am the rep.
> Feb 11 23:26:03 corosync [TOTEM ] Saving state aru 14a high seq received 14a
>
> So strange. Tonight I can try another full backup and resend you the log files.
>
>>
>> Anyway, try increasing the token timeout to a value like 10. It looks
>> like you have 2 nodes, and by default the token timeout is 1 there;
>> 10 is used for 3 or more nodes. Also, I'm not sure whether this was
>> changed between 6.3 and 6.4.
>>
>> Just use
>>
>> <totem token="X" consensus="X + 2000" />
>>
>> where X is like 10000.
>
> OK, I'll try this config change.
> In corosync.conf from the CentOS 6.3 era I have:
> token: 3000
> consensus: 5000
>

OK. This explains everything. Cman will OVERWRITE all config options
from corosync.conf (actually, corosync.conf is not read at all when
cman is used), so use

<totem token="X" consensus="X + 2000" />

and set X to 3000 and you should be fine.

Regards,
  Honza

>>
>> Regards,
>>   Honza
>>
>> Alessandro Bono wrote:
>>> On 10/02/14 15:55, Jan Friesse wrote:
>>>> OK, but I would still really like to see the log from 6.5 (there was
>>>> a huge amount of fixes for 6.5).
>>> Hi Honza
>>>
>>> I found time to force a full backup.
>>> Note that I forced the full backup on another VM with lots of data;
>>> this is enough to stress the host server and cause errors on the cluster.
>>> Attached is the zipped log file from CentOS 6.5.
>>>
>>> thank you
>>>
>
_______________________________________________
discuss mailing list
discuss@xxxxxxxxxxxx
http://lists.corosync.org/mailman/listinfo/discuss
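For reference: on a cman-based cluster (RHEL/CentOS 6), the change Honza
describes goes into /etc/cluster/cluster.conf rather than corosync.conf.
Below is a minimal sketch of where the <totem> element sits, using the
3000/5000 values from the 6.3-era config quoted above. The cluster name,
config_version and nodeid values are placeholders (the node names are the
two hosts visible in the logs); in practice you would keep your existing
cluster.conf and only add the <totem> line, bumping config_version.

<?xml version="1.0"?>
<!-- sketch only: name, config_version and node entries are placeholders -->
<cluster name="mycluster" config_version="2">
  <!-- with cman, token/consensus must be set here; corosync.conf is not read -->
  <totem token="3000" consensus="5000"/>
  <clusternodes>
    <clusternode name="ga1-ext" nodeid="1"/>
    <clusternode name="ga2-ext" nodeid="2"/>
  </clusternodes>
</cluster>

After editing, ccs_config_validate can be used to check that the file still
validates before restarting the cluster.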