On 12/02/14 14:15, Jan Friesse wrote:
Hi Alessandro,
I was looking at the log file, and it looks like it starts right after
the token was lost. Do you have the log from BEFORE that happened?
Hi Honza
I have the previous log file, but something did not work correctly:
[root@ga1-ext ~]# ls -alF /var/log/cluster/corosync.log-2014021*
-rw-rw---- 1 hacluster haclient 28963 Feb 10 03:32
/var/log/cluster/corosync.log-20140210.gz
-rw-rw---- 1 hacluster haclient 169899 Feb 11 03:43
/var/log/cluster/corosync.log-20140211.gz
-rw-rw---- 1 hacluster haclient 35449 Feb 12 03:24
/var/log/cluster/corosync.log-20140212.gz
[root@ga1-ext ~]# zcat /var/log/cluster/corosync.log-20140211.gz |tail
Feb 10 21:47:02 [2248] ga1-ext cib: info: crm_client_new:
Connecting 0x280a140 for uid=0 gid=0 pid=2169
id=308fdcf4-f47b-4afb-96f9-126601ff2573
Feb 10 21:47:02 [2248] ga1-ext cib: info:
cib_process_request: Completed cib_query operation for section
'all': OK (rc=0, origin=local/crm_resource/2, version=0.309.18)
Feb 10 21:47:02 [2248] ga1-ext cib: info:
cib_process_request: Forwarding cib_delete operation for section
constraints to master (origin=local/crm_resource/3)
Feb 10 21:47:02 [2248] ga1-ext cib: info:
cib_process_request: Completed cib_apply_diff operation for section
constraints: OK (rc=0, origin=ga2-ext/crm_resource/3, version=0.310.1)
Feb 10 21:47:02 [2249] ga1-ext stonith-ng: info:
update_cib_stonith_devices: Updating device list from the cib: new
location constraint
Feb 10 21:47:02 [2249] ga1-ext stonith-ng: notice: unpack_config:
On loss of CCM Quorum: Ignore
Feb 10 21:47:02 [2248] ga1-ext cib: info: crm_client_destroy:
Destroying 0 events
Feb 10 21:47:02 [2248] ga1-ext cib: info: write_cib_contents:
Archived previous version as /var/lib/heartbeat/crm/cib-45.raw
Feb 10 21:47:02 [2248] ga1-ext cib: info: write_cib_contents:
Wrote version 0.310.0 of the CIB to disk (digest:
7d257caa36077afded22a7b5b47e27e5)
Feb 10 21:47:02 [2248] ga1-ext cib: info: retrieveCib:
Reading cluster configuration from: /var/lib/heartbeat/crm/cib.nAfCBw
(digest: /var/lib/heartbeat/crm/cib.Twax6l)
[root@ga1-ext ~]# zcat /var/log/cluster/corosync.log-20140212.gz |head
Feb 11 23:26:01 corosync [TOTEM ] The token was lost in the OPERATIONAL
state.
Feb 11 23:26:01 corosync [TOTEM ] A processor failed, forming new
configuration.
Feb 11 23:26:01 corosync [TOTEM ] Receive multicast socket recv buffer
size (249856 bytes).
Feb 11 23:26:01 corosync [TOTEM ] Transmit multicast socket send buffer
size (249856 bytes).
Feb 11 23:26:01 corosync [TOTEM ] Local receive multicast loop socket
recv buffer size (249856 bytes).
Feb 11 23:26:01 corosync [TOTEM ] Local transmit multicast loop socket
send buffer size (249856 bytes).
Feb 11 23:26:01 corosync [TOTEM ] entering GATHER state from 2.
Feb 11 23:26:03 corosync [TOTEM ] entering GATHER state from 0.
Feb 11 23:26:03 corosync [TOTEM ] Creating commit token because I am the
rep.
Feb 11 23:26:03 corosync [TOTEM ] Saving state aru 14a high seq received 14a
So strange.
Tonight I can try another full backup and resend you the log files.
Anyway, try increasing the token timeout to a value like 10 seconds. It
looks like you have 2 nodes, and by default the token timeout is 1 second
there; 10 seconds is used for 3 or more nodes. Also, I'm not sure whether
this changed between 6.3 and 6.4.
Just use
<totem token="X" consensus="X + 2000" />
where X is something like 10000.
OK, I'll try this config change.
In my corosync.conf, which dates from the CentOS 6.3 era, I have:
token: 3000
consensus: 5000
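For reference, a rough sketch of how the suggested values might look in
corosync.conf syntax, assuming X = 10000 and that both values are in
milliseconds. Only the two settings discussed here are shown; the rest of
the existing totem section is assumed to stay unchanged.

```
totem {
    # Sketch only: keep the other totem directives from the existing file.
    # Token timeout in milliseconds (X = 10000, up from 3000).
    token: 10000
    # Consensus timeout in milliseconds (X + 2000, up from 5000).
    consensus: 12000
}
```

The same change has to be made on both nodes, and corosync restarted, before
the new timeouts apply.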
Regards,
Honza
Alessandro Bono wrote:
On 10/02/14 15:55, Jan Friesse wrote:
OK, but I would still really like to see a log from 6.5 (there was a huge
number of fixes in 6.5).
Hi Honza
I found time to force a full backup.
Note that I forced the full backup on another VM with lots of data; this
is enough to stress the host server and cause the error on the cluster.
Attached is the zipped log file from CentOS 6.5.
Thank you
--
Best regards
Alessandro Bono