Re: pacemaker "CPG API: failed Library error"

Alessandro Bono <alessandro.bono@xxxxxxxxx> · Wed, 12 Feb 2014 14:51:11 +0100

On 12/02/14 14:15, Jan Friesse wrote:
Hi Alessandro,
I looked at the log file, and it appears to start right after the token
was lost. Do you have the log from BEFORE that happened?

Hi Honza

I have the previous log file, but something did not work correctly:

[root@ga1-ext ~]# ls -alF  /var/log/cluster/corosync.log-2014021*
-rw-rw---- 1 hacluster haclient  28963 Feb 10 03:32 /var/log/cluster/corosync.log-20140210.gz
-rw-rw---- 1 hacluster haclient 169899 Feb 11 03:43 /var/log/cluster/corosync.log-20140211.gz
-rw-rw---- 1 hacluster haclient  35449 Feb 12 03:24 /var/log/cluster/corosync.log-20140212.gz

[root@ga1-ext ~]# zcat /var/log/cluster/corosync.log-20140211.gz |tail
Feb 10 21:47:02 [2248] ga1-ext        cib:     info: crm_client_new:     Connecting 0x280a140 for uid=0 gid=0 pid=2169 id=308fdcf4-f47b-4afb-96f9-126601ff2573
Feb 10 21:47:02 [2248] ga1-ext        cib:     info: cib_process_request:     Completed cib_query operation for section 'all': OK (rc=0, origin=local/crm_resource/2, version=0.309.18)
Feb 10 21:47:02 [2248] ga1-ext        cib:     info: cib_process_request:     Forwarding cib_delete operation for section constraints to master (origin=local/crm_resource/3)
Feb 10 21:47:02 [2248] ga1-ext        cib:     info: cib_process_request:     Completed cib_apply_diff operation for section constraints: OK (rc=0, origin=ga2-ext/crm_resource/3, version=0.310.1)
Feb 10 21:47:02 [2249] ga1-ext stonith-ng:     info: update_cib_stonith_devices:     Updating device list from the cib: new location constraint
Feb 10 21:47:02 [2249] ga1-ext stonith-ng:   notice: unpack_config:     On loss of CCM Quorum: Ignore
Feb 10 21:47:02 [2248] ga1-ext        cib:     info: crm_client_destroy:     Destroying 0 events
Feb 10 21:47:02 [2248] ga1-ext        cib:     info: write_cib_contents:     Archived previous version as /var/lib/heartbeat/crm/cib-45.raw
Feb 10 21:47:02 [2248] ga1-ext        cib:     info: write_cib_contents:     Wrote version 0.310.0 of the CIB to disk (digest: 7d257caa36077afded22a7b5b47e27e5)
Feb 10 21:47:02 [2248] ga1-ext        cib:     info: retrieveCib:     Reading cluster configuration from: /var/lib/heartbeat/crm/cib.nAfCBw (digest: /var/lib/heartbeat/crm/cib.Twax6l)

[root@ga1-ext ~]# zcat /var/log/cluster/corosync.log-20140212.gz |head
Feb 11 23:26:01 corosync [TOTEM ] The token was lost in the OPERATIONAL state.
Feb 11 23:26:01 corosync [TOTEM ] A processor failed, forming new configuration.
Feb 11 23:26:01 corosync [TOTEM ] Receive multicast socket recv buffer size (249856 bytes).
Feb 11 23:26:01 corosync [TOTEM ] Transmit multicast socket send buffer size (249856 bytes).
Feb 11 23:26:01 corosync [TOTEM ] Local receive multicast loop socket recv buffer size (249856 bytes).
Feb 11 23:26:01 corosync [TOTEM ] Local transmit multicast loop socket send buffer size (249856 bytes).
Feb 11 23:26:01 corosync [TOTEM ] entering GATHER state from 2.
Feb 11 23:26:03 corosync [TOTEM ] entering GATHER state from 0.
Feb 11 23:26:03 corosync [TOTEM ] Creating commit token because I am the rep.
Feb 11 23:26:03 corosync [TOTEM ] Saving state aru 14a high seq received 14a

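So strange: the older file ends at Feb 10 21:47:02 and the newer one
starts at Feb 11 23:26:01, with nothing logged in between.
Tonight I can try another full backup and resend the log files to you.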
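
To check what time span each rotated file actually covers, something like
the following might help. It is only a sketch: log_span is a made-up helper
name, and it assumes every log line starts with a 15-character syslog-style
timestamp such as "Feb 10 21:47:02".

```shell
# log_span is a hypothetical helper, not part of corosync or pacemaker.
# For each gzip-compressed log given as an argument, print the timestamp
# (first 15 characters) of its first and of its last line.
log_span() {
    for f in "$@"; do
        # Single pass with awk: remember the first timestamp, keep
        # overwriting the last one, print both at end of input.
        gzip -dc "$f" | awk -v name="$f" '
            NR == 1 { first = substr($0, 1, 15) }
                    { last  = substr($0, 1, 15) }
            END     { printf "%s: %s -> %s\n", name, first, last }'
    done
}
```

It could be called as `log_span /var/log/cluster/corosync.log-2014021*.gz`;
a jump between the end of one file and the start of the next would show
where log lines are missing.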

Anyway, try increasing the token timeout to a value like 10 seconds. It
looks like you have 2 nodes, and the default token timeout there is
1 second; 10 seconds is used for 3 or more nodes. Also, I'm not sure
whether this changed between 6.3 and 6.4.

Just use

<totem token="X" consensus="X + 2000" />

where X is like 10000.

OK, I'll try this config change.
In my corosync.conf, which dates from the CentOS 6.3 era, I have:
token: 3000
consensus: 5000
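
Applying the suggested formula with X = 10000, the totem section of
corosync.conf would presumably end up looking like the fragment below.
This is a sketch only: the values are just the example numbers from the
suggestion (token = X, consensus = X + 2000, both in milliseconds), and
the other totem settings are assumed to stay as they are.

```
totem {
        # other existing totem settings stay unchanged
        token: 10000
        consensus: 12000
}
```

The `<totem ... />` form quoted above is the cluster.conf (XML) spelling of
the same two settings; which file is actually read depends on how the
cluster stack is started.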

Regards,
   Honza

Alessandro Bono wrote:
On 10/02/14 15:55, Jan Friesse wrote:
OK, but I would still really like to see a log from 6.5 (there was a
huge number of fixes in 6.5).
Hi Honza

I found time to force a full backup.
Note that I forced the full backup on another VM with lots of data; this
is enough to stress the host server and cause errors on the cluster.
Attached is the zipped log file from CentOS 6.5.

thank you

--
Best regards

Alessandro Bono
