Alessandro Bono wrote:
>
> On 12/02/14 14:15, Jan Friesse wrote:
>> Hi Alessandro,
>> I was looking at the log file and it looks like it starts right after
>> the token was lost. Do you have the log from BEFORE that happened?
>
> Hi Honza
>
> I have the previous log file, but something did not work correctly:
>
> [root@ga1-ext ~]# ls -alF /var/log/cluster/corosync.log-2014021*
> -rw-rw---- 1 hacluster haclient  28963 Feb 10 03:32 /var/log/cluster/corosync.log-20140210.gz
> -rw-rw---- 1 hacluster haclient 169899 Feb 11 03:43 /var/log/cluster/corosync.log-20140211.gz
> -rw-rw---- 1 hacluster haclient  35449 Feb 12 03:24 /var/log/cluster/corosync.log-20140212.gz
>
> [root@ga1-ext ~]# zcat /var/log/cluster/corosync.log-20140211.gz |tail
> Feb 10 21:47:02 [2248] ga1-ext cib: info: crm_client_new: Connecting 0x280a140 for uid=0 gid=0 pid=2169 id=308fdcf4-f47b-4afb-96f9-126601ff2573
> Feb 10 21:47:02 [2248] ga1-ext cib: info: cib_process_request: Completed cib_query operation for section 'all': OK (rc=0, origin=local/crm_resource/2, version=0.309.18)
> Feb 10 21:47:02 [2248] ga1-ext cib: info: cib_process_request: Forwarding cib_delete operation for section constraints to master (origin=local/crm_resource/3)
> Feb 10 21:47:02 [2248] ga1-ext cib: info: cib_process_request: Completed cib_apply_diff operation for section constraints: OK (rc=0, origin=ga2-ext/crm_resource/3, version=0.310.1)
> Feb 10 21:47:02 [2249] ga1-ext stonith-ng: info: update_cib_stonith_devices: Updating device list from the cib: new location constraint
> Feb 10 21:47:02 [2249] ga1-ext stonith-ng: notice: unpack_config: On loss of CCM Quorum: Ignore
> Feb 10 21:47:02 [2248] ga1-ext cib: info: crm_client_destroy: Destroying 0 events
> Feb 10 21:47:02 [2248] ga1-ext cib: info: write_cib_contents: Archived previous version as /var/lib/heartbeat/crm/cib-45.raw
> Feb 10 21:47:02 [2248] ga1-ext cib: info: write_cib_contents: Wrote version 0.310.0 of the CIB to disk (digest: 7d257caa36077afded22a7b5b47e27e5)
> Feb 10 21:47:02 [2248] ga1-ext cib: info: retrieveCib: Reading cluster configuration from: /var/lib/heartbeat/crm/cib.nAfCBw (digest: /var/lib/heartbeat/crm/cib.Twax6l)
>
> [root@ga1-ext ~]# zcat /var/log/cluster/corosync.log-20140212.gz |head
> Feb 11 23:26:01 corosync [TOTEM ] The token was lost in the OPERATIONAL state.
> Feb 11 23:26:01 corosync [TOTEM ] A processor failed, forming new configuration.
> Feb 11 23:26:01 corosync [TOTEM ] Receive multicast socket recv buffer size (249856 bytes).
> Feb 11 23:26:01 corosync [TOTEM ] Transmit multicast socket send buffer size (249856 bytes).
> Feb 11 23:26:01 corosync [TOTEM ] Local receive multicast loop socket recv buffer size (249856 bytes).
> Feb 11 23:26:01 corosync [TOTEM ] Local transmit multicast loop socket send buffer size (249856 bytes).
> Feb 11 23:26:01 corosync [TOTEM ] entering GATHER state from 2.
> Feb 11 23:26:03 corosync [TOTEM ] entering GATHER state from 0.
> Feb 11 23:26:03 corosync [TOTEM ] Creating commit token because I am the rep.
> Feb 11 23:26:03 corosync [TOTEM ] Saving state aru 14a high seq received 14a
>
> So strange. Tonight I can try another full backup and resend you the log files.
>
>>
>> Anyway, try increasing the token timeout to a value like 10. It looks
>> like you have 2 nodes, and by default the token timeout is 1 there;
>> 10 is used for 3 or more nodes. Also, I'm not sure whether this was
>> changed between 6.3 and 6.4.
>>
>> Just use
>>
>> <totem token="X" consensus="X + 2000" />
>>
>> where X is like 10000.
>
> OK, I'll try this config change.
> In corosync.conf from the CentOS 6.3 era I have:
> token: 3000
> consensus: 5000
>

OK. This explains everything. Cman will OVERWRITE all config options
from corosync.conf (actually, corosync.conf is not read at all when
cman is used), so use

<totem token="X" consensus="X + 2000" />

and set X to 3000 and you should be fine.

Regards,
  Honza

>>
>> Regards,
>>   Honza
>>
>> Alessandro Bono wrote:
>>> On 10/02/14 15:55, Jan Friesse wrote:
>>>> OK, but I would still really like to see the log from 6.5 (there was
>>>> a huge amount of fixes for 6.5).
>>> Hi Honza
>>>
>>> I found time to force a full backup.
>>> Note that I forced the full backup on another VM with lots of data;
>>> this is enough to stress the host server and cause errors on the cluster.
>>> Attached is the zipped log file from CentOS 6.5.
>>>
>>> thank you
>>>
>
_______________________________________________
discuss mailing list
discuss@xxxxxxxxxxxx
http://lists.corosync.org/mailman/listinfo/discuss
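For reference: on a cman-based cluster (RHEL/CentOS 6), the change Honza
describes goes into /etc/cluster/cluster.conf rather than corosync.conf.
Below is a minimal sketch of where the <totem> element sits, using the
3000/5000 values from the 6.3-era config quoted above. The cluster name,
config_version and nodeid values are placeholders (the node names are the
two hosts visible in the logs); in practice you would keep your existing
cluster.conf and only add the <totem> line, bumping config_version.

<?xml version="1.0"?>
<!-- sketch only: name, config_version and node entries are placeholders -->
<cluster name="mycluster" config_version="2">
  <!-- with cman, token/consensus must be set here; corosync.conf is not read -->
  <totem token="3000" consensus="5000"/>
  <clusternodes>
    <clusternode name="ga1-ext" nodeid="1"/>
    <clusternode name="ga2-ext" nodeid="2"/>
  </clusternodes>
</cluster>

After editing, ccs_config_validate can be used to check that the file still
validates before restarting the cluster.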