Alessandro,
actually everything is behaving exactly as it should. As can be seen
from the logged message "Feb 21 04:41:33 corosync [CMAN ] memb: Sending
KILL to node 2", cman on nespolo-ext is killing fico-mail's cman/corosync,
and this results in Library error (2) on fico-mail. This is perfectly
valid behaviour. Sorry I didn't notice this in the previous log.
There is nothing corosync can do about this problem.
I can only recommend putting the backups into cgroups and throttling
their I/O and CPU usage as much as possible, plus increasing the token
timeout. Together with properly configured fencing, the worst thing that
can happen is that from time to time (depending on how well you are able
to tune the token timeout) the fico-mail VM will be fenced, restarted,
and will then rejoin the cluster. Downtime should be minimal.
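For illustration only (the values here are guesses and would need to be
tuned against the actual pause lengths seen in the logs, e.g. the ~8.7 s
scheduling gap below; consensus should stay at least 1.2x token), the
token timeout could be raised in cluster.conf along these lines:

```xml
<!-- token must comfortably exceed the longest observed scheduling pause;
     consensus must be larger than token (typically >= 1.2 * token) -->
<totem token="10000" consensus="12000"/>
```

The backup job itself could then be confined with cgroups, e.g. run under
a group with reduced I/O and CPU weight (the exact controller names and
tools depend on the distribution and cgroup version).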
Regards,
Honza
Alessandro Bono wrote:
Hi Honza
attached log from another cluster
on primary node
grep Library corosync-fico-mail-20140221.log
Feb 21 04:41:33 [27122] fico-mail crmd: error:
pcmk_cpg_dispatch: Connection to the CPG API failed: Library error (2)
Feb 21 04:41:33 [27120] fico-mail attrd: error:
pcmk_cpg_dispatch: Connection to the CPG API failed: Library error (2)
Feb 21 04:41:33 [27118] fico-mail stonith-ng: error:
pcmk_cpg_dispatch: Connection to the CPG API failed: Library error (2)
Feb 21 04:41:33 [27117] fico-mail cib: error:
pcmk_cpg_dispatch: Connection to the CPG API failed: Library error (2)
Feb 21 04:41:35 [27111] fico-mail pacemakerd: info: crm_cs_flush:
Sent 0 CPG messages (1 remaining, last=10): Library error (2)
Feb 21 04:41:35 [27111] fico-mail pacemakerd: error:
pcmk_cpg_dispatch: Connection to the CPG API failed: Library error (2)
same story on secondary node
egrep "pause|scheduled" corosync-nespolo-ext-20140221.log
Feb 21 04:41:27 corosync [TOTEM ] Process pause detected for 5011 ms,
flushing membership messages.
Feb 21 04:41:27 corosync [MAIN ] Corosync main process was not
scheduled for 8759.2314 ms (threshold is 2400.0000 ms). Consider token
timeout increase.
Feb 21 04:41:38 corosync [TOTEM ] Process pause detected for 1955 ms,
flushing membership messages.
the secondary node is on an old and slow host used for backups, and it's
not easy to solve the performance problem
cat /etc/cluster/cluster.conf
<?xml version="1.0"?>
<cluster config_version="1" name="mail_cluster">
<cman two_node="1" expected_votes="1"/>
<totem token="3000" consensus="5000" />
<logging>
<logging_daemon name="corosync" debug="on"/>
</logging>
<clusternodes>
<clusternode name="nespolo-ext" nodeid="1"/>
<clusternode name="fico-mail" nodeid="2"/>
</clusternodes>
</cluster>
crm configure show
node fico-mail
node nespolo-ext \
attributes standby="off"
primitive ClusterIP ocf:heartbeat:IPaddr \
params ip="10.153.24.4" cidr_netmask="24" \
op monitor interval="30s"
primitive SharedFS ocf:heartbeat:Filesystem \
params device="/dev/drbd/by-res/r0" directory="/shared" \
fstype="ext4" options="noatime,nobarrier"
primitive drbd0 ocf:linbit:drbd \
params drbd_resource="r0" \
op monitor interval="15s"
primitive drbdlinks ocf:tummy:drbdlinks \
meta target-role="Started"
primitive mysql lsb:mysqld
group service_group ClusterIP SharedFS drbdlinks mysql
ms ms_drbd0 drbd0 \
meta master-max="1" master-node-max="1" clone-max="2" \
clone-node-max="1" notify="true"
location prefer-master service_group 1: fico-mail
colocation service_on_drbd inf: service_group ms_drbd0:Master
order service_after_drbd inf: ms_drbd0:promote service_group:start
property $id="cib-bootstrap-options" \
dc-version="1.1.10-14.el6_5.2-368c726" \
cluster-infrastructure="cman" \
expected-quorum-votes="2" \
stonith-enabled="false" \
no-quorum-policy="ignore" \
last-lrm-refresh="1392973313" \
maintenance-mode="false"
_______________________________________________
discuss mailing list
discuss@xxxxxxxxxxxx
http://lists.corosync.org/mailman/listinfo/discuss