Re: pacemaker "CPG API: failed Library error"

Jan Friesse <jfriesse@xxxxxxxxxx> · Fri, 21 Feb 2014 13:44:16 +0100

Alessandro Bono wrote:

On 21/02/14 12:12, Jan Friesse wrote:
Alessandro,
actually everything behaves exactly as it should. As can be seen
from the logged message "Feb 21 04:41:33 corosync [CMAN  ] memb: Sending
KILL to node 2", cman on nespolo-ext is killing cman/corosync on fico-mail,
and this results in Library error (2) on fico-mail. This is perfectly
valid behaviour. Sorry I didn't notice this in the previous log.

There is nothing corosync can do about this problem.

I can only recommend putting the backups in cgroups, throttling their
I/O and CPU usage as much as possible, and increasing the token timeout.
Together with properly configured fencing, the worst that can happen is
that from time to time (depending on how well you manage to tune the
token timeout) the fico-mail VM will be fenced, restarted and then
rejoin the cluster. Downtime should be minimal.
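Before raising the token timeout, it helps to check the two timing relations involved. Below is a minimal Python sketch. It assumes (from corosync's behaviour, not from this thread) that the "not scheduled ... Consider token timeout increase" warning fires once a stall exceeds 80% of the token timeout. That figure is consistent with the 2400 ms threshold logged later in this thread with token="3000". It also assumes the corosync.conf(5) rule that consensus must be at least 1.2 * token. The function names are hypothetical.

```python
# Minimal sketch of two corosync totem timing relations.
# Assumptions: the scheduling warning threshold is 80% of the token
# timeout, and corosync.conf(5) requires consensus >= 1.2 * token.


def pause_threshold_ms(token_ms: int) -> float:
    """Stall length (ms) above which corosync logs the scheduling warning."""
    return token_ms * 4 / 5  # 80% of token


def min_consensus_ms(token_ms: int) -> int:
    """Smallest consensus (ms) allowed for a given token, rounded up."""
    return -(-token_ms * 6 // 5)  # ceil(1.2 * token) using integer math


def check_totem(token_ms: int, consensus_ms: int) -> list:
    """Return a list of problems with a proposed token/consensus pair."""
    problems = []
    if consensus_ms < min_consensus_ms(token_ms):
        problems.append(
            f"consensus={consensus_ms} is below 1.2 * token = {min_consensus_ms(token_ms)}"
        )
    return problems


if __name__ == "__main__":
    print(pause_threshold_ms(3000))  # 2400.0, the threshold logged on nespolo-ext
    print(check_totem(3000, 5000))   # []: the current settings are consistent
    print(check_totem(12000, 5000))  # raising only the token breaks consensus
```

The last call shows why the token and consensus values should be changed together. With consensus="5000" left as it is, any token above about 4166 ms violates the 1.2 * token rule.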

OK, I'll try to work around this situation as you suggested.
But why does the second node try to kill the primary? Is it configurable in some way?

I'm really not an expert in the depths of cman, so I'm CC'ing chrissie
(who is the cman master ;) )

Honestly, to me this looks like a regression from previous corosync behaviour.

Actually, corosync behaviour didn't change. It is cman that sends the
kill to the node, not corosync.

Thank you for your support.

Regards,
  Honza

Alessandro Bono wrote:
Hi Honza

Attached is a log from another cluster.

On the primary node:

grep Library corosync-fico-mail-20140221.log
Feb 21 04:41:33 [27122] fico-mail       crmd:    error: pcmk_cpg_dispatch:     Connection to the CPG API failed: Library error (2)
Feb 21 04:41:33 [27120] fico-mail      attrd:    error: pcmk_cpg_dispatch:     Connection to the CPG API failed: Library error (2)
Feb 21 04:41:33 [27118] fico-mail stonith-ng:    error: pcmk_cpg_dispatch:     Connection to the CPG API failed: Library error (2)
Feb 21 04:41:33 [27117] fico-mail        cib:    error: pcmk_cpg_dispatch:     Connection to the CPG API failed: Library error (2)
Feb 21 04:41:35 [27111] fico-mail pacemakerd:     info: crm_cs_flush:      Sent 0 CPG messages  (1 remaining, last=10): Library error (2)
Feb 21 04:41:35 [27111] fico-mail pacemakerd:    error: pcmk_cpg_dispatch:     Connection to the CPG API failed: Library error (2)
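The grep above can be turned into a per-daemon summary. Below is a small Python sketch that reports, for each Pacemaker daemon, when it first logged "Library error (N)". On this log it shows every daemon on fico-mail losing its CPG connection within about two seconds. That is what one would expect if corosync underneath them went away, which fits the KILL from nespolo-ext. It assumes one record per line, as in the original log file rather than the wrapped mail. The `CS_ERRORS` names are taken from corosync's `cs_error_t` (2 is `CS_ERR_LIBRARY`). The function name is hypothetical.

```python
import re

# Pacemaker log record, e.g.
# "Feb 21 04:41:33 [27122] fico-mail crmd: error: pcmk_cpg_dispatch: Connection ..."
RECORD = re.compile(
    r"^(?P<ts>\w{3} +\d+ [\d:]{8}) \[(?P<pid>\d+)\] (?P<host>\S+) +"
    r"(?P<daemon>\S+): +(?P<level>\w+): +(?P<func>\w+): +(?P<msg>.*)$"
)
LIB_ERR = re.compile(r"Library error \((\d+)\)")

# A few values of corosync's cs_error_t, for translating the number in parentheses
CS_ERRORS = {1: "CS_OK", 2: "CS_ERR_LIBRARY", 5: "CS_ERR_TIMEOUT", 6: "CS_ERR_TRY_AGAIN"}


def cpg_failures(lines):
    """Map each daemon to (timestamp, error code) of its first 'Library error' record."""
    first = {}  # dicts keep insertion order, so daemons appear in log order
    for line in lines:
        m = RECORD.match(line.strip())
        if not m:
            continue
        err = LIB_ERR.search(m["msg"])
        if err and m["daemon"] not in first:
            first[m["daemon"]] = (m["ts"], int(err.group(1)))
    return first
```

Usage would be something like `cpg_failures(open("corosync-fico-mail-20140221.log"))`, then `CS_ERRORS.get(code)` to name each code.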

Same story on the secondary node:

egrep "pause|scheduled" corosync-nespolo-ext-20140221.log
Feb 21 04:41:27 corosync [TOTEM ] Process pause detected for 5011 ms, flushing membership messages.
Feb 21 04:41:27 corosync [MAIN  ] Corosync main process was not scheduled for 8759.2314 ms (threshold is 2400.0000 ms). Consider token timeout increase.
Feb 21 04:41:38 corosync [TOTEM ] Process pause detected for 1955 ms, flushing membership messages.
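These warnings can be used to estimate how far the token timeout has to go up. Below is a minimal Python sketch. It assumes the warning threshold is 80% of the token timeout, so a logged threshold of 2400 ms implies the current token is 3000 ms. For the worst observed stall S to stay under the threshold, the token must be at least S / 0.8. The `headroom` factor and the rounding to whole seconds are hypothetical choices. A longer backup stall in the future would still exceed any value derived from past logs.

```python
import math
import re

# corosync's scheduler warning; \s+ also tolerates the line wrapping seen in mail
NOT_SCHEDULED = re.compile(
    r"not\s+scheduled\s+for\s+(?P<stall>[\d.]+)\s+ms\s+"
    r"\(threshold\s+is\s+(?P<thr>[\d.]+)\s+ms\)"
)


def suggest_token_ms(log_text: str, headroom: float = 1.0, step_ms: int = 1000):
    """Return (inferred current token, suggested token) in ms, or None if no warning.

    Assumes threshold = 0.8 * token, so current token = threshold / 0.8 and
    the worst stall S stays under the threshold only if token >= S / 0.8.
    """
    hits = [(float(m["stall"]), float(m["thr"])) for m in NOT_SCHEDULED.finditer(log_text)]
    if not hits:
        return None
    worst_stall = max(stall for stall, _ in hits)
    current = hits[0][1] * 5 / 4               # threshold / 0.8
    needed = worst_stall * 5 / 4 * headroom    # stall / 0.8, optionally padded
    suggested = math.ceil(needed / step_ms) * step_ms  # round up to whole steps
    return current, suggested
```

On the nespolo-ext log, the 8759 ms stall gives a minimum of roughly 11 seconds. That is well above the configured token="3000", and the consensus value would also have to rise to match.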

The secondary node runs on an old, slow host used for backups, and it's
not easy to solve the performance problem.

  cat /etc/cluster/cluster.conf
<?xml version="1.0"?>
<cluster config_version="1" name="mail_cluster">
     <cman two_node="1" expected_votes="1"/>
     <totem token="3000" consensus="5000" />
     <logging>
         <logging_daemon name="corosync" debug="on"/>
     </logging>
     <clusternodes>
         <clusternode name="nespolo-ext" nodeid="1"/>
         <clusternode name="fico-mail" nodeid="2"/>
     </clusternodes>
</cluster>
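Raising the token in this file means editing the `<totem>` element and bumping `config_version`, which cman uses to tell configuration revisions apart. Below is a small Python sketch using the standard `xml.etree.ElementTree` module. It assumes the corosync.conf(5) rule that consensus must be at least 1.2 * token, so it rewrites both attributes together. The function name is hypothetical. How the new file is then distributed and activated is left to the usual cman tooling and is not shown here.

```python
import xml.etree.ElementTree as ET


def raise_token(conf_xml: str, token_ms: int) -> str:
    """Return cluster.conf text with a new totem token, a matching consensus,
    and config_version incremented by one."""
    root = ET.fromstring(conf_xml)
    totem = root.find("totem")
    if totem is None:  # no <totem/> element yet: add one
        totem = ET.SubElement(root, "totem")
    totem.set("token", str(token_ms))
    # assumed rule: consensus >= 1.2 * token (ceil, integer milliseconds)
    totem.set("consensus", str(-(-token_ms * 6 // 5)))
    # every edit needs a higher config_version
    root.set("config_version", str(int(root.get("config_version", "0")) + 1))
    return ET.tostring(root, encoding="unicode")
```

Note that ElementTree drops the `<?xml ...?>` declaration on output, so it may need to be written back when saving the file.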

crm configure show
node fico-mail
node nespolo-ext \
     attributes standby="off"
primitive ClusterIP ocf:heartbeat:IPaddr \
     params ip="10.153.24.4" cidr_netmask="24" \
     op monitor interval="30s"
primitive SharedFS ocf:heartbeat:Filesystem \
     params device="/dev/drbd/by-res/r0" directory="/shared" fstype="ext4" options="noatime,nobarrier"
primitive drbd0 ocf:linbit:drbd \
     params drbd_resource="r0" \
     op monitor interval="15s"
primitive drbdlinks ocf:tummy:drbdlinks \
     meta target-role="Started"
primitive mysql lsb:mysqld
group service_group ClusterIP SharedFS drbdlinks mysql
ms ms_drbd0 drbd0 \
     meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
location prefer-master service_group 1: fico-mail
colocation service_on_drbd inf: service_group ms_drbd0:Master
order service_after_drbd inf: ms_drbd0:promote service_group:start
property $id="cib-bootstrap-options" \
     dc-version="1.1.10-14.el6_5.2-368c726" \
     cluster-infrastructure="cman" \
     expected-quorum-votes="2" \
     stonith-enabled="false" \
     no-quorum-policy="ignore" \
     last-lrm-refresh="1392973313" \
     maintenance-mode="false"

_______________________________________________
discuss mailing list
discuss@xxxxxxxxxxxx
http://lists.corosync.org/mailman/listinfo/discuss