Re: Corosync memory problem

Chris Alexander <chris.alexander@xxxxxxxxxx> · Wed, 21 Dec 2011 18:04:55 +0000

An update in case anyone ever runs into something like this: we had corosync-notify running on the servers, and once we removed it and restarted the cluster stack, corosync seemed to return to normal.

Additionally, according to the corosync mailing list, the cluster's 1.2.3 version is very similar to (if not the same as) the 1.4 release they currently have out, because someone has been backporting changes.

Cheers

Chris

On 19 December 2011 19:01, Chris Alexander <chris.alexander@xxxxxxxxxx> wrote:

Hi all,
You may remember our recent issue; I believe it is being worsened, if not caused, by another problem we have encountered.

Every few days our nodes are being fenced (not simultaneously) because corosync takes up vast amounts of memory (i.e. 100% of the box). [1] is a sample log message from when this happens; we have several just like it. Note that corosync is not always the process being killed, but it is clearly corosync that is eating all the memory (see the top output from three servers at various times since their last reboot [2] [3] [4]).

The corosync version is 1.2.3:
[g@cluster1 ~]$ corosync -v
Corosync Cluster Engine, version '1.2.3'
Copyright (c) 2006-2009 Red Hat, Inc.

We had a bit of a dig around, and there are a significant number of bugfix updates that address various segfaults, crashes, memory leaks, etc. in this minor version as well as in subsequent minor versions. [5] [6]

We're trialling the Fedora 14 (fc14) RPMs for corosync and corosynclib (v1.4.2) to see if it fixes the particular issue we are seeing (i.e. whether or not the memory keeps spiralling way out of control).
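To check whether the trial RPMs actually stop the growth, something along these lines could log corosync's resident memory over time. This is a minimal Python sketch under stated assumptions: it assumes a Linux /proc filesystem and that the daemon's process name (as shown in /proc/<pid>/comm) is `corosync`. The helper names, the sample count, the interval and the 10 MiB growth threshold are all hypothetical placeholders, and "never shrinks and grew by N kB" is only a crude heuristic for a leak.

```python
#!/usr/bin/env python3
"""Sketch: sample a process's resident memory (VmRSS) from /proc and report growth.

Assumptions: Linux /proc layout, process name "corosync" in /proc/<pid>/comm,
and arbitrary (hypothetical) sampling defaults.
"""
import os
import time


def parse_vmrss_kb(status_text):
    """Return the VmRSS value in kB from the text of /proc/<pid>/status, or None."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # Line looks like: "VmRSS:     123456 kB"
            return int(line.split()[1])
    return None


def find_pids(name, proc_root="/proc"):
    """Return PIDs whose <proc_root>/<pid>/comm equals `name` (empty list if none)."""
    pids = []
    try:
        entries = os.listdir(proc_root)
    except OSError:
        return pids  # no /proc here (e.g. not Linux)
    for entry in entries:
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "comm")) as f:
                if f.read().strip() == name:
                    pids.append(int(entry))
        except OSError:
            continue  # process exited between listdir() and open()
    return pids


def is_steadily_growing(samples_kb, min_total_increase_kb):
    """Crude heuristic: RSS never decreased and grew by at least the given amount."""
    if len(samples_kb) < 2:
        return False
    never_shrinks = all(b >= a for a, b in zip(samples_kb, samples_kb[1:]))
    return never_shrinks and samples_kb[-1] - samples_kb[0] >= min_total_increase_kb


def monitor(name="corosync", samples=60, interval_s=60.0, min_increase_kb=10240):
    """Sample VmRSS of the first matching process and print a verdict."""
    pids = find_pids(name)
    if not pids:
        print(f"no process named {name!r} found")
        return None
    pid = pids[0]
    history = []
    for i in range(samples):
        try:
            with open(f"/proc/{pid}/status") as f:
                rss = parse_vmrss_kb(f.read())
        except OSError:
            print(f"process {pid} went away")
            break
        if rss is not None:
            history.append(rss)
            print(f"{time.strftime('%H:%M:%S')} pid={pid} VmRSS={rss} kB")
        if i < samples - 1:
            time.sleep(interval_s)  # wait only between samples
    growing = is_steadily_growing(history, min_increase_kb)
    print("RSS grew steadily" if growing else "no steady RSS growth detected")
    return growing


if __name__ == "__main__":
    monitor()
```

Running it on each node (before and after swapping in the 1.4.2 packages) and keeping the printed log alongside the top output would give a like-for-like comparison of whether the memory still spirals.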

Has anyone else seen an issue like this, and is there any known way to debug or fix it? If we can assist debugging by providing further information, please specify what this is (and, if non-obvious, how to get it).

Thanks again for your help

Chris

[1] http://pastebin.com/CbyERaRT
[2] http://pastebin.com/uk9ZGL7H
[3] http://pastebin.com/H4w5Zg46
[4] http://pastebin.com/KPZxL6UB
[5] http://rhn.redhat.com/errata/RHBA-2011-1361.html
[6] http://rhn.redhat.com/errata/RHBA-2011-1515.html

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster