On 12/21/2011 11:04 AM, Chris Alexander wrote:
> An update in case anyone ever runs into something like this - we had
> corosync-notify running on the servers and once we removed that and
> restarted the cluster stack, corosync seemed to return to normal.
>
> Additionally, according to the corosync mailing list, the cluster 1.2.3
> version is basically very similar to (if not the same as) the 1.4 that
> they currently have released, someone's been backporting.
>

The upstream 1.2.3 version hasn't had any backports applied to it. Only
the RHEL 1.2.3-z versions have been backported.

Regards
-steve

> Cheers
>
> Chris
>
> On 19 December 2011 19:01, Chris Alexander <chris.alexander@xxxxxxxxxx> wrote:
>
> Hi all,
>
> You may remember our recent issue, I believe this is being worsened
> if not caused by another problem we have encountered.
>
> Every few days our nodes are (non-simultaneously) being fenced due
> to corosync taking up vast amounts of memory (i.e. 100% of the box).
> Please see a sample log message, we have several just like this, [1]
> which occurs when this happens. Note that it is not always corosync
> being killed - but it is clearly corosync eating all the memory (see
> top output from three servers at various times since their last
> reboot, [2] [3] [4]).
>
> The corosync version is 1.2.3:
> [g@cluster1 ~]$ corosync -v
> Corosync Cluster Engine, version '1.2.3'
> Copyright (c) 2006-2009 Red Hat, Inc.
>
> We had a bit of a dig around and there are a significant number of
> bugfix updates which address various segfaults, crashes, memory
> leaks etc. in this minor as well as subsequent minor versions. [5] [6]
>
> We're trialling the Fedora 14 (fc14) RPMs for corosync and
> corosynclib (v1.4.2) to see if it fixes the particular issue we are
> seeing (i.e. whether or not the memory keeps spiralling way out of
> control).
>
> Has anyone else seen an issue like this, and is there any known way
> to debug or fix it?
> If we can assist debugging by providing further information, please
> specify what this is (and, if non-obvious, how to get it).
>
> Thanks again for your help
>
> Chris
>
> [1] http://pastebin.com/CbyERaRT
> [2] http://pastebin.com/uk9ZGL7H
> [3] http://pastebin.com/H4w5Zg46
> [4] http://pastebin.com/KPZxL6UB
> [5] http://rhn.redhat.com/errata/RHBA-2011-1361.html
> [6] http://rhn.redhat.com/errata/RHBA-2011-1515.html
>
> --
> Linux-cluster mailing list
> Linux-cluster@xxxxxxxxxx
> https://www.redhat.com/mailman/listinfo/linux-cluster
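On the question of what information would help with debugging: one low-effort way to show whether corosync's memory really keeps growing between restarts is to sample its resident set size at a fixed interval and attach the resulting log. The following is a minimal sketch, not part of corosync or the cluster tools, assuming a Linux host with /proc and Python 3; the function names find_pids, rss_kib and sample are hypothetical.

```python
import time
from pathlib import Path
from typing import List, Optional


def find_pids(name: str) -> List[int]:
    """Return the PIDs whose /proc/<pid>/comm equals `name` (Linux only)."""
    pids = []
    for entry in Path("/proc").iterdir():
        if not entry.name.isdigit():
            continue
        try:
            comm = (entry / "comm").read_text().strip()
        except OSError:
            continue  # process exited or is not readable
        if comm == name:
            pids.append(int(entry.name))
    return pids


def rss_kib(status_text: str) -> Optional[int]:
    """Extract VmRSS (reported by the kernel in kB) from /proc/<pid>/status text."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            return int(line.split()[1])
    return None  # e.g. kernel threads have no VmRSS line


def sample(name: str = "corosync", interval: float = 60.0, count: int = 10) -> None:
    """Print one timestamped VmRSS reading per matching process, `count` times."""
    for i in range(count):
        for pid in find_pids(name):
            try:
                text = Path(f"/proc/{pid}/status").read_text()
            except OSError:
                continue  # process went away between listing and reading
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            print(f"{stamp} pid={pid} VmRSS={rss_kib(text)} kB", flush=True)
        if i < count - 1:
            time.sleep(interval)  # no sleep after the final sample
```

For example, sample("corosync", interval=300, count=288) logs roughly one day at five-minute intervals. A VmRSS figure that rises steadily and never levels off would support the memory-leak theory more convincingly than isolated top snapshots.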