On 12/20/2011 03:12 AM, Chris Alexander wrote:
> Hi all,
>
> We are using Corosync as part of the Redhat cluster stack. Their
> currently supported version is 1.2.3.
>

While Red Hat's corosync is "version 1.2.3", the z stream almost entirely
matches the flatiron 1.4 branch. I take patches and apply them to the RPM.

> Every few days our nodes are (non-simultaneously) being fenced due to
> corosync taking up vast amounts of memory (i.e. 100% of the box). Please
> see a sample log message, we have several just like this, [1] which
> occurs when this happens. Note that it is not always corosync being
> killed - but it is clearly corosync eating all the memory (see top
> output from three servers at various times since their last reboot, [2]
> [3] [4]).
>
> The corosync version is 1.2.3:
> [g@cluster1 ~]$ corosync -v
> Corosync Cluster Engine, version '1.2.3'
> Copyright (c) 2006-2009 Red Hat, Inc.
>
> We had a bit of a dig around and there are a significant number of
> bugfix updates which address various segfaults, crashes, memory leaks
> etc. in this minor as well as subsequent minor versions. [5] [6] However
> it seems the Redhat repos haven't been updated past 1.2.3 as yet.
>
> We're trialling the Fedora 14 (fc14) RPMs for corosync and corosynclib
> (v1.4.2) to see if it fixes the particular issue we are seeing (i.e.
> whether or not the memory keeps spiralling way out of control).
>

The latest z stream would be your best solution here.

> Has anyone else seen an issue like this, and is there any known way to
> debug or fix it? If we can assist debugging by providing further
> information, please specify what this is (and, if non-obvious, how to
> get it). Any additional tips also welcome.
>

I haven't seen this problem in the field. Please report it to support.
They may have seen it and can map it to a BZ, or, if not, help reproduce
it and get it fixed.
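
When reporting this, a record of how corosync's resident memory grows over
time may be useful to attach alongside the top output. A minimal sketch,
assuming a Linux /proc filesystem and Python on the node; the function names
(parse_status, sample_rss, log_rss) and the 60-second interval are
illustrative choices, not part of corosync or any Red Hat tooling:

```python
import os
import sys
import time


def parse_status(status_text):
    """Extract (process name, resident memory in kB) from /proc/<pid>/status text."""
    fields = {}
    for line in status_text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key] = value.strip()
    rss = fields.get("VmRSS")  # absent for kernel threads
    rss_kb = int(rss.split()[0]) if rss else None
    return fields.get("Name"), rss_kb


def sample_rss(name="corosync", proc_root="/proc"):
    """Return a list of (pid, rss_kb) for every process called `name`."""
    if not os.path.isdir(proc_root):
        return []
    samples = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "status")) as f:
                proc_name, rss_kb = parse_status(f.read())
        except OSError:
            continue  # process exited or is not readable
        if proc_name == name and rss_kb is not None:
            samples.append((int(entry), rss_kb))
    return samples


def log_rss(name="corosync", interval_s=60, samples=10, out=sys.stdout):
    """Write one timestamped line per matching process every `interval_s` seconds."""
    for i in range(samples):
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        for pid, rss_kb in sample_rss(name):
            out.write("%s pid=%d rss_kb=%d\n" % (stamp, pid, rss_kb))
        out.flush()
        if i < samples - 1:
            time.sleep(interval_s)
```

Calling log_rss(samples=1440) would log once a minute for roughly a day;
redirecting the output to a file gives a growth curve to attach to the
support case.
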
Regards
-steve

> Thanks again for your help
>
> Chris
>
> [1] http://pastebin.com/CbyERaRT
> [2] http://pastebin.com/uk9ZGL7H
> [3] http://pastebin.com/H4w5Zg46
> [4] http://pastebin.com/KPZxL6UB
> [5] http://rhn.redhat.com/errata/RHBA-2011-1361.html
> [6] http://rhn.redhat.com/errata/RHBA-2011-1515.html
>
>
> _______________________________________________
> discuss mailing list
> discuss@xxxxxxxxxxxx
> http://lists.corosync.org/mailman/listinfo/discuss