On 12/21/2011 02:10 PM, Angus Salkeld wrote:
> On 21/12/11 18:01 +0000, Chris Alexander wrote:
>> After some tinkering with this we found the problem seemed to be caused by
>> corosync-notify. We had it running in the background, and by simply killing
>> it and restarting the server (after making sure it wouldn't start again)
>> the memory usage remained constant.
>>
>> If anyone has further thoughts on this or wants us to provide some
>> debugging information then please get in touch and we'll be happy to help
>> as far as we can.
>
> Here is the fix for your bug: (it is in flatiron)
> https://github.com/corosync/corosync/commit/9ddb845f412531b6a2761f42823b6be43216a9c8
>
> -Angus
>

The fix for this problem is also in RHEL. Contact your support rep.

Regards
-steve

>>
>> Cheers
>>
>> Chris
>>
>> On 20 December 2011 15:11, Steven Dake <sdake@xxxxxxxxxx> wrote:
>>
>>> On 12/20/2011 03:12 AM, Chris Alexander wrote:
>>> > Hi all,
>>> >
>>> > We are using Corosync as part of the Redhat cluster stack. Their
>>> > currently supported version is 1.2.3.
>>> >
>>>
>>> While Red Hat's corosync is "version 1.2.3" the z stream almost entirely
>>> matches the flatiron 1.4 branch. I take patches and apply them to
>>> the RPM.
>>>
>>> > Every few days our nodes are (non-simultaneously) being fenced due to
>>> > corosync taking up vast amounts of memory (i.e. 100% of the box). Please
>>> > see a sample log message, we have several just like this, [1] which
>>> > occurs when this happens. Note that it is not always corosync being
>>> > killed - but it is clearly corosync eating all the memory (see top
>>> > output from three servers at various times since their last reboot, [2]
>>> > [3] [4]).
>>> >
>>> > The corosync version is 1.2.3:
>>> > [g@cluster1 ~]$ corosync -v
>>> > Corosync Cluster Engine, version '1.2.3'
>>> > Copyright (c) 2006-2009 Red Hat, Inc.
>>> >
>>> > We had a bit of a dig around and there are a significant number of
>>> > bugfix updates which address various segfaults, crashes, memory leaks
>>> > etc. in this minor as well as subsequent minor versions. [5] [6] However
>>> > it seems the Redhat repos haven't been updated past 1.2.3 as yet.
>>> >
>>> > We're trialling the Fedora 14 (fc14) RPMs for corosync and corosynclib
>>> > (v1.4.2) to see if it fixes the particular issue we are seeing (i.e.
>>> > whether or not the memory keeps spiralling way out of control).
>>> >
>>>
>>> The latest z stream would be your best solution here.
>>>
>>> > Has anyone else seen an issue like this, and is there any known way to
>>> > debug or fix it? If we can assist debugging by providing further
>>> > information, please specify what this is (and, if non-obvious, how to
>>> > get it). Any additional tips also welcome.
>>> >
>>>
>>> I haven't seen this problem in the field. Please report it to
>>> support. They may have seen it and can map it to a BZ, or if not help
>>> reproduce it and get it fixed.
>>>
>>> Regards
>>> -steve
>>>
>>> > Thanks again for your help
>>> >
>>> > Chris
>>> >
>>> > [1] http://pastebin.com/CbyERaRT
>>> > [2] http://pastebin.com/uk9ZGL7H
>>> > [3] http://pastebin.com/H4w5Zg46
>>> > [4] http://pastebin.com/KPZxL6UB
>>> > [5] http://rhn.redhat.com/errata/RHBA-2011-1361.html
>>> > [6] http://rhn.redhat.com/errata/RHBA-2011-1515.html
>>> >
>>> > _______________________________________________
>>> > discuss mailing list
>>> > discuss@xxxxxxxxxxxx
>>> > http://lists.corosync.org/mailman/listinfo/discuss
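
For anyone trying to confirm the same kind of growth before and after stopping corosync-notify, here is a minimal sketch that samples a process's resident memory from /proc/<pid>/status on Linux. It is not part of corosync or of the fix linked above. The helper names (parse_vmrss_kib, read_rss_kib, sample_rss, growth_kib) are hypothetical, and finding the PID of the corosync process is left to you.

```python
import re
import time
from pathlib import Path


def parse_vmrss_kib(status_text):
    """Return the VmRSS value in KiB from /proc/<pid>/status text, or None."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def read_rss_kib(pid):
    """Read the current resident set size of a process (Linux only)."""
    return parse_vmrss_kib(Path(f"/proc/{pid}/status").read_text())


def sample_rss(pid, count, interval_s):
    """Collect `count` RSS samples, waiting `interval_s` seconds after each."""
    samples = []
    for _ in range(count):
        rss = read_rss_kib(pid)
        if rss is not None:
            samples.append(rss)
        time.sleep(interval_s)
    return samples


def growth_kib(samples):
    """Difference between the last and first sample (0 if fewer than two)."""
    return samples[-1] - samples[0] if len(samples) >= 2 else 0
```

Something like growth_kib(sample_rss(pid, 60, 60)) gives a rough figure for an hour of growth. A steadily positive value across several runs would be consistent with the leak described in this thread, though it does not by itself identify the cause.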