If anyone has further thoughts on this or wants us to provide some debugging information, please get in touch and we'll be happy to help as far as we can.
Cheers
Chris
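One kind of debugging data that could accompany such a report is a timestamped record of corosync's memory growth between fence events. Below is a minimal sketch of that. It is a standalone Python 3 script and is not part of corosync or the Red Hat cluster tools. It assumes a Linux host with /proc mounted. It reads the resident set size (VmRSS) of every process whose /proc/<pid>/comm is `corosync` at a fixed interval. The script name, default process name, and log-line format are illustrative assumptions.

```python
#!/usr/bin/env python3
"""Periodically log the resident memory (VmRSS) of a named process.

Sketch only: assumes Linux with /proc mounted, and that the daemon's
process name (as shown in /proc/<pid>/comm) is "corosync".
"""
import os
import sys
import time


def parse_vmrss_kb(status_text):
    """Return the VmRSS value in kB from /proc/<pid>/status text, or None."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # Line format: "VmRSS:\t  123456 kB"
            return int(line.split()[1])
    return None  # e.g. kernel threads have no VmRSS line


def find_pids(name):
    """Return PIDs whose /proc/<pid>/comm matches name exactly."""
    pids = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/comm") as f:
                if f.read().strip() == name:
                    pids.append(int(entry))
        except OSError:
            # Process exited between listdir() and open(); skip it.
            continue
    return pids


def sample(name="corosync", interval_s=60):
    """Print one timestamped VmRSS line per matching process, forever."""
    while True:
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        for pid in find_pids(name):
            try:
                with open(f"/proc/{pid}/status") as f:
                    rss = parse_vmrss_kb(f.read())
            except OSError:
                continue
            print(f"{stamp} pid={pid} VmRSS={rss} kB", flush=True)
        time.sleep(interval_s)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Only start sampling when an interval (in seconds) is given explicitly.
    sample(interval_s=int(sys.argv[1]))
```

For example, `python3 rss_sampler.py 60 >> /var/log/corosync-rss.log` would log once a minute. Both `rss_sampler.py` and the log path are hypothetical names. Lining this log up against the fencing timestamps would show whether growth is steady or happens in sudden jumps.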
On 20 December 2011 15:11, Steven Dake <sdake@xxxxxxxxxx> wrote:
On 12/20/2011 03:12 AM, Chris Alexander wrote:
> Hi all,
>
> We are using Corosync as part of the Redhat cluster stack. Their
> currently supported version is 1.2.3.
>
While Red Hat's corosync is "version 1.2.3" the z stream almost entirely
matches the flatiron 1.4 branch. I take patches and apply them to the RPM.
The latest z stream would be your best solution here.
> Every few days our nodes are (non-simultaneously) being fenced due to
> corosync taking up vast amounts of memory (i.e. 100% of the box). Please
> see [1] for a sample log message from one of these events; we have
> several just like it. Note that it is not always corosync being
> killed - but it is clearly corosync eating all the memory (see top
> output from three servers at various times since their last reboot, [2]
> [3] [4]).
>
> The corosync version is 1.2.3:
> [g@cluster1 ~]$ corosync -v
> Corosync Cluster Engine, version '1.2.3'
> Copyright (c) 2006-2009 Red Hat, Inc.
>
> We had a bit of a dig around and there are a significant number of
> bugfix updates which address various segfaults, crashes, memory leaks
> etc. in this minor as well as subsequent minor versions. [5] [6] However
> it seems the Redhat repos haven't been updated past 1.2.3 as yet.
>
> We're trialling the Fedora 14 (fc14) RPMs for corosync and corosynclib
> (v1.4.2) to see if it fixes the particular issue we are seeing (i.e.
> whether or not the memory keeps spiralling way out of control).
>
> Has anyone else seen an issue like this, and is there any known way to
> debug or fix it? If we can assist debugging by providing further
> information, please specify what this is (and, if non-obvious, how to
> get it). Any additional tips also welcome.
>
I haven't seen this problem in the field. Please report it to
support. They may have seen it and can map it to a BZ, or, if not, help
reproduce it and get it fixed.
Regards
-steve
> Thanks again for your help
>
> Chris
>
> [1] http://pastebin.com/CbyERaRT
> [2] http://pastebin.com/uk9ZGL7H
> [3] http://pastebin.com/H4w5Zg46
> [4] http://pastebin.com/KPZxL6UB
> [5] http://rhn.redhat.com/errata/RHBA-2011-1361.html
> [6] http://rhn.redhat.com/errata/RHBA-2011-1515.html
>
> _______________________________________________
> discuss mailing list
> discuss@xxxxxxxxxxxx
> http://lists.corosync.org/mailman/listinfo/discuss