Re: Memory leak on 1.2.3

After some tinkering, we found that the problem appears to be caused by corosync-notify. We had it running in the background; after killing it, making sure it would not start again, and restarting the server, memory usage has remained constant.

If anyone has further thoughts on this, or would like us to provide debugging information, please get in touch and we'll be happy to help as far as we can.
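
In case it helps anyone chasing something similar, here is a rough sketch of how the memory growth could be tracked over time instead of eyeballing top. It reads the Name and VmRSS fields from /proc/<pid>/status on Linux. The function names, the 1 MiB growth threshold and the sampling interval are all made up for illustration; this is not a corosync tool, and the process name to watch is an assumption you should adjust.

```python
import re
import time
from pathlib import Path


def parse_name(status_text):
    """Return the process name from the text of /proc/<pid>/status, or None."""
    match = re.search(r"^Name:\s+(.+)$", status_text, re.MULTILINE)
    return match.group(1).strip() if match else None


def parse_vm_rss_kb(status_text):
    """Return resident memory in kB from /proc/<pid>/status text, or None.

    Kernel threads have no VmRSS line, hence the None case.
    """
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def find_pids(name):
    """Return the PIDs of all processes whose Name field equals `name`."""
    pids = []
    for status in Path("/proc").glob("[0-9]*/status"):
        try:
            if parse_name(status.read_text()) == name:
                pids.append(int(status.parent.name))
        except OSError:
            continue  # the process exited while we were scanning
    return pids


def sample_rss_kb(pid, count, interval_s):
    """Take `count` RSS samples of `pid`, `interval_s` seconds apart."""
    samples = []
    for i in range(count):
        rss = parse_vm_rss_kb(Path(f"/proc/{pid}/status").read_text())
        if rss is not None:
            samples.append(rss)
        if i < count - 1:
            time.sleep(interval_s)
    return samples


def grows_steadily(samples_kb, min_growth_kb=1024):
    """True if RSS never shrinks and grows by at least `min_growth_kb` overall.

    The threshold is an arbitrary illustrative value, not a known leak rate.
    """
    if len(samples_kb) < 2:
        return False
    never_shrinks = all(b >= a for a, b in zip(samples_kb, samples_kb[1:]))
    return never_shrinks and samples_kb[-1] - samples_kb[0] >= min_growth_kb


if __name__ == "__main__":
    # "corosync" is the process name assumed here; change it as needed.
    for pid in find_pids("corosync"):
        samples = sample_rss_kb(pid, count=10, interval_s=60)
        verdict = "growing" if grows_steadily(samples) else "stable"
        print(pid, samples, verdict)
```

Running this once with the notification daemon running and once with it stopped should show whether resident memory keeps climbing in one case and stays flat in the other.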

Cheers

Chris

On 20 December 2011 15:11, Steven Dake <sdake@xxxxxxxxxx> wrote:
On 12/20/2011 03:12 AM, Chris Alexander wrote:
> Hi all,
>
> We are using Corosync as part of the Red Hat cluster stack. Their
> currently supported version is 1.2.3.
>

While Red Hat's corosync is "version 1.2.3", the z stream almost entirely
matches the flatiron 1.4 branch.  I take patches and apply them to the RPM.

> Every few days our nodes are being fenced (not simultaneously) because
> corosync takes up vast amounts of memory (i.e. 100% of the box). A
> sample log message from when this happens is at [1]; we have several
> just like it. Note that corosync is not always the process that gets
> killed, but it is clearly corosync eating all the memory (see top
> output from three servers at various times since their last reboot,
> [2] [3] [4]).
>
> The corosync version is 1.2.3:
> [g@cluster1 ~]$ corosync -v
> Corosync Cluster Engine, version '1.2.3'
> Copyright (c) 2006-2009 Red Hat, Inc.
>
> We had a bit of a dig around and there are a significant number of
> bugfix updates which address various segfaults, crashes, memory leaks
> etc. in this and subsequent minor versions. [5] [6] However, it seems
> the Red Hat repos have not yet been updated past 1.2.3.
>
> We're trialling the Fedora 14 (fc14) RPMs for corosync and corosynclib
> (v1.4.2) to see if they fix the particular issue we are seeing (i.e.
> whether or not the memory usage keeps spiralling out of control).
>

The latest z stream would be your best solution here.

> Has anyone else seen an issue like this, and is there any known way to
> debug or fix it? If we can assist debugging by providing further
> information, please specify what this is (and, if non-obvious, how to
> get it). Any additional tips also welcome.
>

I haven't seen this problem in the field.  Please report it to
support.  They may have seen it and can map it to a BZ, or if not, help
reproduce it and get it fixed.

Regards
-steve
> _______________________________________________
> discuss mailing list
> discuss@xxxxxxxxxxxx
> http://lists.corosync.org/mailman/listinfo/discuss