Re: Corosync 2.3.3 memory leak

Tomcsányi,

Hello List,

I have been using Corosync with Pacemaker for almost a year in many
different production systems. So far I haven't hit any problems, but now
I have hit something that is causing me trouble:
I have a 3-node PostgreSQL cluster set up (two actual database nodes
and one witness server). It took me quite some time to get this setup
configured well, because we are using Ubuntu 14.04 LTS and it does
everything a little bit differently, but in the end I was able to create a
cluster that worked well: data was replicated, master-slave roles were
established, and failover happened seamlessly, so I went home. The next
morning the whole cluster was dead. Analyzing the logs, it turned out
that it ran out of memory during the night. Since then I have been
monitoring all the nodes, and it turns out that Corosync is the one
responsible for this: I have a node that has been running for around 1
hour, and at the beginning Corosync used around 8% of memory. Now, after
just one hour, it is already using 25% of RAM. It is easy to see that
within a few hours it is going to crash the node.
The interesting thing is that all other nodes (mainly Apache load balancers)
don't have that problem despite the fact they are running almost the
same setup: Ubuntu 14.04 LTS, Corosync 2.3.3, Pacemaker 1.1.10 - all
from the official Ubuntu repositories. On the psql nodes I had to switch
to the latest libqb (0.17) because the official one (0.16) caused the

OK. So you just changed libqb without recompiling corosync?

whole cluster to freeze with 100% CPU usage. So except for the libqb
version all nodes are the same, they just run different resources, the

This is the problem. I mean, if the nodes were really the same and differed only in libqb, or only in resources, the culprit would be easy to find.
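
One way to see exactly which libqb the running corosync has loaded is to look at its memory map. The following is only a rough sketch, assuming a Linux node where /proc/<pid>/maps is readable (probably as root); the function name and the library paths in the comments are made up for illustration. A path ending in "(deleted)" would suggest the library file was replaced on disk while the daemon kept running with the old copy.

```python
def mapped_libraries(maps_text, needle="libqb"):
    """Return the sorted, de-duplicated paths in a /proc/<pid>/maps dump
    whose pathname contains `needle`.

    Each maps line has the form:
        address perms offset dev inode pathname
    The pathname may be missing (anonymous mappings) and may carry a
    trailing " (deleted)" marker, which is kept as part of the result.
    """
    paths = set()
    for line in maps_text.splitlines():
        fields = line.split(None, 5)  # pathname is the 6th field, if present
        if len(fields) == 6 and needle in fields[5]:
            paths.add(fields[5].strip())
    return sorted(paths)


def libraries_of_pid(pid, needle="libqb"):
    """Read /proc/<pid>/maps for a live process (Linux only)."""
    with open(f"/proc/{pid}/maps") as f:
        return mapped_libraries(f.read(), needle)
```

Calling libraries_of_pid() with the PID of corosync on a postgres node and on an Apache node, and comparing the results, would show whether the two really differ only in the libqb version.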

Apache ones are fine after many days of running; the postgres nodes are
not, as I said.

I would really like to fix this issue if possible, so I'm open to any
ideas or suggestions.


Let's work with the theory that the culprit is libqb (because corosync didn't change). So this is probably a libqb bug, which means nothing you can fix by changing configuration, ...

Can you please try to compile libqb from git (there are some fixes)? If this doesn't help, can you please try to recompile corosync (maybe there is some ABI break in libqb)? Let's see if this helps.
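
To compare the growth before and after each of these steps, it may help to log the resident memory of corosync at fixed intervals instead of watching percentages by hand. Below is a minimal sketch, assuming a Linux node with /proc mounted and a process whose name is "corosync"; all function names are illustrative and not part of corosync or libqb.

```python
import os
import time


def parse_vmrss_kb(status_text):
    """Return VmRSS in kB from the text of /proc/<pid>/status, or None."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # The line looks like: "VmRSS:      204800 kB"
            return int(line.split()[1])
    return None


def find_pids(name="corosync"):
    """Return PIDs whose /proc/<pid>/comm equals `name` (Linux only)."""
    pids = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/comm") as f:
                if f.read().strip() == name:
                    pids.append(int(entry))
        except OSError:
            continue  # the process exited while we were scanning
    return pids


def growth_kb_per_hour(samples):
    """Average growth rate between the first and last (unix_time, rss_kb) sample."""
    (t0, r0), (t1, r1) = samples[0], samples[-1]
    if t1 <= t0:
        return 0.0
    return (r1 - r0) * 3600.0 / (t1 - t0)


def sample_rss(pid, count, interval_s=60):
    """Take `count` RSS samples of `pid`, one every `interval_s` seconds."""
    samples = []
    for i in range(count):
        with open(f"/proc/{pid}/status") as f:
            rss_kb = parse_vmrss_kb(f.read())
        samples.append((time.time(), rss_kb))
        print(f"{time.strftime('%H:%M:%S')} pid={pid} rss={rss_kb} kB")
        if i + 1 < count:
            time.sleep(interval_s)
    return samples
```

For example, growth_kb_per_hour(sample_rss(find_pids()[0], count=30)) would give an average rate over roughly half an hour. If the rate stays clearly positive with the stock libqb rebuilt from git, but drops to about zero after corosync is recompiled against it, that would point towards an ABI mismatch rather than a leak in libqb itself.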

Regards,
  Honza



Thank you!
Domonkos
_______________________________________________
discuss mailing list
discuss@xxxxxxxxxxxx
http://lists.corosync.org/mailman/listinfo/discuss
