Corosync 2.3.3 memory leak

"Tomcsányi, Domonkos" <tomcsanyid@xxxxxxxx> · Fri, 25 Jul 2014 16:34:48 +0200

Hello List,

I have been using Corosync with Pacemaker for almost a year in many
different production systems. So far I haven't hit any problems, but now
I have run into something that is causing me trouble:
I have a 3-node PostgreSQL cluster set up (two actual database nodes
and one witness server). It took me quite some time to get this setup
configured well, because we are using Ubuntu 14.04 LTS and it does
everything a little bit differently, but in the end I was able to create a
cluster that worked well: data was replicated, master/slave roles were
established, and failover happened seamlessly, so I went home. The next
morning the whole cluster was dead. Analyzing the logs, it turned out
that it had run out of memory during the night. Since then I have been
monitoring all the nodes, and it turns out that Corosync is the process
responsible: on a node that has been running for around 1 hour,
Corosync used around 8% of memory at the beginning. Now, after
just one hour, it is already using 25% of RAM. At this rate it is going
to exhaust memory and crash the node within a few hours.
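As a rough way to track this and put a number on "a few hours", here is a minimal Python sketch. It assumes Linux (it reads VmRSS from /proc/<pid>/status, with the Corosync PID obtained e.g. via `pidof corosync`) and assumes the growth stays linear; the helper names are hypothetical and not part of Corosync or Pacemaker.

```python
import re


def parse_vmrss_kb(status_text: str) -> int:
    """Return the resident set size (VmRSS, in kB) from /proc/<pid>/status text."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    if match is None:
        raise ValueError("no VmRSS line found")
    return int(match.group(1))


def sample_rss_kb(pid: int) -> int:
    """Read the current RSS of a live process (Linux only)."""
    with open(f"/proc/{pid}/status") as f:
        return parse_vmrss_kb(f.read())


def hours_until_exhaustion(start_pct: float, now_pct: float, elapsed_hours: float) -> float:
    """Linearly extrapolate when memory usage reaches 100% (assumes constant growth)."""
    rate = (now_pct - start_pct) / elapsed_hours  # percentage points per hour
    if rate <= 0:
        return float("inf")  # not growing, so no exhaustion predicted
    return (100.0 - now_pct) / rate


if __name__ == "__main__":
    # Figures from this report: about 8% at start, 25% after about 1 hour.
    print(f"{hours_until_exhaustion(8.0, 25.0, 1.0):.1f} h")  # prints "4.4 h"
```

With 17 percentage points per hour and 75% of RAM left, linear extrapolation gives roughly 4.4 hours, which is consistent with the cluster dying overnight.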
The interesting thing is that all the other nodes (mainly Apache load balancers)
don't have this problem, despite the fact that they run almost the
same setup: Ubuntu 14.04 LTS, Corosync 2.3.3, Pacemaker 1.1.10, all
from the official Ubuntu repositories. On the PostgreSQL nodes I had to switch
to the latest libqb (0.17), because the official one (0.16) caused the
whole cluster to freeze with 100% CPU usage. So, except for the libqb
version, all nodes are the same; they just run different resources. The
Apache nodes are fine after many days of running, but, as I said, the
PostgreSQL nodes are not.
I would really like to fix this issue if possible, so I'm open to any 
ideas or suggestions.
Thank you!
Domonkos
_______________________________________________
discuss mailing list
discuss@xxxxxxxxxxxx
http://lists.corosync.org/mailman/listinfo/discuss