Hello List,
I have been using Corosync with Pacemaker for almost a year on many
different production systems. So far I haven't hit any problems, but now
I have run into something that is causing me trouble:
I have a 3-node PostgreSQL cluster set up (two actual database nodes
and one witness server). It took me quite some time to get this setup
configured well, because we are using Ubuntu 14.04 LTS and it does
everything a little bit differently, but in the end I was able to create
a cluster that worked well: data was replicated, master-slave roles
were established, and failover happened seamlessly, so I went home. The
next morning the whole cluster was dead. Analyzing the logs, it turned
out that it had run out of memory during the night. Since then I have been
monitoring all the nodes, and it turns out that Corosync is the one
responsible for this: I have a node that has been running for around 1
hour, and at the beginning Corosync used around 8% of memory. Now, after
just one hour, it is already using 25% of RAM. It is easy to see that in
a few hours it is going to crash the node.
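
For anyone who wants to reproduce the measurement, here is a minimal
sketch of how the growth could be tracked and extrapolated. It assumes a
Linux /proc filesystem and that the process is named "corosync"; the
helper names (parse_vmrss_kb, find_pid, hours_until_full, log_rss) are
made up for illustration, and the projection assumes the growth stays
linear, which may not hold.

```python
import re
import time
from pathlib import Path


def parse_vmrss_kb(status_text):
    """Return the VmRSS value in kB from the text of /proc/<pid>/status, or None."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def find_pid(name="corosync"):
    """Return the first PID whose /proc/<pid>/comm equals name, or None."""
    for comm in Path("/proc").glob("[0-9]*/comm"):
        try:
            if comm.read_text().strip() == name:
                return int(comm.parent.name)
        except OSError:
            continue  # the process exited while we were scanning
    return None


def hours_until_full(start_pct, now_pct, elapsed_hours, limit_pct=100.0):
    """Linearly extrapolate how many hours remain until limit_pct is reached.

    Returns None when memory use is flat or shrinking.
    """
    rate = (now_pct - start_pct) / elapsed_hours  # percentage points per hour
    if rate <= 0:
        return None
    return (limit_pct - now_pct) / rate


def log_rss(pid, interval_s=60, samples=60):
    """Print a timestamped resident set size for pid, once per interval."""
    for _ in range(samples):
        rss = parse_vmrss_kb(Path(f"/proc/{pid}/status").read_text())
        print(time.strftime("%H:%M:%S"), rss, "kB", flush=True)
        time.sleep(interval_s)
```

With the figures above, hours_until_full(8, 25, 1) gives 75/17, or
roughly 4.4 hours until the node runs out of memory. Calling
log_rss(find_pid()) on an affected node prints one sample per minute.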
The interesting thing is that all the other nodes (mainly Apache load
balancers) don't have that problem, despite the fact that they are
running almost the same setup: Ubuntu 14.04 LTS, Corosync 2.3.3,
Pacemaker 1.1.10, all from the official Ubuntu repositories. On the
PostgreSQL nodes I had to switch to the latest libqb (0.17) because the
official one (0.16) caused the whole cluster to freeze with 100% CPU
usage. So except for the libqb version all nodes are the same; they just
run different resources. The Apache nodes are fine after many days of
running, while the PostgreSQL nodes are not.
I would really like to fix this issue if possible, so I'm open to any
ideas or suggestions.
Thank you!
Domonkos
_______________________________________________
discuss mailing list
discuss@xxxxxxxxxxxx
http://lists.corosync.org/mailman/listinfo/discuss