Re: Corosync 2.3.3 memory leak

"Tomcsányi, Domonkos" <tomcsanyid@xxxxxxxx> · Wed, 06 Aug 2014 12:51:45 +0200

Hello Everyone,

I think I might have isolated the problem!

Starting from this thread:
http://forum.proxmox.com/threads/14263-Proxmox-3-0-Cluster-corosync-running-system-out-of-memory

I became suspicious and started looking at my syslog (IP address 
intentionally changed):

Aug  6 12:46:41 db-01 corosync[22339]:   [MAIN  ] Completed service synchronization, ready to provide service.
Aug  6 12:46:45 db-01 corosync[22339]:   [TOTEM ] A new membership (1.2.3.4:591376) was formed. Members
Aug  6 12:46:45 db-01 corosync[22339]:   [QUORUM] Members[1]: 171707020
Aug  6 12:46:45 db-01 corosync[22339]:   [MAIN  ] Completed service synchronization, ready to provide service.
Aug  6 12:46:49 db-01 corosync[22339]:   [TOTEM ] A new membership (1.2.3.4:591380) was formed. Members
Aug  6 12:46:49 db-01 corosync[22339]:   [QUORUM] Members[1]: 171707020
Aug  6 12:46:49 db-01 corosync[22339]:   [MAIN  ] Completed service synchronization, ready to provide service.
Aug  6 12:46:53 db-01 corosync[22339]:   [TOTEM ] A new membership (1.2.3.4:591384) was formed. Members
Aug  6 12:46:53 db-01 corosync[22339]:   [QUORUM] Members[1]: 171707020
Aug  6 12:46:53 db-01 corosync[22339]:   [MAIN  ] Completed service synchronization, ready to provide service.
Aug  6 12:46:56 db-01 corosync[22339]:   [TOTEM ] A new membership (1.2.3.4:591388) was formed. Members
Aug  6 12:46:56 db-01 corosync[22339]:   [QUORUM] Members[1]: 171707020
Aug  6 12:46:56 db-01 corosync[22339]:   [MAIN  ] Completed service synchronization, ready to provide service.

Looking at my other setup, I don't see any messages like this. So it 
looks like the constant re-forming of the cluster membership is what 
causes corosync to eat up all the memory. I will now start investigating 
at the network level to see what exactly happens there and why the 
membership keeps changing. Still, as the thread mentioned above says, I 
think it shouldn't cause such a memory leak.
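
To put a number on how often the membership re-forms, the syslog can be
scanned for the TOTEM lines. The following is only a minimal sketch: the
function names and the embedded SAMPLE are made up for illustration, the
year is assumed because syslog timestamps do not carry one, and the
regular expression is based solely on the log format shown above.

```python
import re
from datetime import datetime

# Three events copied from the excerpt above, including the mail line wrap.
SAMPLE = """\
Aug  6 12:46:45 db-01 corosync[22339]:   [TOTEM ] A new membership 
(1.2.3.4:591376) was formed. Members
Aug  6 12:46:45 db-01 corosync[22339]:   [QUORUM] Members[1]: 171707020
Aug  6 12:46:49 db-01 corosync[22339]:   [TOTEM ] A new membership 
(1.2.3.4:591380) was formed. Members
Aug  6 12:46:53 db-01 corosync[22339]:   [TOTEM ] A new membership 
(1.2.3.4:591384) was formed. Members
"""

# \s+ after "membership" tolerates a line break inserted by the mail client.
MEMBERSHIP_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) \S+ corosync\[\d+\]:\s+"
    r"\[TOTEM \] A new membership\s+\((?P<addr>[\d.]+):(?P<ring>\d+)\) was formed",
    re.MULTILINE,
)


def membership_events(log_text, year=2014):
    """Return (timestamp, ring id) for every 'new membership' log entry.

    The year is an assumption supplied by the caller; month names are
    parsed with %b, so this expects an English/C locale.
    """
    events = []
    for m in MEMBERSHIP_RE.finditer(log_text):
        stamp = " ".join(m.group("ts").split())  # collapse the double space
        ts = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
        events.append((ts, int(m.group("ring"))))
    return events


def intervals_seconds(events):
    """Seconds elapsed between consecutive membership changes."""
    return [(b[0] - a[0]).total_seconds() for a, b in zip(events, events[1:])]


if __name__ == "__main__":
    found = membership_events(SAMPLE)
    print(len(found), "membership changes, intervals:", intervals_seconds(found))
```

On a healthy node this should find few or no events; a steady stream a
few seconds apart would match what the excerpt above shows.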

regards,
Domonkos

On 2014.07.31 at 11:37, Jan Friesse wrote:
Domonkos,

On 2014.07.30 at 18:10, "Tomcsányi, Domonkos" wrote:
On 2014.07.30 at 15:51, Jan Friesse wrote:
OK. I was trying to reproduce your bug, but sadly I was not very
successful.

Can you please try to reconfigure your postgres nodes to a configuration
similar to that of your apache nodes? This will help me identify whether
the problem happens with the postgres resource only, or with all
resources, in which case it's a problem in corosync/libqb.

Thanks,
  Honza

Well, I did my best: I put the nodes into standby, so no resources run
on them. No change at all, corosync still eats memory heavily.
I think that leaves little doubt about what is causing it.

So here is a way to reproduce it: install Ubuntu 14.04 LTS, then install
libqb 0.17 either from a PPA or by compiling it.

I will now create a clean virtual machine without any resources and
see if the same happens.

Domonkos

I couldn't reproduce the issue yet in my clean virtual machines, so I'm
going to leave corosync running inside valgrind overnight on the
machines where I had problems and see what happens.
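
Alongside an overnight valgrind run, the growth itself could be recorded
by sampling the resident set size of the corosync process. This is only
a rough sketch, assuming Linux and the VmRSS field of /proc/<pid>/status;
the function names, the interval and the sample count are arbitrary
choices for illustration and not part of corosync.

```python
import re
import time


def parse_vm_rss_kb(status_text):
    """Extract the VmRSS value (in kB) from /proc/<pid>/status content."""
    m = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(m.group(1)) if m else None


def read_rss_kb(pid):
    """Read the current resident set size of a process (Linux only)."""
    with open(f"/proc/{pid}/status") as f:
        return parse_vm_rss_kb(f.read())


def sample_rss(pid, interval_s=60, samples=5):
    """Collect (unix time, RSS in kB) pairs at a fixed interval."""
    readings = []
    for _ in range(samples):
        readings.append((time.time(), read_rss_kb(pid)))
        time.sleep(interval_s)
    return readings
```

Calling sample_rss() with the corosync PID (22339 in the syslog excerpt)
and comparing the first and last readings would show how fast memory use
climbs while the membership keeps re-forming.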

Perfect. Hopefully you will be able to find a reproducer.

Regards,
  Honza

Domonkos

_______________________________________________
discuss mailing list
discuss@xxxxxxxxxxxx
http://lists.corosync.org/mailman/listinfo/discuss
