Re: Possible memory leak in mon?

Vladimir Bashkirtsev <vladimir@xxxxxxxxxxxxxxx> · Thu, 03 May 2012 15:54:16 +0930

Greg,

Apologies for the multiple emails: my mail server is now backed by ceph
and it struggled this morning (a separate issue). It reported back to my
mailer that sending the email had failed when that was obviously not the
case.

[root@gamma ~]# ceph -s
2012-05-03 15:46:55.640951   mds e2666: 1/1/1 up {0=1=up:active}, 1 up:standby
2012-05-03 15:46:55.647106   osd e10728: 6 osds: 6 up, 6 in
2012-05-03 15:46:55.654052   log 2012-05-03 15:46:26.557084 mon.2 172.16.64.202:6789/0 2878 : [INF] mon.2 calling new monitor election
2012-05-03 15:46:55.654425   mon e7: 3 mons at {0=172.16.64.200:6789/0,1=172.16.64.201:6789/0,2=172.16.64.202:6789/0}
2012-05-03 15:46:56.961624    pg v1251669: 600 pgs: 2 creating, 598 active+clean; 309 GB data, 963 GB used, 1098 GB / 2145 GB avail

Logging is on, but there is nothing obvious in it and the logs are quite
small. They contain a number of ceph health entries (ceph is monitored by
nagios, so this record appears every 5 minutes) and the monitors
periodically calling for an election (at varying intervals, between 1 and
15 minutes by the look of it). That's it.
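
To put numbers on those election intervals, something like the following
rough Python sketch could be run over a monitor log. It is only a sketch:
the line pattern is modelled on the single election entry quoted in the
ceph -s output above, and the assumption that the monitor log file uses the
same layout is untested. The names election_times and election_gaps are
made up for this example.

```python
import re
from datetime import datetime

# Assumed layout, copied from the one election entry shown by "ceph -s":
#   <date> <time> mon.N <addr> <seq> : [INF] mon.N calling new monitor election
ELECTION_RE = re.compile(
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) (mon\.\d+) \S+ \d+ : "
    r"\[INF\] mon\.\d+ calling new monitor election"
)


def election_times(lines):
    """Return sorted (timestamp, monitor) pairs for each election call found."""
    found = []
    for line in lines:
        match = ELECTION_RE.search(line)
        if match:
            stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S.%f")
            found.append((stamp, match.group(2)))
    return sorted(found)


def election_gaps(lines):
    """Return the seconds elapsed between consecutive election calls."""
    stamps = [stamp for stamp, _ in election_times(lines)]
    return [(later - earlier).total_seconds()
            for earlier, later in zip(stamps, stamps[1:])]
```

Passing an open log file (or any iterable of lines) to election_gaps gives
the list of intervals in seconds; election_times additionally shows which
monitor called each election.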

Regards,
Vladimir

On 03/05/12 09:52, Greg Farnum wrote:
On Wednesday, May 2, 2012 at 3:28 PM, Vladimir Bashkirtsev wrote:
Dear devs,

I have three mons, and two of them suddenly consumed around 4G of RAM
while the third one happily lived with 150M. This immediately prompts a
few questions:

1. What is the expected memory use of a mon? I believed that a mon merely
directs clients to the relevant OSDs and should not consume a lot of
resources - please correct me if I am wrong.
2. In both cases where a mon consumed a lot of memory, it was preceded by
a disk-full condition, and both machines where the incidents happened are
64 bit while the rest of the cluster is 32 bit. The mon fs and log files
happened to be on the same partition: ceph osd produced a lot of messages
and filled up the disk, the mon crashed (no core, as the disk was full), I
manually deleted the logs and restarted the mon without any issue, and
some time later I found the mon using 4G of RAM. Running 0.45. Should I
deliberately recreate the conditions and crash the mon to get more debug
info (if you need it, of course, and if so, what)?
3. Does the figure of 4G per process come from 32 bit pointers in the mon?
Or can a mon potentially consume more than 4G?

Regards,
Vladimir
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
First: one email is enough.

Second: in normal use your monitors should not consume very much memory. It sounds like something's wrong. Can you please provide the output of "ceph -s"?
Also, do you have any monitor logging on? My best guess is that for some reason the monitors aren't all communicating with each other and so they are buffering messages.
-Greg