On Fri, May 18, 2012 at 3:07 AM, Vladimir Bashkirtsev
<vladimir@xxxxxxxxxxxxxxx> wrote:
> On 16/05/12 02:43, Gregory Farnum wrote:
>>
>> On Sun, May 6, 2012 at 5:53 PM, Vladimir Bashkirtsev
>> <vladimir@xxxxxxxxxxxxxxx> wrote:
>>>
>>> On 03/05/12 16:23, Greg Farnum wrote:
>>>>
>>>> On Wednesday, May 2, 2012 at 11:24 PM, Vladimir Bashkirtsev wrote:
>>>>>
>>>>> Greg,
>>>>>
>>>>> Apologies for the multiple emails: my mail server is backed by ceph
>>>>> now and it struggled this morning (separate issue). So my mail
>>>>> server reported back to my mailer that sending the email had failed
>>>>> when obviously that was not the case.
>>>>
>>>> Interesting — I presume you're using the file system? That's not
>>>> something we've heard of anybody doing with Ceph before. :)
>>>>
>>>>> [root@gamma ~]# ceph -s
>>>>> 2012-05-03 15:46:55.640951 mds e2666: 1/1/1 up {0=1=up:active}, 1 up:standby
>>>>> 2012-05-03 15:46:55.647106 osd e10728: 6 osds: 6 up, 6 in
>>>>> 2012-05-03 15:46:55.654052 log 2012-05-03 15:46:26.557084 mon.2 172.16.64.202:6789/0 2878 : [INF] mon.2 calling new monitor election
>>>>> 2012-05-03 15:46:55.654425 mon e7: 3 mons at {0=172.16.64.200:6789/0,1=172.16.64.201:6789/0,2=172.16.64.202:6789/0}
>>>>> 2012-05-03 15:46:56.961624 pg v1251669: 600 pgs: 2 creating, 598 active+clean; 309 GB data, 963 GB used, 1098 GB / 2145 GB avail
>>>>>
>>>>> Logging is on but nothing obvious in there: the logs are quite
>>>>> small. A number of "ceph health" calls are logged (ceph is monitored
>>>>> by nagios, so this record appears every 5 minutes), and the monitors
>>>>> periodically call for election (at varying intervals of between 1
>>>>> and 15 minutes, by the look of it). That's it.
>>>>
>>>> Hrm. Generally speaking the monitors shouldn't call for elections
>>>> unless something changes (one of them crashes) or the leader monitor
>>>> is slowing down.
>>>> Can you increase the debug_mon to 20, the debug_ms to 1, and post
>>>> one of the logs somewhere?
>>>> The "Live Debugging" section of
>>>> http://ceph.com/wiki/Debugging should give you what you need. :)
>>>
>>> Here's the logs and core dumps:
>>> http://www.bashkirtsev.com/logs-2012-05-07.tar.bz2
>>>
>>> The mons have grown to 1.2GB and 2GB of memory.
>>
>> When I look at the logs for mon.0, I see that there are a lot of
>> places where mon.0 takes tens of seconds to write something to disk.
>> If the disk is just about full, that might make sense (many
>> filesystems don't handle a nearly-full disk very well at all); and a
>> monitor getting stuck for that long could definitely explain why they
>> start using up so much memory (they're buffering messages). I suspect
>> that there's not anything particularly wrong here, unless I'm
>> misunderstanding the story you're telling me. :) Have you noticed this
>> problem when the monitor's disk partition isn't nearly full?
>> -Greg
>
> I have recreated the conditions under which the mon started to consume
> more memory: everything appears to be in line with your suspicions.
> When the disk gets almost full, the mon slows down and finally crashes
> quite badly, to the point that I cannot recover it. I am then forced to
> destroy the mon altogether and create a new one instead.
>
> Long story short: the docs/wiki should recommend NOT keeping the monfs
> on the same partition as the ceph log (which can grow quickly), and
> preferably keeping it on a separate partition altogether.

Patches and edits welcome! :)

> At the same time it raises another question: what is the recommended
> partition size for the monfs?

I'm looking at a cluster about a month old with a 765MB mon data
directory. Most of that (~500 MB) is in the log files, which can be
trimmed manually, and I believe that everything else taking up space
trims itself when things are working. So if you're willing to set up a
pseudo-log rotation (or do it yourself on a timer every month or so), a
couple of GB should leave you plenty of breathing room.
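[Editor's note: for anyone following along, the debug levels Greg asks
for can also be set persistently in ceph.conf. A sketch, using the
standard [mon] section; see the Debugging wiki page linked above for
the transient injectargs route:]

```ini
; Raise monitor debugging as suggested in the thread above.
; These go in the [mon] section of ceph.conf on the monitor hosts.
[mon]
    debug mon = 20
    debug ms = 1
```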
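[Editor's note: since a nearly-full mon partition is the failure mode
described here, a small watchdog is an easy safeguard. A minimal sketch
in shell; the mon data path and the 80% threshold are illustrative
assumptions, not anything stated in the thread:]

```shell
#!/bin/sh
# Sketch: warn before the partition holding the monfs fills up.
# A nearly-full disk stalls the mon and balloons its memory, as
# described above. Path and threshold are illustrative assumptions.

check_mon_disk() {
    mon_data="$1"
    threshold=80
    # Percent used on the filesystem holding the mon data directory
    # (column 5 of POSIX `df -P` output, with the trailing % stripped).
    usage=$(df -P "$mon_data" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }')
    if [ "$usage" -ge "$threshold" ]; then
        echo "WARNING: mon partition ${usage}% full"
    else
        echo "OK: mon partition ${usage}% full"
    fi
}

# Demo on the root filesystem; point it at your real mon data dir
# instead, e.g. check_mon_disk /data/mon.0 (path is hypothetical).
check_mon_disk /
```

Run from cron alongside the nagios check Vladimir already has; trimming
the mon's old log files on the same timer covers the "pseudo-log
rotation" Greg suggests.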
-Greg