Re: question on mon memory usage

On 02/26/2013 11:43 AM, Sage Weil wrote:
On Tue, 26 Feb 2013, Travis Rhoden wrote:
Interesting script.

My Python process runs out of memory.  I'm wondering if I need to
increase the stack size or something.  Here's what I get when I run
the stock script:

# ./dump_proc_mem.py 29361 > monb_proc_mem.dump
PID = 29361
PASS :
00400000-006d3000 r-xp 00000000 09:02 139077
   /usr/bin/ceph-mon


PASS :
008d3000-008d8000 r--p 002d3000 09:02 139077
   /usr/bin/ceph-mon


OK :
008d8000-008de000 rw-p 002d8000 09:02 139077
   /usr/bin/ceph-mon

start = 9273344

OK :
008de000-01081000 rw-p 00000000 00:00 0

start = 9297920

OK :
02f70000-04bbb000 rw-p 00000000 00:00 0                                  [heap]

start = 49741824

OK :
04bbb000-5b621f000 rw-p 00000000 00:00 0                                 [heap]

start = 79409152
Traceback (most recent call last):
   File "./dump_proc_mem.py", line 27, in <module>
     chunk = mem_file.read(end - start)  # read region contents
MemoryError

It dies in the same place every time.
From what I can tell of the script, it's going to try to read the
address space 04bbb000-5b621f000 all at once, which, if my math is
correct, is the full 22GB.  =)  I'm modifying the script now to read it
in 1MB chunks or so.
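
The change is roughly the following (just a sketch; the variable names are
guesses, since only line 27 of dump_proc_mem.py is visible in the traceback
above):

CHUNK = 1024 * 1024  # read 1 MB at a time instead of one giant read()

def dump_region(mem_file, start, end, out):
    """Copy bytes [start, end) of an open /proc/<pid>/mem file to out, chunked."""
    mem_file.seek(start)
    remaining = end - start
    while remaining > 0:
        chunk = mem_file.read(min(CHUNK, remaining))
        if not chunk:        # region shrank or became unreadable mid-dump
            break
        out.write(chunk)
        remaining -= len(chunk)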

Perfect.  That's what I get for cut and pasting from stackoverflow :)

s

Hey now, cut and pasting from stackoverflow is a time-honoured tradition! Just be thankful it didn't crash the machine. ;)

Mark



On Tue, Feb 26, 2013 at 1:17 AM, Sage Weil <sage@xxxxxxxxxxx> wrote:
On Mon, 25 Feb 2013, Travis Rhoden wrote:
Hi Sage,

I gave that script a try.  Interestingly, I ended up with a core file
from gdb itself.

# file core
core: ELF 64-bit LSB core file x86-64, version 1 (SYSV), SVR4-style,
from 'gdb --batch --pid 29361 -ex dump memory
29361-04bbb000-57bf58000.dump 0x04bbb00'

So I think gdb crashed.  But before that happened, I did get 195M of
output.  However, I was expecting a full 20+ GBs.  Not sure if what I
generated can be of use or not.  If so, I can tar and compress it all
and place it somewhere useful if you like.  At its current size, I
could host it in dropbox for you to pull down.  At 20GB (if that had
worked) I would need a place to scp it.

Argh.  Try this:

http://ceph.com/qa/dump_proc_mem.txt

It takes one argument (the pid); pipe the output to a file, bzip2 it, and post
it somewhere.  Hopefully that'll do the trick...

sage



  - Travis

On Mon, Feb 25, 2013 at 8:12 PM, Sage Weil <sage@xxxxxxxxxxx> wrote:
On Mon, 25 Feb 2013, Travis Rhoden wrote:
Right now everything is on a stock setup.   I believe that means no core file.

root@ceph2:~# ulimit -c
0

Doh.  I don't see anything in the ceph init script that would increase
this for the ceph-* processes.  Which is probably a good thing, of
course.

Can you try something like this to grab an image of the process memory?

#!/bin/bash
grep rw-p /proc/$1/maps | sed -n 's/^\([0-9a-f]*\)-\([0-9a-f]*\) .*$/\1 \2/p' | while read start stop; do gdb --batch --pid $1 -ex "dump memory $1-$start-$stop.dump 0x$start 0x$stop"; done

(from http://stackoverflow.com/questions/12977179/reading-living-process-memory-without-interrupting-it-proc-kcore-is-an-option)

Thanks!
sage






On Mon, Feb 25, 2013 at 7:40 PM, Sage Weil <sage@xxxxxxxxxxx> wrote:
On Mon, 25 Feb 2013, Travis Rhoden wrote:
Joao,

Happy to help if I can.  responses inline.

On Mon, Feb 25, 2013 at 4:05 PM, Joao Eduardo Luis
<joao.luis@xxxxxxxxxxx> wrote:
On 02/25/2013 07:59 PM, Travis Rhoden wrote:

Hi folks,

A question about memory usage by the Mon.  I have a cluster that is
being used exclusively for RBD (no CephFS/mds).  I have 5 mons, and
one is slowly but surely using a heck of a lot more memory than the
others:

# for x in ceph{1..5}; do ssh $x 'ps aux | grep ceph-mon | grep -v grep';
done
root     31034  5.2  0.1 312116 75516 ?        Ssl  Feb14 881:51
/usr/bin/ceph-mon -i a --pid-file /var/run/ceph/mon.a.pid -c
/etc/ceph/ceph.conf
root     29361  4.8 53.9 22526128 22238080 ?   Ssl  Feb14 822:36
/usr/bin/ceph-mon -i b --pid-file /var/run/ceph/mon.b.pid -c
/tmp/ceph.conf.31144
root     28421  7.0  0.1 273608 88608 ?        Ssl  Feb20 516:48
/usr/bin/ceph-mon -i c --pid-file /var/run/ceph/mon.c.pid -c
/tmp/ceph.conf.10625
root     25876  4.8  0.1 240752 84048 ?        Ssl  Feb14 816:54
/usr/bin/ceph-mon -i d --pid-file /var/run/ceph/mon.d.pid -c
/tmp/ceph.conf.31537
root     24505  4.8  0.1 228720 79284 ?        Ssl  Feb14 818:14
/usr/bin/ceph-mon -i e --pid-file /var/run/ceph/mon.e.pid -c
/tmp/ceph.conf.31734

As you can see, one is up over 20GB, while the others are < 100MB.

Is this normal?  The box has plenty of RAM -- I'm wondering if this is
a memory leak, or if it's just slowly finding more things it can cache
and such.


Hi Travis,

Which version are you running?

# ceph --version
ceph version 0.56.3 (6eb7e15a4783b122e9b0c85ea9ba064145958aa5)

That's the case all around: OSDs, mons, librbd clients, everything in my cluster is on that version.
This has been something that pops up on the list every now and then, and I've
spent a considerable amount of time trying to track it down.

My current suspicion is that the in-memory pgmap just keeps growing, and
growing, and growing... and it usually hits the leader the worst.  Can you
please confirm that mon.b is indeed the leader?
I'm not 100% sure how to do that.  I'm guessing rank 0 from the
following output?

# ceph quorum_status
{ "election_epoch": 32,
   "quorum": [
         0,
         1,
         2,
         3,
         4],
   "monmap": { "epoch": 1,
       "fsid": "d5229b51-5321-48d2-bbb2-16062abb1992",
       "modified": "2013-01-21 17:58:14.389411",
       "created": "2013-01-21 17:58:14.389411",
       "mons": [
             { "rank": 0,
               "name": "a",
               "addr": "10.10.30.1:6789\/0"},
             { "rank": 1,
               "name": "b",
               "addr": "10.10.30.2:6789\/0"},
             { "rank": 2,
               "name": "c",
               "addr": "10.10.30.3:6789\/0"},
             { "rank": 3,
               "name": "d",
               "addr": "10.10.30.4:6789\/0"},
             { "rank": 4,
               "name": "e",
               "addr": "10.10.30.5:6789\/0"}]}}

That would seem to imply that mon a is the leader.  mon b is
definitely the problem child at the moment.
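
If the leader is simply the in-quorum monitor with the lowest rank (which
matches how the election works, as far as I know), something like this should
pick it out of that JSON; just a sketch, untested against this cluster:

import json
import subprocess

def quorum_leader():
    """Name of the in-quorum monitor with the lowest rank."""
    status = json.loads(subprocess.check_output(["ceph", "quorum_status"]))
    leader_rank = min(status["quorum"])
    for mon in status["monmap"]["mons"]:
        if mon["rank"] == leader_rank:
            return mon["name"]

print(quorum_leader())  # prints 'a' for the monmap above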

I did a quick check, and mon b has grown by ~ 400MB since my previous
email.  So we're looking at a little under 100MB/hr, perhaps.  Not
sure if that's consistent or not.  Will certainly check again in the morning.
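
One way to make that growth-rate check consistent would be to sample the
monitor's resident size on an interval, e.g. (a sketch; assumes the mon.b pid
from the ps output above and reads Linux's /proc/<pid>/status):

import re
import time

def rss_kb(pid):
    """Resident set size of a process in kB, from /proc/<pid>/status."""
    with open("/proc/%d/status" % pid) as f:
        return int(re.search(r"VmRSS:\s+(\d+)\s+kB", f.read()).group(1))

pid = 29361            # ceph-mon b, per the ps output above
prev = rss_kb(pid)
while True:
    time.sleep(3600)   # sample once an hour
    cur = rss_kb(pid)
    print("%s  VmRSS=%d kB  delta=%+d kB/hr" % (time.ctime(), cur, cur - prev))
    prev = cur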

Do you know if there is a core file ulimit set on that process?  If the
core is configured to go somewhere, a kill -SEGV on it would generate a
core that would help us figure out what the memory is consumed by.
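
If the limit does turn out to be 0, one way to raise it on the already-running
daemon and then trigger the dump is roughly the following.  This is only a
sketch: it needs root, resource.prlimit requires Python 3.4+, and the kernel's
core_pattern decides where the core actually lands.

import os
import resource
import signal

def force_core(pid):
    """Lift the core-file limit on a running process, then SIGSEGV it."""
    unlimited = (resource.RLIM_INFINITY, resource.RLIM_INFINITY)
    resource.prlimit(pid, resource.RLIMIT_CORE, unlimited)
    os.kill(pid, signal.SIGSEGV)   # process dies and should leave a core behind

force_core(29361)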

Thanks!
sage







_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


