Joao,

Were you able to glean anything useful from the memory dump I provided?

The mon did eventually crash, presumably from running out of memory (although I'm failing to find oom_killer log entries; still searching). Here's what it looked like:

2013-03-02 03:42:54.993851 7ff9e9c97700 -1 mon/MonitorStore.cc: In function 'void MonitorStore::write_bl_ss(ceph::bufferlist&, const char*, const char*, bool)' thread 7ff9e9c97700 time 2013-03-02 03:42:54.984980
mon/MonitorStore.cc: 382: FAILED assert(!err)

 ceph version 0.56.3 (6eb7e15a4783b122e9b0c85ea9ba064145958aa5)
 1: (MonitorStore::write_bl_ss(ceph::buffer::list&, char const*, char const*, bool)+0xe18) [0x528398]
 2: (OSDMonitor::update_from_paxos()+0xabb) [0x4b733b]
 3: (PaxosService::_active()+0x202) [0x4a2492]
 4: (Context::complete(int)+0xa) [0x485a8a]
 5: (finish_contexts(CephContext*, std::list<Context*, std::allocator<Context*> >&, int)+0x11d) [0x4878cd]
 6: (Paxos::handle_lease(MMonPaxos*)+0x54f) [0x496a5f]
 7: (Paxos::dispatch(PaxosServiceMessage*)+0x28b) [0x49f3ab]
 8: (Monitor::_ms_dispatch(Message*)+0xfdf) [0x484b6f]
 9: (Monitor::ms_dispatch(Message*)+0x32) [0x4945c2]
 10: (DispatchQueue::entry()+0x349) [0x63d009]
 11: (DispatchQueue::DispatchThread::entry()+0xd) [0x5d67bd]
 12: (()+0x7e9a) [0x7ff9ef082e9a]
 13: (clone()+0x6d) [0x7ff9edf23cbd]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

Interestingly, an OSD went down at about the same time (again, I suspect OOM, but can't find it). Here's the log entry:

2013-03-02 03:42:14.025938 7f32cf88c700 -1 *** Caught signal (Aborted) **
 in thread 7f32cf88c700

 ceph version 0.56.3 (6eb7e15a4783b122e9b0c85ea9ba064145958aa5)
 1: /usr/bin/ceph-osd() [0x78430a]
 2: (()+0xfcb0) [0x7f32e1eb2cb0]
 3: (gsignal()+0x35) [0x7f32e0871425]
 4: (abort()+0x17b) [0x7f32e0874b8b]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7f32e11c369d]
 6: (()+0xb5846) [0x7f32e11c1846]
 7: (()+0xb5873) [0x7f32e11c1873]
 8: (()+0xb596e) [0x7f32e11c196e]
 9: (operator new[](unsigned long)+0x47e) [0x7f32e1656b1e]
 10: (ceph::buffer::create(unsigned int)+0x67) [0x82ec47]
 11: (ceph::buffer::ptr::ptr(unsigned int)+0x15) [0x82efb5]
 12: (FileStore::read(coll_t, hobject_t const&, unsigned long, unsigned long, ceph::buffer::list&)+0x1ae) [0x6f814e]
 13: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool)+0x347) [0x6970e7]
 14: (PG::chunky_scrub()+0x375) [0x69bee5]
 15: (PG::scrub()+0x145) [0x69d265]
 16: (OSD::ScrubWQ::_process(PG*)+0xc) [0x63437c]
 17: (ThreadPool::worker(ThreadPool::WorkThread*)+0x4e6) [0x823cf6]
 18: (ThreadPool::WorkThread::entry()+0x10) [0x825b20]
 19: (()+0x7e9a) [0x7f32e1eaae9a]
 20: (clone()+0x6d) [0x7f32e092ecbd]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

 - Travis
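P.S. For the oom_killer hunt, something along these lines is what I mean -- a rough sketch only: the log paths are Ubuntu-style guesses and vary by distro, and both dmesg and kern.log may already have rotated past the event:

#!/usr/bin/env python
# Rough scan for OOM-killer traces in the kernel logs.
# Paths are assumptions (Ubuntu-style); adjust for your distro.
import glob

PATTERNS = ('out of memory', 'oom-killer', 'killed process', 'oom_kill')

for path in sorted(glob.glob('/var/log/kern.log*') + glob.glob('/var/log/syslog*')):
    if path.endswith('.gz'):
        continue  # skip rotated+compressed logs in this sketch
    try:
        with open(path) as f:
            for line in f:
                if any(p in line.lower() for p in PATTERNS):
                    print('%s: %s' % (path, line.rstrip()))
    except IOError as e:
        print('skipping %s: %s' % (path, e))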
On Tue, Feb 26, 2013 at 12:43 PM, Sage Weil <sage@xxxxxxxxxxx> wrote:
> On Tue, 26 Feb 2013, Travis Rhoden wrote:
>> Interesting script.
>>
>> My Python process runs out of memory.  I'm wondering if I need to
>> increase the stack size or something.  Here's what I get when I run
>> the stock script:
>>
>> # ./dump_proc_mem.py 29361 > monb_proc_mem.dump
>> PID = 29361
>> PASS :
>> 00400000-006d3000 r-xp 00000000 09:02 139077    /usr/bin/ceph-mon
>>
>> PASS :
>> 008d3000-008d8000 r--p 002d3000 09:02 139077    /usr/bin/ceph-mon
>>
>> OK :
>> 008d8000-008de000 rw-p 002d8000 09:02 139077    /usr/bin/ceph-mon
>> start = 9273344
>>
>> OK :
>> 008de000-01081000 rw-p 00000000 00:00 0
>> start = 9297920
>>
>> OK :
>> 02f70000-04bbb000 rw-p 00000000 00:00 0         [heap]
>> start = 49741824
>>
>> OK :
>> 04bbb000-5b621f000 rw-p 00000000 00:00 0        [heap]
>> start = 79409152
>> Traceback (most recent call last):
>>   File "./dump_proc_mem.py", line 27, in <module>
>>     chunk = mem_file.read(end - start)  # read region contents
>> MemoryError
>>
>> It dies in the same place every time.
>> From what I can tell of the script, it's going to try to read the
>> address space 04bbb000-5b621f000 all at once, which, if my math is
>> correct, is the full 22GB. =)  I'm modifying the script now to read this
>> in 1MB chunks or so.
>
> Perfect.  That's what I get for cutting and pasting from stackoverflow :)
>
> s
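A minimal sketch of the chunked-read change described above, for reference: it walks /proc/<pid>/maps and reads each writable region of /proc/<pid>/mem 1 MB at a time instead of in one giant read. This is not the actual dump_proc_mem.py -- the region filtering and the output file name are assumptions -- and it needs to run as root; on some kernels you may also have to ptrace-attach first.

#!/usr/bin/env python
# Sketch: dump the writable regions of a running process by reading
# /proc/<pid>/mem in 1 MB chunks rather than one read per region.
import sys

CHUNK = 1024 * 1024  # 1 MB per read

pid = sys.argv[1]
out = open('%s.memdump' % pid, 'wb')  # output name is an arbitrary choice

with open('/proc/%s/maps' % pid) as maps, open('/proc/%s/mem' % pid, 'rb') as mem:
    for line in maps:
        fields = line.split()
        addrs, perms = fields[0], fields[1]
        if not perms.startswith('rw'):
            continue  # only writable regions, like the gdb one-liner below
        start, end = (int(x, 16) for x in addrs.split('-'))
        try:
            mem.seek(start)
            remaining = end - start
            while remaining > 0:
                chunk = mem.read(min(CHUNK, remaining))
                if not chunk:
                    break
                out.write(chunk)
                remaining -= len(chunk)
        except (IOError, OSError) as e:
            sys.stderr.write('skipping %s: %s\n' % (addrs, e))

out.close()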
>>
>> On Tue, Feb 26, 2013 at 1:17 AM, Sage Weil <sage@xxxxxxxxxxx> wrote:
>> > On Mon, 25 Feb 2013, Travis Rhoden wrote:
>> >> Hi Sage,
>> >>
>> >> I gave that script a try.  Interestingly, I ended up with a core file
>> >> from gdb itself.
>> >>
>> >> # file core
>> >> core: ELF 64-bit LSB core file x86-64, version 1 (SYSV), SVR4-style,
>> >> from 'gdb --batch --pid 29361 -ex dump memory
>> >> 29361-04bbb000-57bf58000.dump 0x04bbb00'
>> >>
>> >> So I think gdb crashed.  But before that happened, I did get 195M of
>> >> output.  However, I was expecting a full 20+ GB.  Not sure if what I
>> >> generated can be of use or not.  If so, I can tar and compress it all
>> >> and place it somewhere useful if you like.  At its current size, I
>> >> could host it in Dropbox for you to pull down.  At 20GB (if that had
>> >> worked) I would need a place to scp it.
>> >
>> > Argh.  Try this:
>> >
>> > http://ceph.com/qa/dump_proc_mem.txt
>> >
>> > It takes one argument (the pid).  Pipe it to a file, bzip2 it, and post it
>> > somewhere.  Hopefully that'll do the trick...
>> >
>> > sage
>> >
>> >> - Travis
>> >>
>> >> On Mon, Feb 25, 2013 at 8:12 PM, Sage Weil <sage@xxxxxxxxxxx> wrote:
>> >> > On Mon, 25 Feb 2013, Travis Rhoden wrote:
>> >> >> Right now everything is on a stock setup.  I believe that means no core file.
>> >> >>
>> >> >> root@ceph2:~# ulimit -c
>> >> >> 0
>> >> >>
>> >> >> Doh.  I don't see anything in the ceph init script that would increase
>> >> >> this for the ceph-* processes.  Which is probably a good thing, of
>> >> >> course.
>> >> >
>> >> > Can you try something like this to grab an image of the process memory?
>> >> >
>> >> > #!/bin/bash
>> >> > grep rw-p /proc/$1/maps | sed -n 's/^\([0-9a-f]*\)-\([0-9a-f]*\) .*$/\1 \2/p' | while read start stop; do gdb --batch --pid $1 -ex "dump memory $1-$start-$stop.dump 0x$start 0x$stop"; done
>> >> >
>> >> > (from http://stackoverflow.com/questions/12977179/reading-living-process-memory-without-interrupting-it-proc-kcore-is-an-option)
>> >> >
>> >> > Thanks!
>> >> > sage
>> >> >
>> >> >> On Mon, Feb 25, 2013 at 7:40 PM, Sage Weil <sage@xxxxxxxxxxx> wrote:
>> >> >> > On Mon, 25 Feb 2013, Travis Rhoden wrote:
>> >> >> >> Joao,
>> >> >> >>
>> >> >> >> Happy to help if I can.  Responses inline.
>> >> >> >>
>> >> >> >> On Mon, Feb 25, 2013 at 4:05 PM, Joao Eduardo Luis <joao.luis@xxxxxxxxxxx> wrote:
>> >> >> >> > On 02/25/2013 07:59 PM, Travis Rhoden wrote:
>> >> >> >> >> Hi folks,
>> >> >> >> >>
>> >> >> >> >> A question about memory usage by the mon.  I have a cluster that is
>> >> >> >> >> being used exclusively for RBD (no CephFS/MDS).  I have 5 mons, and
>> >> >> >> >> one is slowly but surely using a heck of a lot more memory than the
>> >> >> >> >> others:
>> >> >> >> >>
>> >> >> >> >> # for x in ceph{1..5}; do ssh $x 'ps aux | grep ceph-mon | grep -v grep'; done
>> >> >> >> >> root 31034  5.2  0.1   312116    75516 ?  Ssl  Feb14 881:51 /usr/bin/ceph-mon -i a --pid-file /var/run/ceph/mon.a.pid -c /etc/ceph/ceph.conf
>> >> >> >> >> root 29361  4.8 53.9 22526128 22238080 ?  Ssl  Feb14 822:36 /usr/bin/ceph-mon -i b --pid-file /var/run/ceph/mon.b.pid -c /tmp/ceph.conf.31144
>> >> >> >> >> root 28421  7.0  0.1   273608    88608 ?  Ssl  Feb20 516:48 /usr/bin/ceph-mon -i c --pid-file /var/run/ceph/mon.c.pid -c /tmp/ceph.conf.10625
>> >> >> >> >> root 25876  4.8  0.1   240752    84048 ?  Ssl  Feb14 816:54 /usr/bin/ceph-mon -i d --pid-file /var/run/ceph/mon.d.pid -c /tmp/ceph.conf.31537
>> >> >> >> >> root 24505  4.8  0.1   228720    79284 ?  Ssl  Feb14 818:14 /usr/bin/ceph-mon -i e --pid-file /var/run/ceph/mon.e.pid -c /tmp/ceph.conf.31734
>> >> >> >> >>
>> >> >> >> >> As you can see, one is up over 20GB, while the others are < 100MB.
>> >> >> >> >>
>> >> >> >> >> Is this normal?  The box has plenty of RAM -- I'm wondering if this is
>> >> >> >> >> a memory leak, or if it's just slowly finding more things it can cache
>> >> >> >> >> and such.
>> >> >> >> >
>> >> >> >> > Hi Travis,
>> >> >> >> >
>> >> >> >> > Which version are you running?
>> >> >> >>
>> >> >> >> # ceph --version
>> >> >> >> ceph version 0.56.3 (6eb7e15a4783b122e9b0c85ea9ba064145958aa5)
>> >> >> >>
>> >> >> >> That's the case all around: OSDs, mons, librbd clients, everything in my cluster.
>> >> >> >>
>> >> >> >> > This has been something that pops up on the list every now and then, and I've
>> >> >> >> > spent a considerable amount of time trying to track it down.
>> >> >> >> >
>> >> >> >> > My current suspicion lies on the in-memory pgmap growing, and growing, and
>> >> >> >> > growing... and it usually hits the leader the worst.  Can you please confirm
>> >> >> >> > that mon.b is indeed the leader?
>> >> >> >>
>> >> >> >> I'm not 100% sure how to do that.  I'm guessing rank 0 from the
>> >> >> >> following output?
>> >> >> >>
>> >> >> >> # ceph quorum_status
>> >> >> >> { "election_epoch": 32,
>> >> >> >>   "quorum": [
>> >> >> >>         0,
>> >> >> >>         1,
>> >> >> >>         2,
>> >> >> >>         3,
>> >> >> >>         4],
>> >> >> >>   "monmap": { "epoch": 1,
>> >> >> >>       "fsid": "d5229b51-5321-48d2-bbb2-16062abb1992",
>> >> >> >>       "modified": "2013-01-21 17:58:14.389411",
>> >> >> >>       "created": "2013-01-21 17:58:14.389411",
>> >> >> >>       "mons": [
>> >> >> >>             { "rank": 0,
>> >> >> >>               "name": "a",
>> >> >> >>               "addr": "10.10.30.1:6789\/0"},
>> >> >> >>             { "rank": 1,
>> >> >> >>               "name": "b",
>> >> >> >>               "addr": "10.10.30.2:6789\/0"},
>> >> >> >>             { "rank": 2,
>> >> >> >>               "name": "c",
>> >> >> >>               "addr": "10.10.30.3:6789\/0"},
>> >> >> >>             { "rank": 3,
>> >> >> >>               "name": "d",
>> >> >> >>               "addr": "10.10.30.4:6789\/0"},
>> >> >> >>             { "rank": 4,
>> >> >> >>               "name": "e",
>> >> >> >>               "addr": "10.10.30.5:6789\/0"}]}}
>> >> >> >>
>> >> >> >> That would seem to imply that mon.a is the leader.  mon.b is
>> >> >> >> definitely the problem child at the moment.
>> >> >> >>
>> >> >> >> I did a quick check, and mon.b has grown by ~400MB since my previous
>> >> >> >> email, so we're looking at a little under 100MB/hr, perhaps.  Not
>> >> >> >> sure if that's consistent or not.  Will certainly check again in the
>> >> >> >> morning.
>> >> >> >
>> >> >> > Do you know if there is a core file ulimit set on that process?  If the
>> >> >> > core is configured to go somewhere, a kill -SEGV on it would generate a
>> >> >> > core that would help us figure out what the memory is consumed by.
>> >> >> >
>> >> >> > Thanks!
>> >> >> > sage
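For the "which mon is the leader?" question above, a minimal sketch that infers it from `ceph quorum_status`, assuming the usual election rule that the lowest-ranked monitor in the quorum wins (which matches the rank-0 guess above). I believe newer releases also report this directly as a quorum_leader_name field, but on 0.56 parsing the ranks is the straightforward option.

#!/usr/bin/env python
# Sketch: report the current monitor leader from `ceph quorum_status`,
# taking the quorum member with the lowest rank as the leader.
import json
import subprocess

status = json.loads(
    subprocess.check_output(['ceph', 'quorum_status']).decode('utf-8'))

leader_rank = min(status['quorum'])  # lowest rank in the quorum wins the election
mons_by_rank = dict((m['rank'], m) for m in status['monmap']['mons'])
leader = mons_by_rank[leader_rank]

print('leader: mon.%s (rank %d, %s)' % (leader['name'], leader_rank, leader['addr']))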