Re: question on mon memory usage

Thanks for the suggestion.  No luck there, though.  The filesystem
where the mon resides is only 18% full; it still has over 7 GB available
(it's a ~10 GB volume).

Unless something cleaned up the directories when I restarted the mon,
that wasn't it.
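For anyone retracing this, here is a minimal Python sketch of the two checks Joao asked for: free space on the mon's filesystem, and which top-level part of the store is using it. It assumes the directory-based MonitorStore layout of this release, where 'pgmap/' and 'osdmap/' are plain subdirectories. The default path in the script is a hypothetical example; point it at whatever "mon data" is set to in ceph.conf.

```python
import os
import shutil
import sys

# Hypothetical default; the real path depends on cluster name and mon id
# (see "mon data" in ceph.conf).
DEFAULT_MON_DATA = "/var/lib/ceph/mon/ceph-a"


def tree_size(path):
    """Total apparent size in bytes of regular files under path (symlinks skipped)."""
    if os.path.isfile(path):
        return os.path.getsize(path)
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def store_breakdown(mon_data):
    """Map each top-level store entry (e.g. 'pgmap', 'osdmap') to its size, largest first."""
    sizes = {entry: tree_size(os.path.join(mon_data, entry))
             for entry in os.listdir(mon_data)}
    return dict(sorted(sizes.items(), key=lambda kv: kv[1], reverse=True))


def report(mon_data):
    """Print filesystem usage for mon_data, then the per-directory breakdown."""
    usage = shutil.disk_usage(mon_data)
    pct = 100.0 * usage.used / usage.total
    print(f"{mon_data}: {pct:.0f}% used, {usage.free / 2**30:.1f} GiB free")
    for entry, size in store_breakdown(mon_data).items():
        print(f"  {entry:<20} {size / 2**20:10.1f} MiB")


if __name__ == "__main__":
    report(sys.argv[1] if len(sys.argv) > 1 else DEFAULT_MON_DATA)
```

Run it once while the mon is healthy and again if memory starts climbing; a directory that grows between runs is the one worth reporting.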

I'll keep an eye out for that if any mons start climbing again.
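One way to keep an eye on the mons is to poll resident memory from /proc. The sketch below is a minimal example that assumes Linux and a daemon whose process name is "ceph-mon". It reads the VmRSS line from /proc/<pid>/status; the watch() helper and its optional limit_kb threshold are hypothetical conveniences, not part of Ceph.

```python
import os
import time


def parse_vmrss_kb(status_text):
    """Extract VmRSS (resident set size, in kB) from /proc/<pid>/status text."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            return int(line.split()[1])
    return None  # kernel threads and zombies have no VmRSS line


def pids_named(name):
    """Return PIDs whose /proc/<pid>/comm equals name (e.g. 'ceph-mon')."""
    pids = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/comm") as f:
                if f.read().strip() == name:
                    pids.append(int(entry))
        except OSError:
            continue  # process exited while we were scanning
    return pids


def watch(name="ceph-mon", interval=60, limit_kb=None):
    """Print the RSS of every process called `name` every `interval` seconds."""
    while True:
        for pid in pids_named(name):
            try:
                with open(f"/proc/{pid}/status") as f:
                    rss = parse_vmrss_kb(f.read())
            except OSError:
                continue
            if rss is None:
                continue
            flag = "  <-- above limit" if limit_kb and rss > limit_kb else ""
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            print(f"{stamp} pid {pid}: {rss / 1024:.0f} MiB{flag}")
        time.sleep(interval)


if __name__ == "__main__":
    watch()
```

Logging this alongside the store breakdown would show whether memory growth and store growth move together.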

 - Travis

On Mon, Mar 4, 2013 at 2:23 PM, Joao Eduardo Luis <joao.luis@xxxxxxxxxxx> wrote:
> On 03/04/2013 07:12 PM, Travis Rhoden wrote:
>>
>> Joao,
>>
>> Were you able to glean anything useful from the memory dump I provided?
>
>
> Hey Travis,
>
> Haven't had the chance to look into the dump, but it's still on my stack to
> go over as soon as I'm able to get into it.
>
>
>>
>> The mon did eventually crash, presumably from running out of memory
>> (although I'm failing to find oom_killer log entries; still searching).
>> Here's what it looked like:
>
>
> This looks like the result of an ENOSPC.  You should check if you have any
> available space on the disk where the monitor resides.  It might very well
> be a side-effect of whatever has made your memory consumption go through the
> roof.  If you look into the store and check what is consuming the space it
> would be much appreciated -- my money is on the versions under 'pgmap/' or
> 'osdmap/'.
>
> Thanks!
>   -Joao
>
>
>>
>> 2013-03-02 03:42:54.993851 7ff9e9c97700 -1 mon/MonitorStore.cc: In
>> function 'void MonitorStore::write_bl_ss(ceph::bufferlist&, const
>> char*, const char*, bool)' thread 7ff9e9c97700 time 2013-03-02
>> 03:42:54.984980
>> mon/MonitorStore.cc: 382: FAILED assert(!err)
>>
>>   ceph version 0.56.3 (6eb7e15a4783b122e9b0c85ea9ba064145958aa5)
>>   1: (MonitorStore::write_bl_ss(ceph::buffer::list&, char const*, char
>> const*, bool)+0xe18) [0x528398]
>>   2: (OSDMonitor::update_from_paxos()+0xabb) [0x4b733b]
>>   3: (PaxosService::_active()+0x202) [0x4a2492]
>>   4: (Context::complete(int)+0xa) [0x485a8a]
>>   5: (finish_contexts(CephContext*, std::list<Context*,
>> std::allocator<Context*> >&, int)+0x11d) [0x4878cd]
>>   6: (Paxos::handle_lease(MMonPaxos*)+0x54f) [0x496a5f]
>>   7: (Paxos::dispatch(PaxosServiceMessage*)+0x28b) [0x49f3ab]
>>   8: (Monitor::_ms_dispatch(Message*)+0xfdf) [0x484b6f]
>>   9: (Monitor::ms_dispatch(Message*)+0x32) [0x4945c2]
>>   10: (DispatchQueue::entry()+0x349) [0x63d009]
>>   11: (DispatchQueue::DispatchThread::entry()+0xd) [0x5d67bd]
>>   12: (()+0x7e9a) [0x7ff9ef082e9a]
>>   13: (clone()+0x6d) [0x7ff9edf23cbd]
>>   NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>> needed to interpret this.
>>
>> Interestingly, an OSD went down at about the same time (again, I suspect
>> OOM, but can't find evidence of it).  Here's the log entry:
>>
>> 2013-03-02 03:42:14.025938 7f32cf88c700 -1 *** Caught signal (Aborted) **
>>   in thread 7f32cf88c700
>>
>>   ceph version 0.56.3 (6eb7e15a4783b122e9b0c85ea9ba064145958aa5)
>>   1: /usr/bin/ceph-osd() [0x78430a]
>>   2: (()+0xfcb0) [0x7f32e1eb2cb0]
>>   3: (gsignal()+0x35) [0x7f32e0871425]
>>   4: (abort()+0x17b) [0x7f32e0874b8b]
>>   5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7f32e11c369d]
>>   6: (()+0xb5846) [0x7f32e11c1846]
>>   7: (()+0xb5873) [0x7f32e11c1873]
>>   8: (()+0xb596e) [0x7f32e11c196e]
>>   9: (operator new[](unsigned long)+0x47e) [0x7f32e1656b1e]
>>   10: (ceph::buffer::create(unsigned int)+0x67) [0x82ec47]
>>   11: (ceph::buffer::ptr::ptr(unsigned int)+0x15) [0x82efb5]
>>   12: (FileStore::read(coll_t, hobject_t const&, unsigned long,
>> unsigned long, ceph::buffer::list&)+0x1ae) [0x6f814e]
>>   13: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t,
>> bool)+0x347) [0x6970e7]
>>   14: (PG::chunky_scrub()+0x375) [0x69bee5]
>>   15: (PG::scrub()+0x145) [0x69d265]
>>   16: (OSD::ScrubWQ::_process(PG*)+0xc) [0x63437c]
>>   17: (ThreadPool::worker(ThreadPool::WorkThread*)+0x4e6) [0x823cf6]
>>   18: (ThreadPool::WorkThread::entry()+0x10) [0x825b20]
>>   19: (()+0x7e9a) [0x7f32e1eaae9a]
>>   20: (clone()+0x6d) [0x7f32e092ecbd]
>>   NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>> needed to interpret this.
>>
>>   - Travis
>
>
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com