Re: question on mon memory usage

On 03/04/2013 07:12 PM, Travis Rhoden wrote:
Joao,

Were you able to glean anything useful from the memory dump I provided?

Hey Travis,

I haven't had a chance to look into the dump yet, but it's still on my stack and I'll go over it as soon as I can.


The mon did eventually crash, presumably from running out of memory
(although I'm failing to find oom_killer log entries; still searching).
Here's what it looked like:

This looks like the result of an ENOSPC. You should check whether there is any space left on the disk where the monitor's store resides. Running out of space might very well be a side effect of whatever made your memory consumption go through the roof. If you could look into the store and check what is consuming the space, it would be much appreciated -- my money is on the versions under 'pgmap/' or 'osdmap/'.
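
For anyone wanting to script that check, here is a minimal sketch in Python. It assumes the monitor store is a plain directory tree, as the 'pgmap/' and 'osdmap/' names suggest. The function names are hypothetical and the store path must be supplied by you; nothing here is part of Ceph itself.

```python
import os
import shutil


def free_bytes(path):
    """Return the free bytes on the filesystem that holds `path`."""
    return shutil.disk_usage(path).free


def dir_size(path):
    """Sum the sizes of all files below `path` (symlinks are not followed)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                total += os.lstat(os.path.join(root, name)).st_size
            except OSError:
                # A file may vanish while the monitor is running; skip it.
                pass
    return total


def usage_by_subdir(store_path):
    """Return (subdirectory, bytes) pairs for the store, largest first."""
    sizes = {}
    for entry in os.scandir(store_path):
        if entry.is_dir(follow_symlinks=False):
            sizes[entry.name] = dir_size(entry.path)
    return sorted(sizes.items(), key=lambda item: item[1], reverse=True)
```

Calling free_bytes() and usage_by_subdir() on the monitor's data directory (wherever that is on your system) should show whether the disk is full and whether 'pgmap' or 'osdmap' tops the list.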

Thanks!
  -Joao


2013-03-02 03:42:54.993851 7ff9e9c97700 -1 mon/MonitorStore.cc: In
function 'void MonitorStore::write_bl_ss(ceph::bufferlist&, const
char*, const char*, bool)' thread 7ff9e9c97700 time 2013-03-02
03:42:54.984980
mon/MonitorStore.cc: 382: FAILED assert(!err)

  ceph version 0.56.3 (6eb7e15a4783b122e9b0c85ea9ba064145958aa5)
  1: (MonitorStore::write_bl_ss(ceph::buffer::list&, char const*, char
const*, bool)+0xe18) [0x528398]
  2: (OSDMonitor::update_from_paxos()+0xabb) [0x4b733b]
  3: (PaxosService::_active()+0x202) [0x4a2492]
  4: (Context::complete(int)+0xa) [0x485a8a]
  5: (finish_contexts(CephContext*, std::list<Context*,
std::allocator<Context*> >&, int)+0x11d) [0x4878cd]
  6: (Paxos::handle_lease(MMonPaxos*)+0x54f) [0x496a5f]
  7: (Paxos::dispatch(PaxosServiceMessage*)+0x28b) [0x49f3ab]
  8: (Monitor::_ms_dispatch(Message*)+0xfdf) [0x484b6f]
  9: (Monitor::ms_dispatch(Message*)+0x32) [0x4945c2]
  10: (DispatchQueue::entry()+0x349) [0x63d009]
  11: (DispatchQueue::DispatchThread::entry()+0xd) [0x5d67bd]
  12: (()+0x7e9a) [0x7ff9ef082e9a]
  13: (clone()+0x6d) [0x7ff9edf23cbd]
  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
needed to interpret this.

Interestingly, an OSD went down at about the same time (again, I suspect
OOM, but can't find evidence of it).  Here's the log entry:

2013-03-02 03:42:14.025938 7f32cf88c700 -1 *** Caught signal (Aborted) **
  in thread 7f32cf88c700

  ceph version 0.56.3 (6eb7e15a4783b122e9b0c85ea9ba064145958aa5)
  1: /usr/bin/ceph-osd() [0x78430a]
  2: (()+0xfcb0) [0x7f32e1eb2cb0]
  3: (gsignal()+0x35) [0x7f32e0871425]
  4: (abort()+0x17b) [0x7f32e0874b8b]
  5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7f32e11c369d]
  6: (()+0xb5846) [0x7f32e11c1846]
  7: (()+0xb5873) [0x7f32e11c1873]
  8: (()+0xb596e) [0x7f32e11c196e]
  9: (operator new[](unsigned long)+0x47e) [0x7f32e1656b1e]
  10: (ceph::buffer::create(unsigned int)+0x67) [0x82ec47]
  11: (ceph::buffer::ptr::ptr(unsigned int)+0x15) [0x82efb5]
  12: (FileStore::read(coll_t, hobject_t const&, unsigned long,
unsigned long, ceph::buffer::list&)+0x1ae) [0x6f814e]
  13: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t,
bool)+0x347) [0x6970e7]
  14: (PG::chunky_scrub()+0x375) [0x69bee5]
  15: (PG::scrub()+0x145) [0x69d265]
  16: (OSD::ScrubWQ::_process(PG*)+0xc) [0x63437c]
  17: (ThreadPool::worker(ThreadPool::WorkThread*)+0x4e6) [0x823cf6]
  18: (ThreadPool::WorkThread::entry()+0x10) [0x825b20]
  19: (()+0x7e9a) [0x7f32e1eaae9a]
  20: (clone()+0x6d) [0x7f32e092ecbd]
  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
needed to interpret this.
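
The search for oom_killer entries mentioned above could be scripted roughly as follows. This is only a sketch: the function names are hypothetical, the log file location varies by distribution and is left to the caller, and the pattern covers the kernel message wordings I know of ("invoked oom-killer", "Out of memory", "Killed process"), which may differ on other kernel versions.

```python
import re

# Phrases the Linux kernel typically logs when the OOM killer fires.
OOM_PATTERN = re.compile(
    r"invoked oom-killer|Out of memory|Killed process", re.IGNORECASE
)


def find_oom_lines(lines):
    """Return the lines that look like OOM-killer messages."""
    return [line.rstrip("\n") for line in lines if OOM_PATTERN.search(line)]


def scan_log(path):
    """Scan one kernel or syslog file; the path is supplied by the caller."""
    with open(path, errors="replace") as handle:
        return find_oom_lines(handle)
```

If scan_log() finds nothing in the kernel log covering 03:42 on 2013-03-02, the daemons were probably not killed by the OOM killer, which would fit the fact that both wrote a backtrace before exiting.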

  - Travis
