A little follow-up:

One of the cluster nodes (from the not-yet-restarted set) went into a kind of flapping state, showing CPU consumption peaks and latency spikes every 50 seconds. Even more interesting, when we injected a non-zero debug_ms the latency spikes went away, but the CPU peaks remained. In the picture [0] below, we injected debug_ms 1 with the log file set to /dev/null at 19:03 and set it back to 0 at 19:13.

0. http://i.imgur.com/8BBWM7o.png

On Wed, Sep 11, 2013 at 5:05 AM, Andrey Korolyov <andrey@xxxxxxx> wrote:
> Hello,
>
> Got the so-famous error on 0.61.8, just from a little disk overload on
> OSD daemon start. I currently have very large metadata per OSD (about
> 20G); this may be an issue.
>
> #0 0x00007f2f46adeb7b in raise () from /lib/x86_64-linux-gnu/libpthread.so.0
> #1 0x0000000000860469 in reraise_fatal (signum=6) at global/signal_handler.cc:58
> #2 handle_fatal_signal (signum=6) at global/signal_handler.cc:104
> #3 <signal handler called>
> #4 0x00007f2f44b45405 in raise () from /lib/x86_64-linux-gnu/libc.so.6
> #5 0x00007f2f44b48b5b in abort () from /lib/x86_64-linux-gnu/libc.so.6
> #6 0x00007f2f4544389d in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
> #7 0x00007f2f45441996 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
> #8 0x00007f2f454419c3 in std::terminate() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
> #9 0x00007f2f45441bee in __cxa_throw () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
> #10 0x000000000090d2fa in ceph::__ceph_assert_fail (assertion=0xa38ab1 "0 == \"hit suicide timeout\"", file=<optimized out>, line=79, func=0xa38c60 "bool ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, const char*, time_t)") at common/assert.cc:77
> #11 0x000000000087914b in ceph::HeartbeatMap::_check (this=this@entry=0x26560e0, h=<optimized out>, who=who@entry=0xa38b40 "is_healthy", now=now@entry=1378860192) at common/HeartbeatMap.cc:79
> #12 0x0000000000879956 in ceph::HeartbeatMap::is_healthy (this=this@entry=0x26560e0) at common/HeartbeatMap.cc:130
> #13 0x0000000000879f08 in ceph::HeartbeatMap::check_touch_file (this=0x26560e0) at common/HeartbeatMap.cc:141
> #14 0x00000000009189f5 in CephContextServiceThread::entry (this=0x2652200) at common/ceph_context.cc:68
> #15 0x00007f2f46ad6e9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
> #16 0x00007f2f44c013dd in clone () from /lib/x86_64-linux-gnu/libc.so.6
> #17 0x0000000000000000 in ?? ()
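
For readers unfamiliar with the "hit suicide timeout" assertion in the backtrace: as far as I understand it, each worker thread holds a heartbeat handle with a soft deadline and a hard ("suicide") deadline, and a service thread checks them periodically. Missing the soft deadline marks the daemon unhealthy; missing the hard one aborts it. Below is a minimal Python model of that idea, not Ceph's actual code. The names HeartbeatHandle, reset_timeout, check and SuicideTimeout, and the grace values in the usage note, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


class SuicideTimeout(Exception):
    """Stands in for the fatal assert that aborts the daemon (assumption)."""


@dataclass
class HeartbeatHandle:
    name: str
    timeout: float = 0.0          # absolute soft deadline; 0 means not armed
    suicide_timeout: float = 0.0  # absolute hard deadline; 0 means not armed


def reset_timeout(h: HeartbeatHandle, now: float,
                  grace: float, suicide_grace: float) -> None:
    """Called by a worker when it starts a unit of work: re-arm both deadlines."""
    h.timeout = now + grace
    h.suicide_timeout = now + suicide_grace if suicide_grace else 0.0


def check(h: HeartbeatHandle, now: float) -> bool:
    """Called by the watchdog: return False if the soft deadline has passed,
    raise SuicideTimeout if the hard deadline has passed."""
    healthy = True
    if h.timeout and h.timeout < now:
        healthy = False
    if h.suicide_timeout and h.suicide_timeout < now:
        raise SuicideTimeout(f"{h.name}: hit suicide timeout")
    return healthy
```

Usage, with illustrative numbers only: after reset_timeout(h, now=100.0, grace=15.0, suicide_grace=150.0), check(h, 110.0) reports healthy, check(h, 120.0) reports unhealthy, and check(h, 251.0) raises. Under this model, a thread blocked on slow disk I/O at startup for longer than the hard grace period would take the whole daemon down, which would be consistent with the overload described above, though I have not verified that this is what happened here.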