Re: Hit suicide timeout on osd start

To be more precise, debug_ms was not the only setting changed from 0 to 1; debug_osd was also changed from 0 to 50.
This looks like a race condition somewhere under the hood that was concealed by logging delays and surfaced only once those delays were removed (by setting debug ms/osd to 0).


2013/9/12 Andrey Korolyov <andrey@xxxxxxx>
A little follow-up:

One of the cluster nodes (from the not-yet-restarted set) went into some
kind of flapping state, showing CPU consumption peaks and latency spikes
every 50 seconds. Even more interesting, when we injected a non-zero
debug_ms the latency spikes went away, but the CPU peaks remained. In
the picture[0] below we injected debug_ms 1 with the log file set to
/dev/null at 19:03 and set it back to 0 at 19:13.

0. http://i.imgur.com/8BBWM7o.png


On Wed, Sep 11, 2013 at 5:05 AM, Andrey Korolyov <andrey@xxxxxxx> wrote:
> Hello,
>
> Got the well-known error on 0.61.8, just from a little disk overload on
> OSD daemon start. I currently have very large metadata per OSD (about
> 20G); this may be the issue.
>
> #0  0x00007f2f46adeb7b in raise () from /lib/x86_64-linux-gnu/libpthread.so.0
> #1  0x0000000000860469 in reraise_fatal (signum=6) at
> global/signal_handler.cc:58
> #2  handle_fatal_signal (signum=6) at global/signal_handler.cc:104
> #3  <signal handler called>
> #4  0x00007f2f44b45405 in raise () from /lib/x86_64-linux-gnu/libc.so.6
> #5  0x00007f2f44b48b5b in abort () from /lib/x86_64-linux-gnu/libc.so.6
> #6  0x00007f2f4544389d in __gnu_cxx::__verbose_terminate_handler() ()
> from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
> #7  0x00007f2f45441996 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
> #8  0x00007f2f454419c3 in std::terminate() () from
> /usr/lib/x86_64-linux-gnu/libstdc++.so.6
> #9  0x00007f2f45441bee in __cxa_throw () from
> /usr/lib/x86_64-linux-gnu/libstdc++.so.6
> #10 0x000000000090d2fa in ceph::__ceph_assert_fail (assertion=0xa38ab1
> "0 == \"hit suicide timeout\"", file=<optimized out>, line=79,
>     func=0xa38c60 "bool
> ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, const char*,
> time_t)") at common/assert.cc:77
> #11 0x000000000087914b in ceph::HeartbeatMap::_check
> (this=this@entry=0x26560e0, h=<optimized out>, who=who@entry=0xa38b40
> "is_healthy",
>     now=now@entry=1378860192) at common/HeartbeatMap.cc:79
> #12 0x0000000000879956 in ceph::HeartbeatMap::is_healthy
> (this=this@entry=0x26560e0) at common/HeartbeatMap.cc:130
> #13 0x0000000000879f08 in ceph::HeartbeatMap::check_touch_file
> (this=0x26560e0) at common/HeartbeatMap.cc:141
> #14 0x00000000009189f5 in CephContextServiceThread::entry
> (this=0x2652200) at common/ceph_context.cc:68
> #15 0x00007f2f46ad6e9a in start_thread () from
> /lib/x86_64-linux-gnu/libpthread.so.0
> #16 0x00007f2f44c013dd in clone () from /lib/x86_64-linux-gnu/libc.so.6
> #17 0x0000000000000000 in ?? ()
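
For readers following the backtrace: frames #10 to #14 show a service thread calling check_touch_file, then is_healthy, then _check, which fails the assertion "hit suicide timeout". Below is a rough Python sketch of how such a two-level watchdog could work. It is not the Ceph source. The class layout, the reset_timeout helper, the SuicideTimeout exception (standing in for the abort) and the numeric values are all assumptions made for illustration; only the names HeartbeatMap, _check and is_healthy come from the trace.

```python
import time
from dataclasses import dataclass


class SuicideTimeout(RuntimeError):
    """Hypothetical stand-in for the assert/abort seen in frame #10."""


@dataclass
class Handle:
    name: str
    grace: float                  # seconds before the worker counts as unhealthy
    suicide_grace: float          # seconds before the daemon gives up entirely
    timeout: float = 0.0          # absolute deadline; 0 means "not armed"
    suicide_timeout: float = 0.0  # absolute deadline; 0 means "not armed"


class HeartbeatMap:
    def __init__(self):
        self.handles = []

    def add_worker(self, name, grace, suicide_grace):
        h = Handle(name, grace, suicide_grace)
        self.handles.append(h)
        return h

    def reset_timeout(self, h, now):
        # A worker calls this each time it makes progress (assumed helper).
        h.timeout = now + h.grace
        h.suicide_timeout = now + h.suicide_grace

    def _check(self, h, now):
        healthy = True
        if h.timeout and h.timeout < now:
            # Soft deadline missed: report unhealthy but keep running.
            healthy = False
        if h.suicide_timeout and h.suicide_timeout < now:
            # Hard deadline missed: the daemon kills itself.
            raise SuicideTimeout(f"{h.name}: hit suicide timeout")
        return healthy

    def is_healthy(self, now=None):
        now = time.time() if now is None else now
        # Build the full list first so every handle is checked.
        return all([self._check(h, now) for h in self.handles])
```

With a worker registered as add_worker("worker", grace=15, suicide_grace=150) and reset at now=100, is_healthy(now=110) returns True, is_healthy(now=120) returns False, and is_healthy(now=300) raises SuicideTimeout. Under this model, a worker thread stuck on slow disk I/O during OSD start for longer than the hard deadline would produce the kind of abort shown in the trace, though the thread itself does not confirm that this is the root cause here.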

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
