Hi,

I just noticed strange behaviour on one OSD (and only that one; the other OSDs on the same server don't show it) in a Ceph cluster (all 0.94.2 on Debian 7 with a self-built 4.1 kernel). The OSD started to accumulate slow requests, and a restart didn't help. A few seconds after startup the log fills with lines like this:

 -91> 2015-07-20 21:55:03.537385 7f9e20ec3700 0 -- [<OwnIPv6>]:6814/1376041 >> [<OwnIPv6>]:0/2078381 pipe(0x5396f000 sd=16371 :6814 s=0 pgs=0 cs=0 l=1 c=0x538e7340).accept replacing existing (lossy) channel (new one lossy=1)

(full example after startup: https://paste.ee/p/HfTlp ), while the OSD process sits at nearly 100% CPU. After some time enough slow requests have piled up that I restart the OSD; if I instead let it run, the daemon eventually aborts (longer version: https://paste.ee/p/XvD0o ):

  -6> 2015-07-20 21:55:03.729709 7f9e1681d700 0 -- [<OwnIPv6>]:6814/1376041 >> [<OwnIPv6>]:0/2078381 pipe(0x53d5a000 sd=16454 :6814 s=0 pgs=0 cs=0 l=1 c=0x53cf7600).accept replacing existing (lossy) channel (new one lossy=1)
  -5> 2015-07-20 21:55:03.737393 7fa637a5c700 -1 osd.9 31469 heartbeat_check: no reply from osd.32 since back 2015-07-20 21:53:08.918692 front 2015-07-20 21:53:56.149747 (cutoff 2015-07-20 21:54:43.737387)
  -4> 2015-07-20 21:55:03.737433 7fa637a5c700 -1 osd.9 31469 heartbeat_check: no reply from osd.33 since back 2015-07-20 21:54:34.759924 front 2015-07-20 21:53:46.235158 (cutoff 2015-07-20 21:54:43.737387)
  -3> 2015-07-20 21:55:03.737443 7fa637a5c700 -1 osd.9 31469 heartbeat_check: no reply from osd.35 since back 2015-07-20 21:54:20.657821 front 2015-07-20 21:54:20.657821 (cutoff 2015-07-20 21:54:43.737387)
  -2> 2015-07-20 21:55:03.737689 7fa637a5c700 0 log_channel(cluster) log [WRN] : 80 slow requests, 1 included below; oldest blocked for > 79.872208 secs
  -1> 2015-07-20 21:55:03.737700 7fa637a5c700 0 log_channel(cluster) log [WRN] : slow request 36.802253 seconds old, received at 2015-07-20 21:54:26.935372: osd_op(client.1024363.0:6627934 rbd_header.79e8074b0dc51 [watch reconnect cookie 94636449644720 gen 97842] 1.e7fceb98 ondisk+write+known_if_redirected e31467) currently no flag points reached
   0> 2015-07-20 21:55:03.744057 7fa628898700 -1 common/Thread.cc: In function 'void Thread::create(size_t)' thread 7fa628898700 time 2015-07-20 21:55:03.730601
 common/Thread.cc: 129: FAILED assert(ret == 0)

 ceph version 0.94.2 (5fb85614ca8f354284c713a2f9c610860720bbf3)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x72) [0xcdb572]
 2: /usr/bin/ceph-osd() [0xcc236f]
 3: (SimpleMessenger::add_accept_pipe(int)+0x6f) [0xcb903f]
 4: (Accepter::entry()+0x342) [0xd71b22]
 5: (()+0x6b50) [0x7fa63fa88b50]
 6: (clone()+0x6d) [0x7fa63e4a495d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
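In case it is relevant: the sd=16xxx values in the accept lines look like very high file descriptor numbers, so my current guess is that the daemon is running into a descriptor or thread limit and pthread_create() then fails inside Thread::create(). This is only a sketch of what I am checking on the affected host (<pid> stands for the pid of osd.9; limits and paths assume Debian defaults), not a confirmed diagnosis:

  # per-process limits the OSD was started with
  cat /proc/<pid>/limits

  # open file descriptors and threads of the OSD right now
  ls /proc/<pid>/fd | wc -l
  grep Threads /proc/<pid>/status

  # system-wide thread/pid limits
  sysctl kernel.pid_max kernel.threads-max

If the counts are close to the limits shown in /proc/<pid>/limits, that would at least explain the failed assert, even if not why only this one OSD keeps accepting new pipes until it gets there.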
--- logging levels ---
   0/ 5 none
   0/ 0 lockdep
   0/ 0 context
   0/ 0 crush
   1/ 5 mds
   1/ 5 mds_balancer
   1/ 5 mds_locker
   1/ 5 mds_log
   1/ 5 mds_log_expire
   1/ 5 mds_migrator
   0/ 0 buffer
   0/ 0 timer
   0/ 0 filer
   0/ 1 striper
   0/ 0 objecter
   0/ 0 rados
   0/ 0 rbd
   0/ 5 rbd_replay
   0/ 0 journaler
   0/ 5 objectcacher
   0/ 0 client
   0/ 0 osd
   0/ 0 optracker
   0/ 0 objclass
   0/ 0 filestore
   1/ 3 keyvaluestore
   0/ 0 journal
   0/ 0 ms
   0/ 0 mon
   0/ 0 monc
   0/ 0 paxos
   0/ 0 tp
   0/ 0 auth
   1/ 5 crypto
   0/ 0 finisher
   0/ 0 heartbeatmap
   0/ 0 perfcounter
   0/ 0 rgw
   1/10 civetweb
   1/ 5 javaclient
   0/ 0 asok
   0/ 0 throttle
   0/ 0 refs
   1/ 5 xio
  -2/-2 (syslog threshold)
  -1/-1 (stderr threshold)
  max_recent 10000
  max_new 1000
  log_file /var/log/ceph/ceph-osd.9.log
--- end dump of recent events ---

2015-07-20 21:55:03.940097 7fa628898700 -1 *** Caught signal (Aborted) ** in thread 7fa628898700

 ceph version 0.94.2 (5fb85614ca8f354284c713a2f9c610860720bbf3)
 1: /usr/bin/ceph-osd() [0xbef08c]
 2: (()+0xf0a0) [0x7fa63fa910a0]
 3: (gsignal()+0x35) [0x7fa63e3fb165]
 4: (abort()+0x180) [0x7fa63e3fe3e0]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7fa63ec5189d]
 6: (()+0x63996) [0x7fa63ec4f996]
 7: (()+0x639c3) [0x7fa63ec4f9c3]
 8: (()+0x63bee) [0x7fa63ec4fbee]
 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x220) [0xcdb720]
 10: /usr/bin/ceph-osd() [0xcc236f]
 11: (SimpleMessenger::add_accept_pipe(int)+0x6f) [0xcb903f]
 12: (Accepter::entry()+0x342) [0xd71b22]
 13: (()+0x6b50) [0x7fa63fa88b50]
 14: (clone()+0x6d) [0x7fa63e4a495d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

[the signal handler's own dump of recent events then repeats the same backtrace and logging levels]

Any ideas how to fix this? Or should I just stop the OSD, reformat the drive, and create a new OSD?

Greetings
Johannes

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com