osd failing to start

Hello,

I have a Ceph cluster where one OSD is failing to start. I have been upgrading Ceph to see whether the error would disappear. I'm now running Jewel, but I still get the following error:


   -31> 2016-07-13 17:03:30.474321 7fda18a8b700  2 -- 10.0.6.21:6800/1876 >> 10.0.5.71:6789/0 pipe(0x7fdb5712a800 sd=111 :36196 s=2 pgs=486 cs=1 l=1 c=0x7fdaaf060400).reader got KEEPALIVE_ACK
   -30> 2016-07-13 17:03:32.054328 7fda4d24e700  1 heartbeat_map is_healthy 'OSD::osd_tp thread 0x7fda25dd8700' had timed out after 15
   -29> 2016-07-13 17:03:32.054353 7fda4d24e700  1 heartbeat_map is_healthy 'OSD::osd_tp thread 0x7fda265d9700' had timed out after 15
   -28> 2016-07-13 17:03:37.054430 7fda4d24e700  1 heartbeat_map is_healthy 'OSD::osd_tp thread 0x7fda25dd8700' had timed out after 15
   -27> 2016-07-13 17:03:37.054456 7fda4d24e700  1 heartbeat_map is_healthy 'OSD::osd_tp thread 0x7fda265d9700' had timed out after 15
   -26> 2016-07-13 17:03:42.054535 7fda4d24e700  1 heartbeat_map is_healthy 'OSD::osd_tp thread 0x7fda25dd8700' had timed out after 15
   -25> 2016-07-13 17:03:42.054553 7fda4d24e700  1 heartbeat_map is_healthy 'OSD::osd_tp thread 0x7fda265d9700' had timed out after 15
   -24> 2016-07-13 17:03:47.054633 7fda4d24e700  1 heartbeat_map is_healthy 'OSD::osd_tp thread 0x7fda25dd8700' had timed out after 15
   -23> 2016-07-13 17:03:47.054658 7fda4d24e700  1 heartbeat_map is_healthy 'OSD::osd_tp thread 0x7fda265d9700' had timed out after 15
   -22> 2016-07-13 17:03:52.054735 7fda4d24e700  1 heartbeat_map is_healthy 'OSD::osd_tp thread 0x7fda25dd8700' had timed out after 15
   -21> 2016-07-13 17:03:52.054752 7fda4d24e700  1 heartbeat_map is_healthy 'OSD::osd_tp thread 0x7fda265d9700' had timed out after 15
   -20> 2016-07-13 17:03:57.054829 7fda4d24e700  1 heartbeat_map is_healthy 'OSD::osd_tp thread 0x7fda25dd8700' had timed out after 15
   -19> 2016-07-13 17:03:57.054847 7fda4d24e700  1 heartbeat_map is_healthy 'OSD::osd_tp thread 0x7fda265d9700' had timed out after 15
   -18> 2016-07-13 17:04:00.473446 7fda275db700 10 monclient(hunting): tick
   -17> 2016-07-13 17:04:00.473485 7fda275db700  1 monclient(hunting): continuing hunt
   -16> 2016-07-13 17:04:00.473488 7fda275db700 10 monclient(hunting): _reopen_session rank -1 name 
   -15> 2016-07-13 17:04:00.473498 7fda275db700  1 -- 10.0.6.21:6800/1876 mark_down 0x7fdaaf060400 -- 0x7fdb5712a800
   -14> 2016-07-13 17:04:00.473678 7fda275db700 10 monclient(hunting): picked mon.c con 0x7fdaaf060580 addr 10.0.5.73:6789/0
   -13> 2016-07-13 17:04:00.473698 7fda275db700 10 monclient(hunting): _send_mon_message to mon.c at 10.0.5.73:6789/0
   -12> 2016-07-13 17:04:00.473705 7fda275db700  1 -- 10.0.6.21:6800/1876 --> 10.0.5.73:6789/0 -- auth(proto 0 27 bytes epoch 17) v1 -- ?+0 0x7fdad9490000 con 0x7fdaaf060580
   -11> 2016-07-13 17:04:00.473720 7fda275db700 10 monclient(hunting): renew_subs
   -10> 2016-07-13 17:04:02.054922 7fda4d24e700  1 heartbeat_map is_healthy 'OSD::osd_tp thread 0x7fda25dd8700' had timed out after 15
    -9> 2016-07-13 17:04:02.054938 7fda4d24e700  1 heartbeat_map is_healthy 'OSD::osd_tp thread 0x7fda265d9700' had timed out after 15
    -8> 2016-07-13 17:04:07.055017 7fda4d24e700  1 heartbeat_map is_healthy 'OSD::osd_tp thread 0x7fda25dd8700' had timed out after 15
    -7> 2016-07-13 17:04:07.055035 7fda4d24e700  1 heartbeat_map is_healthy 'OSD::osd_tp thread 0x7fda265d9700' had timed out after 15
    -6> 2016-07-13 17:04:12.055114 7fda4d24e700  1 heartbeat_map is_healthy 'OSD::osd_tp thread 0x7fda25dd8700' had timed out after 15
    -5> 2016-07-13 17:04:12.055144 7fda4d24e700  1 heartbeat_map is_healthy 'OSD::osd_tp thread 0x7fda265d9700' had timed out after 15
    -4> 2016-07-13 17:04:17.055223 7fda4d24e700  1 heartbeat_map is_healthy 'OSD::osd_tp thread 0x7fda25dd8700' had timed out after 15
    -3> 2016-07-13 17:04:17.055243 7fda4d24e700  1 heartbeat_map is_healthy 'OSD::osd_tp thread 0x7fda265d9700' had timed out after 15
    -2> 2016-07-13 17:04:22.055321 7fda4d24e700  1 heartbeat_map is_healthy 'OSD::osd_tp thread 0x7fda25dd8700' had timed out after 15
    -1> 2016-07-13 17:04:22.061384 7fda4d24e700  1 heartbeat_map is_healthy 'OSD::osd_tp thread 0x7fda25dd8700' had suicide timed out after 150
     0> 2016-07-13 17:04:24.244698 7fda4d24e700 -1 common/HeartbeatMap.cc: In function 'bool ceph::HeartbeatMap::_check(const ceph::heartbeat_handle_d*, const char*, time_t)' thread 7fda4d24e700 time 2016-07-13 17:04:22.078324
common/HeartbeatMap.cc: 86: FAILED assert(0 == "hit suicide timeout")

 ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x82) [0x7fda53cbd5d2]
 2: (ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d const*, char const*, long)+0x11f) [0x7fda53bf30bf]
 3: (ceph::HeartbeatMap::is_healthy()+0xd6) [0x7fda53bf3ae6]
 4: (ceph::HeartbeatMap::check_touch_file()+0x2a) [0x7fda53bf42fa]
 5: (CephContextServiceThread::entry()+0x16c) [0x7fda53cd767c]
 6: (()+0x80a4) [0x7fda520ae0a4]
 7: (clone()+0x6d) [0x7fda501b287d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- logging levels ---
   0/ 5 none
   0/ 1 lockdep
   0/ 1 context
   1/ 1 crush
   1/ 5 mds
   1/ 5 mds_balancer
   1/ 5 mds_locker
   1/ 5 mds_log
   1/ 5 mds_log_expire
   1/ 5 mds_migrator
   0/ 1 buffer
   0/ 1 timer
   0/ 1 filer
   0/ 1 striper
   0/ 1 objecter
   0/ 5 rados
   0/ 5 rbd
   0/ 5 rbd_mirror
   0/ 5 rbd_replay
   0/ 5 journaler
   0/ 5 objectcacher
   0/ 5 client
   0/ 5 osd
   0/ 5 optracker
   0/ 5 objclass
   1/ 3 filestore
   1/ 3 journal
   0/ 5 ms
   1/ 5 mon
   0/10 monc
   1/ 5 paxos
   0/ 5 tp
   1/ 5 auth
   1/ 5 crypto
   1/ 1 finisher
   1/ 5 heartbeatmap
   1/ 5 perfcounter
   1/ 5 rgw
   1/10 civetweb
   1/ 5 javaclient
   1/ 5 asok
   1/ 1 throttle
   0/ 0 refs
   1/ 5 xio
   1/ 5 compressor
   1/ 5 newstore
   1/ 5 bluestore
   1/ 5 bluefs
   1/ 3 bdev
   1/ 5 kstore
   4/ 5 rocksdb
   4/ 5 leveldb
   1/ 5 kinetic
   1/ 5 fuse
  -2/-2 (syslog threshold)
  -1/-1 (stderr threshold)
  max_recent     10000
  max_new         1000
  log_file /var/log/ceph/ceph-osd.11.log
--- end dump of recent events ---
2016-07-13 17:04:26.335039 7fda4d24e700 -1 *** Caught signal (Aborted) **
 in thread 7fda4d24e700 thread_name:service

 ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)
 1: (()+0x94c167) [0x7fda53bb5167]
 2: (()+0xf8d0) [0x7fda520b58d0]
 3: (gsignal()+0x37) [0x7fda500ff067]
 4: (abort()+0x148) [0x7fda50100448]
 5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x256) [0x7fda53cbd7a6]
 6: (ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d const*, char const*, long)+0x11f) [0x7fda53bf30bf]
 7: (ceph::HeartbeatMap::is_healthy()+0xd6) [0x7fda53bf3ae6]
 8: (ceph::HeartbeatMap::check_touch_file()+0x2a) [0x7fda53bf42fa]
 9: (CephContextServiceThread::entry()+0x16c) [0x7fda53cd767c]
 10: (()+0x80a4) [0x7fda520ae0a4]
 11: (clone()+0x6d) [0x7fda501b287d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

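The log shows a specific sequence of heartbeat failures:

* Two OSD::osd_tp worker threads repeatedly report "had timed out after 15".
* Thread 0x7fda25dd8700 then reports "had suicide timed out after 150".
* HeartbeatMap::_check then hits FAILED assert(0 == "hit suicide timeout"), and the daemon aborts.

This suggests the abort is the heartbeat watchdog firing because a worker thread made no progress for over 150 seconds. The log does not point to a direct crash in the OSD code itself. The sketch below shows one way to pull those heartbeat lines out of a log and summarise them per thread. It also includes a simplified model of the grace / suicide-grace check.

The following are hypothetical illustrations, not Ceph's actual implementation:

* the helper names `HB_RE`, `summarize_heartbeats` and `heartbeat_state`;
* the rule that a suicide grace of 0 disables the suicide check.

```python
import re
from typing import Dict, Iterable

# Hypothetical pattern for the heartbeat_map lines seen in the OSD log, e.g.
#   ... heartbeat_map is_healthy 'OSD::osd_tp thread 0x7fda25dd8700' had timed out after 15
#   ... heartbeat_map is_healthy 'OSD::osd_tp thread 0x7fda25dd8700' had suicide timed out after 150
HB_RE = re.compile(
    r"heartbeat_map is_healthy '(?P<name>[^']+)' "
    r"had (?P<suicide>suicide )?timed out after (?P<secs>\d+)"
)


def summarize_heartbeats(lines: Iterable[str]) -> Dict[str, dict]:
    """Count grace timeouts per thread and record any suicide timeout."""
    summary: Dict[str, dict] = {}
    for line in lines:
        m = HB_RE.search(line)
        if not m:
            continue  # not a heartbeat_map line
        entry = summary.setdefault(
            m.group("name"), {"timeouts": 0, "grace": None, "suicide_after": None}
        )
        secs = int(m.group("secs"))
        if m.group("suicide"):
            entry["suicide_after"] = secs  # the line logged just before the assert
        else:
            entry["timeouts"] += 1
            entry["grace"] = secs


    return summary


def heartbeat_state(elapsed: float, grace: float, suicide_grace: float) -> str:
    """Simplified model (an assumption, not Ceph's code): a thread that has
    not reset its heartbeat for longer than `grace` seconds is reported as
    timed out; past `suicide_grace` the daemon asserts. In this sketch a
    suicide_grace of 0 disables the suicide check."""
    if suicide_grace > 0 and elapsed > suicide_grace:
        return "suicide"
    if elapsed > grace:
        return "timed out"
    return "healthy"


if __name__ == "__main__":
    sample = [
        "-3> 2016-07-13 17:04:17.055243 7fda4d24e700  1 heartbeat_map is_healthy "
        "'OSD::osd_tp thread 0x7fda265d9700' had timed out after 15",
        "-2> 2016-07-13 17:04:22.055321 7fda4d24e700  1 heartbeat_map is_healthy "
        "'OSD::osd_tp thread 0x7fda25dd8700' had timed out after 15",
        "-1> 2016-07-13 17:04:22.061384 7fda4d24e700  1 heartbeat_map is_healthy "
        "'OSD::osd_tp thread 0x7fda25dd8700' had suicide timed out after 150",
    ]
    for name, info in summarize_heartbeats(sample).items():
        print(name, info)
    print(heartbeat_state(20, 15, 150))   # timed out
    print(heartbeat_state(151, 15, 150))  # suicide
```

To run it against the real log, pass an open file handle, since iterating a file yields lines. For example: `with open("/var/log/ceph/ceph-osd.11.log") as f: print(summarize_heartbeats(f))`. The log file path is taken from the dump above.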

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com