Hi,

I have an issue with my (small) Ceph cluster after an OSD failed. ceph -s reports the following:

    cluster 2752438a-a33e-4df4-b9ec-beae32d00aad
     health HEALTH_WARN
            31 pgs down
            31 pgs peering
            31 pgs stuck inactive
            31 pgs stuck unclean
     monmap e1: 1 mons at {0=192.168.19.13:6789/0}
            election epoch 1, quorum 0 0
     osdmap e138: 3 osds: 2 up, 2 in
      pgmap v77979: 64 pgs, 1 pools, 844 GB data, 211 kobjects
            1290 GB used, 8021 GB / 9315 GB avail
                  33 active+clean
                  31 down+peering

I am now unable to map the RBD image; the command just times out. The log is at the end of this message.

Is there a way to recover the OSD or the Ceph cluster from this?

Thanks in advance,
Philipp

   -2> 2015-10-30 01:04:59.689116 7f4bb741e700 1 heartbeat_map is_healthy 'OSD::osd_tp thread 0x7f4ba13cd700' had timed out after 15
   -1> 2015-10-30 01:04:59.689140 7f4bb741e700 1 heartbeat_map is_healthy 'OSD::osd_tp thread 0x7f4ba13cd700' had suicide timed out after 150
    0> 2015-10-30 01:04:59.906546 7f4bb741e700 -1 common/HeartbeatMap.cc: In function 'bool ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, const char*, time_t)' thread 7f4bb741e700 time 2015-10-30 01:04:59.689176
common/HeartbeatMap.cc: 79: FAILED assert(0 == "hit suicide timeout")

 ceph version 0.94.3 (95cefea9fd9ab740263bf8bb4796fd864d9afe2b)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x77) [0xb12457]
 2: (ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, char const*, long)+0x119) [0xa47179]
 3: (ceph::HeartbeatMap::is_healthy()+0xd6) [0xa47b76]
 4: (ceph::HeartbeatMap::check_touch_file()+0x18) [0xa48258]
 5: (CephContextServiceThread::entry()+0x164) [0xb21974]
 6: (()+0x76f5) [0x7f4bbdb0c6f5]
 7: (__clone()+0x6d) [0x7f4bbc09cedd]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
--- logging levels ---
   0/ 5 none
   0/ 1 lockdep
   0/ 1 context
   1/ 1 crush
   1/ 5 mds
   1/ 5 mds_balancer
   1/ 5 mds_locker
   1/ 5 mds_log
   1/ 5 mds_log_expire
   1/ 5 mds_migrator
   0/ 1 buffer
   0/ 1 timer
   0/ 1 filer
   0/ 1 striper
   0/ 1 objecter
   0/ 5 rados
   0/ 5 rbd
   0/ 5 rbd_replay
   0/ 5 journaler
   0/ 5 objectcacher
   0/ 5 client
   0/ 5 osd
   0/ 5 optracker
   0/ 5 objclass
   1/ 3 filestore
   1/ 3 keyvaluestore
   1/ 3 journal
   0/ 5 ms
   1/ 5 mon
   0/10 monc
   1/ 5 paxos
   0/ 5 tp
   1/ 5 auth
   1/ 5 crypto
   1/ 1 finisher
   1/ 5 heartbeatmap
   1/ 5 perfcounter
   1/ 5 rgw
   1/10 civetweb
   1/ 5 javaclient
   1/ 5 asok
   1/ 1 throttle
   0/ 0 refs
   1/ 5 xio
  -2/-2 (syslog threshold)
  -1/-1 (stderr threshold)
  max_recent 10000
  max_new 1000
  log_file /var/log/ceph/ceph-osd.2.log
--- end dump of recent events ---
2015-10-30 01:05:00.193324 7f4bb741e700 -1 *** Caught signal (Aborted) **
 in thread 7f4bb741e700

 ceph version 0.94.3 (95cefea9fd9ab740263bf8bb4796fd864d9afe2b)
 1: /usr/bin/ceph-osd() [0xa11c84]
 2: (()+0x10690) [0x7f4bbdb15690]
 3: (gsignal()+0x37) [0x7f4bbbfe63c7]
 4: (abort()+0x16a) [0x7f4bbbfe77fa]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x165) [0x7f4bbc8c7d45]
 6: (()+0x5dda7) [0x7f4bbc8c5da7]
 7: (()+0x5ddf2) [0x7f4bbc8c5df2]
 8: (()+0x5e008) [0x7f4bbc8c6008]
 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x252) [0xb12632]
 10: (ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, char const*, long)+0x119) [0xa47179]
 11: (ceph::HeartbeatMap::is_healthy()+0xd6) [0xa47b76]
 12: (ceph::HeartbeatMap::check_touch_file()+0x18) [0xa48258]
 13: (CephContextServiceThread::entry()+0x164) [0xb21974]
 14: (()+0x76f5) [0x7f4bbdb0c6f5]
 15: (__clone()+0x6d) [0x7f4bbc09cedd]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
--- begin dump of recent events ---
     0> 2015-10-30 01:05:00.193324 7f4bb741e700 -1 *** Caught signal (Aborted) **
 in thread 7f4bb741e700

 ceph version 0.94.3 (95cefea9fd9ab740263bf8bb4796fd864d9afe2b)
 1: /usr/bin/ceph-osd() [0xa11c84]
 2: (()+0x10690) [0x7f4bbdb15690]
 3: (gsignal()+0x37) [0x7f4bbbfe63c7]
 4: (abort()+0x16a) [0x7f4bbbfe77fa]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x165) [0x7f4bbc8c7d45]
 6: (()+0x5dda7) [0x7f4bbc8c5da7]
 7: (()+0x5ddf2) [0x7f4bbc8c5df2]
 8: (()+0x5e008) [0x7f4bbc8c6008]
 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x252) [0xb12632]
 10: (ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, char const*, long)+0x119) [0xa47179]
 11: (ceph::HeartbeatMap::is_healthy()+0xd6) [0xa47b76]
 12: (ceph::HeartbeatMap::check_touch_file()+0x18) [0xa48258]
 13: (CephContextServiceThread::entry()+0x164) [0xb21974]
 14: (()+0x76f5) [0x7f4bbdb0c6f5]
 15: (__clone()+0x6d) [0x7f4bbc09cedd]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
--- logging levels ---
   0/ 5 none
   0/ 1 lockdep
   0/ 1 context
   1/ 1 crush
   1/ 5 mds
   1/ 5 mds_balancer
   1/ 5 mds_locker
   1/ 5 mds_log
   1/ 5 mds_log_expire
   1/ 5 mds_migrator
   0/ 1 buffer
   0/ 1 timer
   0/ 1 filer
   0/ 1 striper
   0/ 1 objecter
   0/ 5 rados
   0/ 5 rbd
   0/ 5 rbd_replay
   0/ 5 journaler
   0/ 5 objectcacher
   0/ 5 client
   0/ 5 osd
   0/ 5 optracker
   0/ 5 objclass
   1/ 3 filestore
   1/ 3 keyvaluestore
   1/ 3 journal
   0/ 5 ms
   1/ 5 mon
   0/10 monc
   1/ 5 paxos
   0/ 5 tp
   1/ 5 auth
   1/ 5 crypto
   1/ 1 finisher
   1/ 5 heartbeatmap
   1/ 5 perfcounter
   1/ 5 rgw
   1/10 civetweb
   1/ 5 javaclient
   1/ 5 asok
   1/ 1 throttle
   0/ 0 refs
   1/ 5 xio
  -2/-2 (syslog threshold)
  -1/-1 (stderr threshold)
  max_recent 10000
  max_new 1000
  log_file /var/log/ceph/ceph-osd.2.log
--- end dump of recent events ---
2015-10-30 01:07:00.920675 7f0ed0d067c0 0 ceph version 0.94.3 (95cefea9fd9ab740263bf8bb4796fd864d9afe2b), process ceph-osd, pid 14210
2015-10-30 01:07:01.096259 7f0ed0d067c0 0 filestore(/var/lib/ceph/osd/ceph-2) backend btrfs (magic 0x9123683e)
2015-10-30 01:07:01.099472 7f0ed0d067c0 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-2) detect_features: FIEMAP ioctl is supported and appears to work
2015-10-30 01:07:01.099511 7f0ed0d067c0 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-2) detect_features: FIEMAP ioctl is disabled via 'filestore fiemap' config option
2015-10-30 01:07:02.681342 7f0ed0d067c0 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-2) detect_features: syncfs(2) syscall fully supported (by glibc and kernel)
2015-10-30 01:07:02.682285 7f0ed0d067c0 0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-2) detect_feature: CLONE_RANGE ioctl is supported
2015-10-30 01:07:04.508905 7f0ed0d067c0 0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-2) detect_feature: SNAP_CREATE is supported
2015-10-30 01:07:04.509418 7f0ed0d067c0 0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-2) detect_feature: SNAP_DESTROY is supported
2015-10-30 01:07:04.518728 7f0ed0d067c0 0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-2) detect_feature: START_SYNC is supported (transid 8343)
2015-10-30 01:07:05.524109 7f0ed0d067c0 0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-2) detect_feature: WAIT_SYNC is supported
2015-10-30 01:07:05.705014 7f0ed0d067c0 0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-2) detect_feature: SNAP_CREATE_V2 is supported
2015-10-30 01:07:06.051275 7f0ed0d067c0 0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-2) rollback_to: error removing old current subvol: (1) Operation not permitted
2015-10-30 01:07:07.655679 7f0ed0d067c0 -1 filestore(/var/lib/ceph/osd/ceph-2) mount initial op seq is 0; something is wrong
2015-10-30 01:07:07.655801 7f0ed0d067c0 -1 osd.2 0 OSD:init: unable to mount object store
2015-10-30 01:07:07.655821 7f0ed0d067c0 -1  ** ERROR: osd init failed: (22) Invalid argument

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
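The ceph -s summary in the message can be checked with a little arithmetic: 31 of the pool's 64 PGs are down+peering, and a PG has to be active before it serves any client I/O. A minimal Python sketch, assuming only the two state lines copied from that output (`parse_pg_states` and `unavailable_pgs` are hypothetical helpers, not part of any Ceph tool):

```python
# Minimal sketch: count PGs that are not "active" in the pgmap summary of `ceph -s`.
# PGMAP_STATES is copied from the output in this message; the helpers are hypothetical.
import re

PGMAP_STATES = """\
33 active+clean
31 down+peering
"""


def parse_pg_states(text):
    """Return {state: count} from lines such as '33 active+clean'."""
    states = {}
    for line in text.splitlines():
        m = re.match(r"\s*(\d+)\s+(\S+)\s*$", line)
        if m:
            count, state = int(m.group(1)), m.group(2)
            states[state] = states.get(state, 0) + count
    return states


def unavailable_pgs(states):
    """PGs whose state does not include 'active' cannot serve client I/O."""
    return sum(n for state, n in states.items() if "active" not in state.split("+"))


if __name__ == "__main__":
    states = parse_pg_states(PGMAP_STATES)
    total = sum(states.values())
    down = unavailable_pgs(states)
    print(f"{down}/{total} PGs not active ({100 * down / total:.0f}%)")  # 31/64 PGs not active (48%)
```

RBD stores an image as many RADOS objects spread over the pool's PGs, so with roughly half of the PGs inactive it is likely that the image's header or data objects sit on a blocked PG. That is consistent with the map command hanging until it times out.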
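The first crash is the OSD's internal watchdog firing: a thread in 'OSD::osd_tp' first passes a 15 s warning deadline, then a 150 s suicide deadline, at which point common/HeartbeatMap.cc asserts and the daemon aborts. The Python model below is a simplified sketch of that two-deadline check as the log messages suggest it works; `HeartbeatHandle`, `check` and `SuicideTimeout` are illustrative names, not Ceph's C++ API, and time is passed in explicitly so the behaviour is easy to follow.

```python
# Simplified model of the two-deadline heartbeat check suggested by the log
# ("had timed out after 15", "had suicide timed out after 150",
# FAILED assert(0 == "hit suicide timeout")). Illustrative names, not Ceph's C++ API.
from dataclasses import dataclass


class SuicideTimeout(Exception):
    """Stands in for the assert that aborts the OSD."""


@dataclass
class HeartbeatHandle:
    name: str
    grace: float = 0.0
    suicide_grace: float = 0.0
    timeout: float = 0.0          # absolute warning deadline (0 = not armed)
    suicide_timeout: float = 0.0  # absolute abort deadline (0 = not armed)

    def reset_timeout(self, now, grace, suicide_grace):
        """Called by a worker thread whenever it starts a new work item."""
        self.grace, self.suicide_grace = grace, suicide_grace
        self.timeout = now + grace
        self.suicide_timeout = now + suicide_grace if suicide_grace else 0.0


def check(handle, now):
    """Run by a service thread: False past the warning deadline, raise past the abort one."""
    healthy = True
    if handle.timeout and now > handle.timeout:
        print(f"heartbeat_map is_healthy '{handle.name}' had timed out after {handle.grace:g}")
        healthy = False
    if handle.suicide_timeout and now > handle.suicide_timeout:
        print(f"heartbeat_map is_healthy '{handle.name}' "
              f"had suicide timed out after {handle.suicide_grace:g}")
        raise SuicideTimeout("hit suicide timeout")
    return healthy


if __name__ == "__main__":
    h = HeartbeatHandle("OSD::osd_tp thread 0x7f4ba13cd700")
    h.reset_timeout(now=0.0, grace=15, suicide_grace=150)  # worker picks up an item, then hangs
    for t in (10.0, 20.0, 160.0):
        try:
            print(t, "healthy" if check(h, t) else "unhealthy")
        except SuicideTimeout as exc:
            print(t, "FAILED assert:", exc)
```

In this model a thread that stays stuck on a single work item (for example, blocked on the backing filesystem) trips the warning after 15 s and takes down the whole OSD after 150 s, which matches the order of the -2>, -1> and 0> entries in the log.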
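After the restart, the log shows the btrfs backend trying to roll current back to a snapshot (rollback_to: error removing old current subvol: (1) Operation not permitted) and FileStore then reading an initial op seq of 0. The read-only Python sketch below lists what is on disk. It assumes the btrfs FileStore data directory holds a current subvolume plus snap_<seq> snapshot subvolumes, each containing a commit_op_seq text file with one integer; that layout and the helpers `read_op_seq` and `inspect_osd_dir` are assumptions for illustration. It only lists directories and reads files, and is best run with the OSD stopped.

```python
# Read-only sketch: report the commit_op_seq of `current` and of each snap_<seq>
# snapshot in a btrfs FileStore OSD data directory. ASSUMED layout (not verified
# here): <osd_dir>/current/commit_op_seq and <osd_dir>/snap_<seq>/commit_op_seq,
# each a text file holding one integer. Helper names are hypothetical.
import os
import re
import sys

SNAP_RE = re.compile(r"^snap_(\d+)$")


def read_op_seq(path):
    """Return the integer stored in a commit_op_seq file, or None if unreadable."""
    try:
        with open(path) as f:
            return int(f.read().strip() or "0")
    except (OSError, ValueError):
        return None


def inspect_osd_dir(osd_dir):
    """List snap_<seq> directories and the op seq recorded in each (read-only)."""
    snaps = []
    for name in os.listdir(osd_dir):
        m = SNAP_RE.match(name)
        if m and os.path.isdir(os.path.join(osd_dir, name)):
            snaps.append(int(m.group(1)))
    snaps.sort()
    return {
        "current_op_seq": read_op_seq(os.path.join(osd_dir, "current", "commit_op_seq")),
        "snaps": snaps,
        "snap_op_seq": {
            s: read_op_seq(os.path.join(osd_dir, f"snap_{s}", "commit_op_seq")) for s in snaps
        },
    }


if __name__ == "__main__":
    # Default path taken from the log above; pass another directory as argv[1].
    osd_dir = sys.argv[1] if len(sys.argv) > 1 else "/var/lib/ceph/osd/ceph-2"
    if not os.path.isdir(osd_dir):
        print(f"{osd_dir}: not a directory")
    else:
        report = inspect_osd_dir(osd_dir)
        print("current/commit_op_seq:", report["current_op_seq"])
        for s in report["snaps"]:
            print(f"snap_{s}/commit_op_seq:", report["snap_op_seq"][s])
```

Comparing current/commit_op_seq with the newest snap_<seq> entry shows whether current still records a usable op seq (the log suggests FileStore read it as 0). The script deliberately never creates, deletes or renames a subvolume.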