osd fails to start, rbd hangs

Hi,

I have a problem with my (small) Ceph cluster after an OSD failed.
ceph -s reports the following:
    cluster 2752438a-a33e-4df4-b9ec-beae32d00aad
     health HEALTH_WARN
            31 pgs down
            31 pgs peering
            31 pgs stuck inactive
            31 pgs stuck unclean
     monmap e1: 1 mons at {0=192.168.19.13:6789/0}
            election epoch 1, quorum 0 0
     osdmap e138: 3 osds: 2 up, 2 in
      pgmap v77979: 64 pgs, 1 pools, 844 GB data, 211 kobjects
            1290 GB used, 8021 GB / 9315 GB avail
                  33 active+clean
                  31 down+peering
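
To make the numbers above easier to read: the two state lines add up to the full 64 PGs, and everything that is not "active" is blocked. A minimal sketch that tallies this from the pasted status text; the function name `tally_pg_states` and the `STATUS` string are only illustrative, and the parsing assumes the `ceph -s` layout shown above (other Ceph releases format this differently).

```python
import re

# pgmap section copied from the `ceph -s` output above.
STATUS = """\
      pgmap v77979: 64 pgs, 1 pools, 844 GB data, 211 kobjects
            1290 GB used, 8021 GB / 9315 GB avail
                  33 active+clean
                  31 down+peering
"""


def tally_pg_states(status_text):
    """Return (total_pgs, {state: count}) parsed from a pgmap block."""
    total_match = re.search(r"pgmap\s+v\d+:\s+(\d+)\s+pgs", status_text)
    if total_match is None:
        raise ValueError("no pgmap summary line found")
    total = int(total_match.group(1))

    states = {}
    for line in status_text.splitlines():
        # State lines look like "<count> <state>[+<state>...]" and nothing else.
        m = re.fullmatch(r"\s*(\d+)\s+([a-z_]+(?:\+[a-z_]+)*)\s*", line)
        if m:
            states[m.group(2)] = int(m.group(1))
    return total, states


total, states = tally_pg_states(STATUS)
# A PG can serve I/O only if "active" is one of its state flags.
inactive = sum(n for s, n in states.items() if "active" not in s.split("+"))
print(f"{inactive} of {total} PGs are not active")
```

With the status above this prints "31 of 64 PGs are not active", which matches the "31 pgs stuck inactive" warning.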

I am now unable to map the RBD image; the command just times out.
The OSD log is at the end of this message.

Is there a way to recover the OSD (and the Ceph cluster) from this?

Thanks in advance,
	Philipp



    -2> 2015-10-30 01:04:59.689116 7f4bb741e700  1 heartbeat_map
is_healthy 'OSD::osd_tp thread 0x7f4ba13cd700' had timed out after 15
    -1> 2015-10-30 01:04:59.689140 7f4bb741e700  1 heartbeat_map
is_healthy 'OSD::osd_tp thread 0x7f4ba13cd700' had suicide timed out
after 150
     0> 2015-10-30 01:04:59.906546 7f4bb741e700 -1
common/HeartbeatMap.cc: In function 'bool
ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, const char*,
time_t)' thread 7f4bb741e700 time 2015-10-30 01:04:59.689176
common/HeartbeatMap.cc: 79: FAILED assert(0 == "hit suicide timeout")

 ceph version 0.94.3 (95cefea9fd9ab740263bf8bb4796fd864d9afe2b)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x77) [0xb12457]
 2: (ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, char const*,
long)+0x119) [0xa47179]
 3: (ceph::HeartbeatMap::is_healthy()+0xd6) [0xa47b76]
 4: (ceph::HeartbeatMap::check_touch_file()+0x18) [0xa48258]
 5: (CephContextServiceThread::entry()+0x164) [0xb21974]
 6: (()+0x76f5) [0x7f4bbdb0c6f5]
 7: (__clone()+0x6d) [0x7f4bbc09cedd]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is
needed to interpret this.

--- logging levels ---
   0/ 5 none
   0/ 1 lockdep
   0/ 1 context
   1/ 1 crush
   1/ 5 mds
   1/ 5 mds_balancer
   1/ 5 mds_locker
   1/ 5 mds_log
   1/ 5 mds_log_expire
   1/ 5 mds_migrator
   0/ 1 buffer
   0/ 1 timer
   0/ 1 filer
   0/ 1 striper
   0/ 1 objecter
   0/ 5 rados
   0/ 5 rbd
   0/ 5 rbd_replay
   0/ 5 journaler
   0/ 5 objectcacher
   0/ 5 client
   0/ 5 osd
   0/ 5 optracker
   0/ 5 objclass
   1/ 3 filestore
   1/ 3 keyvaluestore
   1/ 3 journal
   0/ 5 ms
   1/ 5 mon
   0/10 monc
   1/ 5 paxos
   0/ 5 tp
   1/ 5 auth
   1/ 5 crypto
   1/ 1 finisher
   1/ 5 heartbeatmap
   1/ 5 perfcounter
   1/ 5 rgw
   1/10 civetweb
   1/ 5 javaclient
   1/ 5 asok
   1/ 1 throttle
   0/ 0 refs
   1/ 5 xio
  -2/-2 (syslog threshold)
  -1/-1 (stderr threshold)
  max_recent     10000
  max_new         1000
  log_file /var/log/ceph/ceph-osd.2.log
--- end dump of recent events ---
2015-10-30 01:05:00.193324 7f4bb741e700 -1 *** Caught signal (Aborted) **
 in thread 7f4bb741e700

 ceph version 0.94.3 (95cefea9fd9ab740263bf8bb4796fd864d9afe2b)
 1: /usr/bin/ceph-osd() [0xa11c84]
 2: (()+0x10690) [0x7f4bbdb15690]
 3: (gsignal()+0x37) [0x7f4bbbfe63c7]
 4: (abort()+0x16a) [0x7f4bbbfe77fa]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x165) [0x7f4bbc8c7d45]
 6: (()+0x5dda7) [0x7f4bbc8c5da7]
 7: (()+0x5ddf2) [0x7f4bbc8c5df2]
 8: (()+0x5e008) [0x7f4bbc8c6008]
 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x252) [0xb12632]
 10: (ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, char const*,
long)+0x119) [0xa47179]
 11: (ceph::HeartbeatMap::is_healthy()+0xd6) [0xa47b76]
 12: (ceph::HeartbeatMap::check_touch_file()+0x18) [0xa48258]
 13: (CephContextServiceThread::entry()+0x164) [0xb21974]
 14: (()+0x76f5) [0x7f4bbdb0c6f5]
 15: (__clone()+0x6d) [0x7f4bbc09cedd]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is
needed to interpret this.

--- begin dump of recent events ---
     0> 2015-10-30 01:05:00.193324 7f4bb741e700 -1 *** Caught signal
(Aborted) **
 in thread 7f4bb741e700

 ceph version 0.94.3 (95cefea9fd9ab740263bf8bb4796fd864d9afe2b)
 1: /usr/bin/ceph-osd() [0xa11c84]
 2: (()+0x10690) [0x7f4bbdb15690]
 3: (gsignal()+0x37) [0x7f4bbbfe63c7]
 4: (abort()+0x16a) [0x7f4bbbfe77fa]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x165) [0x7f4bbc8c7d45]
 6: (()+0x5dda7) [0x7f4bbc8c5da7]
 7: (()+0x5ddf2) [0x7f4bbc8c5df2]
 8: (()+0x5e008) [0x7f4bbc8c6008]
 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x252) [0xb12632]
 10: (ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, char const*,
long)+0x119) [0xa47179]
 11: (ceph::HeartbeatMap::is_healthy()+0xd6) [0xa47b76]
 12: (ceph::HeartbeatMap::check_touch_file()+0x18) [0xa48258]
 13: (CephContextServiceThread::entry()+0x164) [0xb21974]
 14: (()+0x76f5) [0x7f4bbdb0c6f5]
 15: (__clone()+0x6d) [0x7f4bbc09cedd]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is
needed to interpret this.

--- logging levels ---
   0/ 5 none
   0/ 1 lockdep
   0/ 1 context
   1/ 1 crush
   1/ 5 mds
   1/ 5 mds_balancer
   1/ 5 mds_locker
   1/ 5 mds_log
   1/ 5 mds_log_expire
   1/ 5 mds_migrator
   0/ 1 buffer
   0/ 1 timer
   0/ 1 filer
   0/ 1 striper
   0/ 1 objecter
   0/ 5 rados
   0/ 5 rbd
   0/ 5 rbd_replay
   0/ 5 journaler
   0/ 5 objectcacher
   0/ 5 client
   0/ 5 osd
   0/ 5 optracker
   0/ 5 objclass
   1/ 3 filestore
   1/ 3 keyvaluestore
   1/ 3 journal
   0/ 5 ms
   1/ 5 mon
   0/10 monc
   1/ 5 paxos
   0/ 5 tp
   1/ 5 auth
   1/ 5 crypto
   1/ 1 finisher
   1/ 5 heartbeatmap
   1/ 5 perfcounter
   1/ 5 rgw
   1/10 civetweb
   1/ 5 javaclient
   1/ 5 asok
   1/ 1 throttle
   0/ 0 refs
   1/ 5 xio
  -2/-2 (syslog threshold)
  -1/-1 (stderr threshold)
  max_recent     10000
  max_new         1000
  log_file /var/log/ceph/ceph-osd.2.log
--- end dump of recent events ---
2015-10-30 01:07:00.920675 7f0ed0d067c0  0 ceph version 0.94.3
(95cefea9fd9ab740263bf8bb4796fd864d9afe2b), process ceph-osd, pid 14210
2015-10-30 01:07:01.096259 7f0ed0d067c0  0
filestore(/var/lib/ceph/osd/ceph-2) backend btrfs (magic 0x9123683e)
2015-10-30 01:07:01.099472 7f0ed0d067c0  0
genericfilestorebackend(/var/lib/ceph/osd/ceph-2) detect_features:
FIEMAP ioctl is supported and appears to work
2015-10-30 01:07:01.099511 7f0ed0d067c0  0
genericfilestorebackend(/var/lib/ceph/osd/ceph-2) detect_features:
FIEMAP ioctl is disabled via 'filestore fiemap' config option
2015-10-30 01:07:02.681342 7f0ed0d067c0  0
genericfilestorebackend(/var/lib/ceph/osd/ceph-2) detect_features:
syncfs(2) syscall fully supported (by glibc and kernel)
2015-10-30 01:07:02.682285 7f0ed0d067c0  0
btrfsfilestorebackend(/var/lib/ceph/osd/ceph-2)
detect_feature: CLONE_RANGE ioctl is supported
2015-10-30 01:07:04.508905 7f0ed0d067c0  0
btrfsfilestorebackend(/var/lib/ceph/osd/ceph-2)
detect_feature: SNAP_CREATE is supported
2015-10-30 01:07:04.509418 7f0ed0d067c0  0
btrfsfilestorebackend(/var/lib/ceph/osd/ceph-2)
detect_feature: SNAP_DESTROY is supported
2015-10-30 01:07:04.518728 7f0ed0d067c0  0
btrfsfilestorebackend(/var/lib/ceph/osd/ceph-2) detect_feature:
START_SYNC is supported (transid 8343)
2015-10-30 01:07:05.524109 7f0ed0d067c0  0
btrfsfilestorebackend(/var/lib/ceph/osd/ceph-2) detect_feature:
WAIT_SYNC is supported
2015-10-30 01:07:05.705014 7f0ed0d067c0  0
btrfsfilestorebackend(/var/lib/ceph/osd/ceph-2) detect_feature:
SNAP_CREATE_V2 is supported
2015-10-30 01:07:06.051275 7f0ed0d067c0  0
btrfsfilestorebackend(/var/lib/ceph/osd/ceph-2) rollback_to: error
removing old current subvol: (1) Operation not permitted
2015-10-30 01:07:07.655679 7f0ed0d067c0 -1
filestore(/var/lib/ceph/osd/ceph-2) mount initial op seq is 0; something
is wrong
2015-10-30 01:07:07.655801 7f0ed0d067c0 -1 osd.2 0 OSD:init: unable to
mount object store
2015-10-30 01:07:07.655821 7f0ed0d067c0 -1  ** ERROR: osd init
failed: (22) Invalid argument
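
The two timeouts in the first log excerpt (15 and 150) look like a soft and a hard limit on how long a worker thread may go without reporting progress: past the soft limit the thread is only reported as unhealthy, past the hard limit the daemon aborts itself. A minimal sketch of that pattern follows. It is a simplified model and not Ceph's actual HeartbeatMap code; `Handle`, `SuicideTimeout` and `check` are names made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Handle:
    name: str
    last_touch: float     # time the worker last reported progress (seconds)
    grace: float          # soft timeout; 15 in the log above
    suicide_grace: float  # hard timeout; 150 in the log above


class SuicideTimeout(Exception):
    """Stands in for the failed assert that aborts the daemon."""


def check(handle, now):
    """Return True if healthy; raise once the hard timeout is exceeded."""
    stalled = now - handle.last_touch
    healthy = True
    if stalled > handle.grace:
        # Soft limit: report the thread as unhealthy, keep running.
        print(f"'{handle.name}' had timed out after {handle.grace:g}")
        healthy = False
    if stalled > handle.suicide_grace:
        # Hard limit: give up on the whole process.
        raise SuicideTimeout(
            f"'{handle.name}' had suicide timed out after {handle.suicide_grace:g}"
        )
    return healthy


worker = Handle("OSD::osd_tp thread", last_touch=0.0, grace=15, suicide_grace=150)
print(check(worker, now=10))   # within the grace period
print(check(worker, now=60))   # unhealthy, but the process keeps running
try:
    check(worker, now=200)     # past the hard limit
except SuicideTimeout as exc:
    print("abort:", exc)
```

In this model a thread stuck for more than 150 seconds takes the process down, which is consistent with the "hit suicide timeout" assert in the log. It says nothing about why the thread was stuck in the first place.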

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


