Dear all,

I run a Ceph cluster with 4 nodes and ~80 OSDs on XFS, currently at 75% usage, running Firefly. On Friday I upgraded it from 0.80.8 to 0.80.10, and since then several OSDs have crashed and never recovered: every attempt to start one ends in the crash shown below.

Is this problem known? Is there any configuration that should be checked? Is there any way to recover these OSDs without losing all their data?

After that, I set the OSD to lost and ended up with one incomplete, inactive PG. Is there any way to recover it? The data still exists on the crashed OSDs.

Regards.

[(12:58:13) root@spcsnp3 ~]# service ceph start osd.7
=== osd.7 ===
2015-08-11 12:58:21.003876 7f17ed52b700 1 monclient(hunting): found mon.spcsmp2
2015-08-11 12:58:21.003915 7f17ef493700 5 monclient: authenticate success, global_id 206010466
create-or-move updated item name 'osd.7' weight 3.64 at location {host=spcsnp3,root=default} to crush map
Starting Ceph osd.7 on spcsnp3...
2015-08-11 12:58:21.279878 7f200fa8f780 0 ceph version 0.80.10 (ea6c958c38df1216bf95c927f143d8b13c4a9e70), process ceph-osd, pid 31918
starting osd.7 at :/0 osd_data /var/lib/ceph/osd/ceph-7 /var/lib/ceph/osd/ceph-7/journal
[(12:58:21) root@spcsnp3 ~]#
2015-08-11 12:58:21.348094 7f200fa8f780 10 filestore(/var/lib/ceph/osd/ceph-7) dump_stop
2015-08-11 12:58:21.348291 7f200fa8f780 5 filestore(/var/lib/ceph/osd/ceph-7) basedir /var/lib/ceph/osd/ceph-7 journal /var/lib/ceph/osd/ceph-7/journal
2015-08-11 12:58:21.348326 7f200fa8f780 10 filestore(/var/lib/ceph/osd/ceph-7) mount fsid is 54c136da-c51c-4799-b2dc-b7988982ee00
2015-08-11 12:58:21.349010 7f200fa8f780 0 filestore(/var/lib/ceph/osd/ceph-7) mount detected xfs (libxfs)
2015-08-11 12:58:21.349026 7f200fa8f780 1 filestore(/var/lib/ceph/osd/ceph-7) disabling 'filestore replica fadvise' due to known issues with fadvise(DONTNEED) on xfs
2015-08-11 12:58:21.353277 7f200fa8f780 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-7) detect_features: FIEMAP ioctl is supported and appears to work
2015-08-11 12:58:21.353302 7f200fa8f780 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-7) detect_features: FIEMAP ioctl is disabled via 'filestore fiemap' config option
2015-08-11 12:58:21.362106 7f200fa8f780 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-7) detect_features: syscall(SYS_syncfs, fd) fully supported
2015-08-11 12:58:21.362195 7f200fa8f780 0 xfsfilestorebackend(/var/lib/ceph/osd/ceph-7) detect_feature: extsize is disabled by conf
2015-08-11 12:58:21.362701 7f200fa8f780 5 filestore(/var/lib/ceph/osd/ceph-7) mount op_seq is 35490995
2015-08-11 12:58:59.383179 7f200fa8f780 -1 *** Caught signal (Aborted) **
 in thread 7f200fa8f780
 ceph version 0.80.10 (ea6c958c38df1216bf95c927f143d8b13c4a9e70)
 1: /usr/bin/ceph-osd() [0xab7562]
 2: (()+0xf0a0) [0x7f200efcd0a0]
 3: (gsignal()+0x35) [0x7f200db3f165]
 4: (abort()+0x180) [0x7f200db423e0]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7f200e39589d]
 6: (()+0x63996) [0x7f200e393996]
 7: (()+0x639c3) [0x7f200e3939c3]
 8: (()+0x63bee) [0x7f200e393bee]
 9: (tc_new()+0x48e) [0x7f200f213aee]
 10: (std::string::_Rep::_S_create(unsigned long, unsigned long, std::allocator<char> const&)+0x59) [0x7f200e3ef999]
 11: (std::string::_Rep::_M_clone(std::allocator<char> const&, unsigned long)+0x28) [0x7f200e3f0708]
 12: (std::string::reserve(unsigned long)+0x30) [0x7f200e3f07f0]
 13: (std::string::append(char const*, unsigned long)+0xb5) [0x7f200e3f0ab5]
 14: (leveldb::log::Reader::ReadRecord(leveldb::Slice*, std::string*)+0x2a2) [0x7f200f46ffa2]
 15: (leveldb::DBImpl::RecoverLogFile(unsigned long, leveldb::VersionEdit*, unsigned long*)+0x180) [0x7f200f468360]
 16: (leveldb::DBImpl::Recover(leveldb::VersionEdit*)+0x5c2) [0x7f200f46adf2]
 17: (leveldb::DB::Open(leveldb::Options const&, std::string const&, leveldb::DB**)+0xff) [0x7f200f46b11f]
 18: (LevelDBStore::do_open(std::ostream&, bool)+0xd8) [0xa123a8]
 19: (FileStore::mount()+0x18e0) [0x9b7080]
 20: (OSD::do_convertfs(ObjectStore*)+0x1a) [0x78f52a]
 21: (main()+0x2234) [0x7331c4]
 22: (__libc_start_main()+0xfd) [0x7f200db2bead]
 23: /usr/bin/ceph-osd() [0x736e99]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- begin dump of recent events ---
 -66> 2015-08-11 12:58:21.277524 7f200fa8f780 5 asok(0x2800230) register_command perfcounters_dump hook 0x27f0010
 -65> 2015-08-11 12:58:21.277552 7f200fa8f780 5 asok(0x2800230) register_command 1 hook 0x27f0010
 -64> 2015-08-11 12:58:21.277556 7f200fa8f780 5 asok(0x2800230) register_command perf dump hook 0x27f0010
 -63> 2015-08-11 12:58:21.277561 7f200fa8f780 5 asok(0x2800230) register_command perfcounters_schema hook 0x27f0010
 -62> 2015-08-11 12:58:21.277564 7f200fa8f780 5 asok(0x2800230) register_command 2 hook 0x27f0010
 -61> 2015-08-11 12:58:21.277566 7f200fa8f780 5 asok(0x2800230) register_command perf schema hook 0x27f0010
 -60> 2015-08-11 12:58:21.277569 7f200fa8f780 5 asok(0x2800230) register_command config show hook 0x27f0010
 -59> 2015-08-11 12:58:21.277573 7f200fa8f780 5 asok(0x2800230) register_command config set hook 0x27f0010
 -58> 2015-08-11 12:58:21.277575 7f200fa8f780 5 asok(0x2800230) register_command config get hook 0x27f0010
 -57> 2015-08-11 12:58:21.277578 7f200fa8f780 5 asok(0x2800230) register_command log flush hook 0x27f0010
 -56> 2015-08-11 12:58:21.277581 7f200fa8f780 5 asok(0x2800230) register_command log dump hook 0x27f0010
 -55> 2015-08-11 12:58:21.277583 7f200fa8f780 5 asok(0x2800230) register_command log reopen hook 0x27f0010
 -54> 2015-08-11 12:58:21.279878 7f200fa8f780 0 ceph version 0.80.10 (ea6c958c38df1216bf95c927f143d8b13c4a9e70), process ceph-osd, pid 31918
 -53> 2015-08-11 12:58:21.345764 7f200fa8f780 1 -- 10.17.0.7:0/0 learned my addr 10.17.0.7:0/0
 -52> 2015-08-11 12:58:21.345778 7f200fa8f780 1 accepter.accepter.bind my_inst.addr is 10.17.0.7:6813/31918 need_addr=0
 -51> 2015-08-11 12:58:21.345792 7f200fa8f780 1 -- 10.18.0.7:0/0 learned my addr 10.18.0.7:0/0
 -50> 2015-08-11 12:58:21.345795 7f200fa8f780 1 accepter.accepter.bind my_inst.addr is 10.18.0.7:6808/31918 need_addr=0
 -49> 2015-08-11 12:58:21.345805 7f200fa8f780 1 -- 10.18.0.7:0/0 learned my addr 10.18.0.7:0/0
 -48> 2015-08-11 12:58:21.345809 7f200fa8f780 1 accepter.accepter.bind my_inst.addr is 10.18.0.7:6809/31918 need_addr=0
 -47> 2015-08-11 12:58:21.345827 7f200fa8f780 1 -- 10.17.0.7:0/0 learned my addr 10.17.0.7:0/0
 -46> 2015-08-11 12:58:21.345830 7f200fa8f780 1 accepter.accepter.bind my_inst.addr is 10.17.0.7:6824/31918 need_addr=0
 -45> 2015-08-11 12:58:21.345847 7f200fa8f780 1 -- 10.17.0.7:0/0 learned my addr 10.17.0.7:0/0
 -44> 2015-08-11 12:58:21.345851 7f200fa8f780 1 accepter.accepter.bind my_inst.addr is 10.17.0.7:6825/31918 need_addr=0
 -43> 2015-08-11 12:58:21.346156 7f200fa8f780 1 finished global_init_daemonize
 -42> 2015-08-11 12:58:21.348094 7f200fa8f780 10 filestore(/var/lib/ceph/osd/ceph-7) dump_stop
 -41> 2015-08-11 12:58:21.348119 7f200fa8f780 5 asok(0x2800230) init /var/run/ceph/ceph-osd.7.asok
 -40> 2015-08-11 12:58:21.348134 7f200fa8f780 5 asok(0x2800230) bind_and_listen /var/run/ceph/ceph-osd.7.asok
 -39> 2015-08-11 12:58:21.348232 7f200fa8f780 5 asok(0x2800230) register_command 0 hook 0x27ee0b0
 -38> 2015-08-11 12:58:21.348242 7f200fa8f780 5 asok(0x2800230) register_command version hook 0x27ee0b0
 -37> 2015-08-11 12:58:21.348246 7f200fa8f780 5 asok(0x2800230) register_command git_version hook 0x27ee0b0
 -36> 2015-08-11 12:58:21.348250 7f200fa8f780 5 asok(0x2800230) register_command help hook 0x27f00b0
 -35> 2015-08-11 12:58:21.348254 7f200fa8f780 5 asok(0x2800230) register_command get_command_descriptions hook 0x27f0150
 -34> 2015-08-11 12:58:21.348278 7f200b749700 5 asok(0x2800230) entry start
 -33> 2015-08-11 12:58:21.348291 7f200fa8f780 5 filestore(/var/lib/ceph/osd/ceph-7) basedir /var/lib/ceph/osd/ceph-7 journal /var/lib/ceph/osd/ceph-7/journal
 -32> 2015-08-11 12:58:21.348326 7f200fa8f780 10 filestore(/var/lib/ceph/osd/ceph-7) mount fsid is 54c136da-c51c-4799-b2dc-b7988982ee00
 -31> 2015-08-11 12:58:21.349010 7f200fa8f780 0 filestore(/var/lib/ceph/osd/ceph-7) mount detected xfs (libxfs)
 -30> 2015-08-11 12:58:21.349026 7f200fa8f780 1 filestore(/var/lib/ceph/osd/ceph-7) disabling 'filestore replica fadvise' due to known issues with fadvise(DONTNEED) on xfs
 -29> 2015-08-11 12:58:21.353277 7f200fa8f780 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-7) detect_features: FIEMAP ioctl is supported and appears to work
 -28> 2015-08-11 12:58:21.353302 7f200fa8f780 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-7) detect_features: FIEMAP ioctl is disabled via 'filestore fiemap' config option
 -27> 2015-08-11 12:58:21.362106 7f200fa8f780 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-7) detect_features: syscall(SYS_syncfs, fd) fully supported
 -26> 2015-08-11 12:58:21.362195 7f200fa8f780 0 xfsfilestorebackend(/var/lib/ceph/osd/ceph-7) detect_feature: extsize is disabled by conf
 -25> 2015-08-11 12:58:21.362701 7f200fa8f780 5 filestore(/var/lib/ceph/osd/ceph-7) mount op_seq is 35490995
 -24> 2015-08-11 12:58:24.458593 7f200b749700 5 asok(0x2800230) AdminSocket: request 'get_command_descriptions' '' to 0x27f0150 returned 1164 bytes
 -23> 2015-08-11 12:58:24.462824 7f200b749700 1 do_command 'config get' 'format:json var:fsid
 -22> 2015-08-11 12:58:24.462850 7f200b749700 1 do_command 'config get' 'format:json var:fsid result is 47 bytes
 -21> 2015-08-11 12:58:24.462853 7f200b749700 5 asok(0x2800230) AdminSocket: request 'config get' '' to 0x27f0010 returned 47 bytes
 -20> 2015-08-11 12:58:24.463194 7f200b749700 5 asok(0x2800230) AdminSocket: request 'get_command_descriptions' '' to 0x27f0150 returned 1164 bytes
 -19> 2015-08-11 12:58:24.467886 7f200b749700 5 asok(0x2800230) AdminSocket: request 'version' '' to 0x27ee0b0 returned 21 bytes
 -18> 2015-08-11 12:58:34.118231 7f200b749700 5 asok(0x2800230) AdminSocket: request 'get_command_descriptions' '' to 0x27f0150 returned 1164 bytes
 -17> 2015-08-11 12:58:34.122484 7f200b749700 1 do_command 'config get' 'format:json var:fsid
 -16> 2015-08-11 12:58:34.122503 7f200b749700 1 do_command 'config get' 'format:json var:fsid result is 47 bytes
 -15> 2015-08-11 12:58:34.122506 7f200b749700 5 asok(0x2800230) AdminSocket: request 'config get' '' to 0x27f0010 returned 47 bytes
 -14> 2015-08-11 12:58:34.122739 7f200b749700 5 asok(0x2800230) AdminSocket: request 'get_command_descriptions' '' to 0x27f0150 returned 1164 bytes
 -13> 2015-08-11 12:58:34.125503 7f200b749700 5 asok(0x2800230) AdminSocket: request 'version' '' to 0x27ee0b0 returned 21 bytes
 -12> 2015-08-11 12:58:44.136424 7f200b749700 5 asok(0x2800230) AdminSocket: request 'get_command_descriptions' '' to 0x27f0150 returned 1164 bytes
 -11> 2015-08-11 12:58:44.140286 7f200b749700 1 do_command 'config get' 'format:json var:fsid
 -10> 2015-08-11 12:58:44.140304 7f200b749700 1 do_command 'config get' 'format:json var:fsid result is 47 bytes
 -9> 2015-08-11 12:58:44.140309 7f200b749700 5 asok(0x2800230) AdminSocket: request 'config get' '' to 0x27f0010 returned 47 bytes
 -8> 2015-08-11 12:58:44.140530 7f200b749700 5 asok(0x2800230) AdminSocket: request 'get_command_descriptions' '' to 0x27f0150 returned 1164 bytes
 -7> 2015-08-11 12:58:44.143236 7f200b749700 5 asok(0x2800230) AdminSocket: request 'version' '' to 0x27ee0b0 returned 21 bytes
 -6> 2015-08-11 12:58:54.493800 7f200b749700 5 asok(0x2800230) AdminSocket: request 'get_command_descriptions' '' to 0x27f0150 returned 1164 bytes
 -5> 2015-08-11 12:58:54.497564 7f200b749700 1 do_command 'config get' 'format:json var:fsid
 -4> 2015-08-11 12:58:54.497586 7f200b749700 1 do_command 'config get' 'format:json var:fsid result is 47 bytes
 -3> 2015-08-11 12:58:54.497591 7f200b749700 5 asok(0x2800230) AdminSocket: request 'config get' '' to 0x27f0010 returned 47 bytes
 -2> 2015-08-11 12:58:54.497905 7f200b749700 5 asok(0x2800230) AdminSocket: request 'get_command_descriptions' '' to 0x27f0150 returned 1164 bytes
 -1> 2015-08-11 12:58:54.500762 7f200b749700 5 asok(0x2800230) AdminSocket: request 'version' '' to 0x27ee0b0 returned 21 bytes
 0> 2015-08-11 12:58:59.383179 7f200fa8f780 -1 *** Caught signal (Aborted) **
 in thread 7f200fa8f780
 ceph version 0.80.10 (ea6c958c38df1216bf95c927f143d8b13c4a9e70)
 1: /usr/bin/ceph-osd() [0xab7562]
 2: (()+0xf0a0) [0x7f200efcd0a0]
 3: (gsignal()+0x35) [0x7f200db3f165]
 4: (abort()+0x180) [0x7f200db423e0]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7f200e39589d]
 6: (()+0x63996) [0x7f200e393996]
 7: (()+0x639c3) [0x7f200e3939c3]
 8: (()+0x63bee) [0x7f200e393bee]
 9: (tc_new()+0x48e) [0x7f200f213aee]
 10: (std::string::_Rep::_S_create(unsigned long, unsigned long, std::allocator<char> const&)+0x59) [0x7f200e3ef999]
 11: (std::string::_Rep::_M_clone(std::allocator<char> const&, unsigned long)+0x28) [0x7f200e3f0708]
 12: (std::string::reserve(unsigned long)+0x30) [0x7f200e3f07f0]
 13: (std::string::append(char const*, unsigned long)+0xb5) [0x7f200e3f0ab5]
 14: (leveldb::log::Reader::ReadRecord(leveldb::Slice*, std::string*)+0x2a2) [0x7f200f46ffa2]
 15: (leveldb::DBImpl::RecoverLogFile(unsigned long, leveldb::VersionEdit*, unsigned long*)+0x180) [0x7f200f468360]
 16: (leveldb::DBImpl::Recover(leveldb::VersionEdit*)+0x5c2) [0x7f200f46adf2]
 17: (leveldb::DB::Open(leveldb::Options const&, std::string const&, leveldb::DB**)+0xff) [0x7f200f46b11f]
 18: (LevelDBStore::do_open(std::ostream&, bool)+0xd8) [0xa123a8]
 19: (FileStore::mount()+0x18e0) [0x9b7080]
 20: (OSD::do_convertfs(ObjectStore*)+0x1a) [0x78f52a]
 21: (main()+0x2234) [0x7331c4]
 22: (__libc_start_main()+0xfd) [0x7f200db2bead]
 23: /usr/bin/ceph-osd() [0x736e99]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- logging levels ---
 0/ 5 none
 0/ 1 lockdep
 0/ 1 context
 1/ 1 crush
 1/ 5 mds
 1/ 5 mds_balancer
 1/ 5 mds_locker
 1/ 5 mds_log
 1/ 5 mds_log_expire
 1/ 5 mds_migrator
 0/ 1 buffer
 0/ 1 timer
 0/ 1 filer
 0/ 1 striper
 0/ 1 objecter
 0/ 5 rados
 0/ 5 rbd
 0/ 5 journaler
 0/ 5 objectcacher
 0/ 5 client
 20/ 5 osd
 0/ 5 optracker
 0/ 5 objclass
 20/20 filestore
 1/ 3 keyvaluestore
 20/20 journal
 0/ 5 ms
 1/ 5 mon
 5/20 monc
 1/ 5 paxos
 0/ 5 tp
 1/ 5 auth
 1/ 5 crypto
 1/ 1 finisher
 1/ 5 heartbeatmap
 20/20 perfcounter
 1/ 5 rgw
 1/10 civetweb
 1/ 5 javaclient
 1/ 5 asok
 1/ 1 throttle
 -2/-2 (syslog threshold)
 -1/-1 (stderr threshold)
 max_recent 10000
 max_new 1000
 log_file /var/log/ceph/ceph-osd.7.log
--- end dump of recent events ---