OSD crashes when starting

Dear all,

I am hitting an unrecoverable crash on one specific OSD every time I try to restart it. It first happened on firefly 0.80.8; I upgraded to 0.80.10, but the crash persists.

Because of this failure, several PGs are stuck down+peering and will not recover, even after marking the OSD out.

Could someone help me? Is it possible to edit/rebuild the leveldb-based log that seems to be causing the problem?

Here is what the log file shows:

[(12:54:45) root@spcsnp2 ~]# service ceph start osd.31
=== osd.31 ===
create-or-move updated item name 'osd.31' weight 2.73 at location {host=spcsnp2,root=default} to crush map
Starting Ceph osd.31 on spcsnp2...
starting osd.31 at :/0 osd_data /var/lib/ceph/osd/ceph-31 /var/lib/ceph/osd/ceph-31/journal
2015-08-07 12:55:12.916880 7fd614c8f780  0 ceph version 0.80.10 (ea6c958c38df1216bf95c927f143d8b13c4a9e70), process ceph-osd, pid 23260
[(12:55:12) root@spcsnp2 ~]# 2015-08-07 12:55:12.928614 7fd614c8f780  0 filestore(/var/lib/ceph/osd/ceph-31) mount detected xfs (libxfs)
2015-08-07 12:55:12.928622 7fd614c8f780  1 filestore(/var/lib/ceph/osd/ceph-31)  disabling 'filestore replica fadvise' due to known issues with fadvise(DONTNEED) on xfs
2015-08-07 12:55:12.931410 7fd614c8f780  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-31) detect_features: FIEMAP ioctl is supported and appears to work
2015-08-07 12:55:12.931419 7fd614c8f780  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-31) detect_features: FIEMAP ioctl is disabled via 'filestore fiemap' config option
2015-08-07 12:55:12.939290 7fd614c8f780  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-31) detect_features: syscall(SYS_syncfs, fd) fully supported
2015-08-07 12:55:12.939326 7fd614c8f780  0 xfsfilestorebackend(/var/lib/ceph/osd/ceph-31) detect_feature: extsize is disabled by conf
2015-08-07 12:55:45.587019 7fd614c8f780 -1 *** Caught signal (Aborted) **
 in thread 7fd614c8f780

 ceph version 0.80.10 (ea6c958c38df1216bf95c927f143d8b13c4a9e70)
 1: /usr/bin/ceph-osd() [0xab7562]
 2: (()+0xf030) [0x7fd6141ce030]
 3: (gsignal()+0x35) [0x7fd612d41475]
 4: (abort()+0x180) [0x7fd612d446f0]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7fd61359689d]
 6: (()+0x63996) [0x7fd613594996]
 7: (()+0x639c3) [0x7fd6135949c3]
 8: (()+0x63bee) [0x7fd613594bee]
 9: (tc_new()+0x48e) [0x7fd614414aee]
 10: (std::string::_Rep::_S_create(unsigned long, unsigned long, std::allocator<char> const&)+0x59) [0x7fd6135f0999]
 11: (std::string::_Rep::_M_clone(std::allocator<char> const&, unsigned long)+0x28) [0x7fd6135f1708]
 12: (std::string::reserve(unsigned long)+0x30) [0x7fd6135f17f0]
 13: (std::string::append(char const*, unsigned long)+0xb5) [0x7fd6135f1ab5]
 14: (leveldb::log::Reader::ReadRecord(leveldb::Slice*, std::string*)+0x2a2) [0x7fd614670fa2]
 15: (leveldb::DBImpl::RecoverLogFile(unsigned long, leveldb::VersionEdit*, unsigned long*)+0x180) [0x7fd614669360]
 16: (leveldb::DBImpl::Recover(leveldb::VersionEdit*)+0x5c2) [0x7fd61466bdf2]
 17: (leveldb::DB::Open(leveldb::Options const&, std::string const&, leveldb::DB**)+0xff) [0x7fd61466c11f]
 18: (LevelDBStore::do_open(std::ostream&, bool)+0xd8) [0xa123a8]
 19: (FileStore::mount()+0x18e0) [0x9b7080]
 20: (OSD::do_convertfs(ObjectStore*)+0x1a) [0x78f52a]
 21: (main()+0x2234) [0x7331c4]
 22: (__libc_start_main()+0xfd) [0x7fd612d2dead]
 23: /usr/bin/ceph-osd() [0x736e99]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- begin dump of recent events ---
   -56> 2015-08-07 12:55:12.915675 7fd614c8f780  5 asok(0x1a20230) register_command perfcounters_dump hook 0x1a10010
   -55> 2015-08-07 12:55:12.915697 7fd614c8f780  5 asok(0x1a20230) register_command 1 hook 0x1a10010
   -54> 2015-08-07 12:55:12.915700 7fd614c8f780  5 asok(0x1a20230) register_command perf dump hook 0x1a10010
   -53> 2015-08-07 12:55:12.915704 7fd614c8f780  5 asok(0x1a20230) register_command perfcounters_schema hook 0x1a10010
   -52> 2015-08-07 12:55:12.915706 7fd614c8f780  5 asok(0x1a20230) register_command 2 hook 0x1a10010
   -51> 2015-08-07 12:55:12.915709 7fd614c8f780  5 asok(0x1a20230) register_command perf schema hook 0x1a10010
   -50> 2015-08-07 12:55:12.915711 7fd614c8f780  5 asok(0x1a20230) register_command config show hook 0x1a10010
   -49> 2015-08-07 12:55:12.915714 7fd614c8f780  5 asok(0x1a20230) register_command config set hook 0x1a10010
   -48> 2015-08-07 12:55:12.915716 7fd614c8f780  5 asok(0x1a20230) register_command config get hook 0x1a10010
   -47> 2015-08-07 12:55:12.915718 7fd614c8f780  5 asok(0x1a20230) register_command log flush hook 0x1a10010
   -46> 2015-08-07 12:55:12.915721 7fd614c8f780  5 asok(0x1a20230) register_command log dump hook 0x1a10010
   -45> 2015-08-07 12:55:12.915723 7fd614c8f780  5 asok(0x1a20230) register_command log reopen hook 0x1a10010
   -44> 2015-08-07 12:55:12.916880 7fd614c8f780  0 ceph version 0.80.10 (ea6c958c38df1216bf95c927f143d8b13c4a9e70), process ceph-osd, pid 23260
   -43> 2015-08-07 12:55:12.918156 7fd614c8f780  1 -- 10.17.0.6:0/0 learned my addr 10.17.0.6:0/0
   -42> 2015-08-07 12:55:12.918164 7fd614c8f780  1 accepter.accepter.bind my_inst.addr is 10.17.0.6:6812/23260 need_addr=0
   -41> 2015-08-07 12:55:12.918178 7fd614c8f780  1 -- 10.18.0.6:0/0 learned my addr 10.18.0.6:0/0
   -40> 2015-08-07 12:55:12.918180 7fd614c8f780  1 accepter.accepter.bind my_inst.addr is 10.18.0.6:6810/23260 need_addr=0
   -39> 2015-08-07 12:55:12.918191 7fd614c8f780  1 -- 10.18.0.6:0/0 learned my addr 10.18.0.6:0/0
   -38> 2015-08-07 12:55:12.918192 7fd614c8f780  1 accepter.accepter.bind my_inst.addr is 10.18.0.6:6811/23260 need_addr=0
   -37> 2015-08-07 12:55:12.918202 7fd614c8f780  1 -- 10.17.0.6:0/0 learned my addr 10.17.0.6:0/0
   -36> 2015-08-07 12:55:12.918204 7fd614c8f780  1 accepter.accepter.bind my_inst.addr is 10.17.0.6:6815/23260 need_addr=0
   -35> 2015-08-07 12:55:12.918214 7fd614c8f780  1 -- 10.17.0.6:0/0 learned my addr 10.17.0.6:0/0
   -34> 2015-08-07 12:55:12.918216 7fd614c8f780  1 accepter.accepter.bind my_inst.addr is 10.17.0.6:6816/23260 need_addr=0
   -33> 2015-08-07 12:55:12.925154 7fd614c8f780  1 finished global_init_daemonize
   -32> 2015-08-07 12:55:12.927746 7fd614c8f780  5 asok(0x1a20230) init /var/run/ceph/ceph-osd.31.asok
   -31> 2015-08-07 12:55:12.927760 7fd614c8f780  5 asok(0x1a20230) bind_and_listen /var/run/ceph/ceph-osd.31.asok
   -30> 2015-08-07 12:55:12.927828 7fd614c8f780  5 asok(0x1a20230) register_command 0 hook 0x1a0e0b0
   -29> 2015-08-07 12:55:12.927837 7fd614c8f780  5 asok(0x1a20230) register_command version hook 0x1a0e0b0
   -28> 2015-08-07 12:55:12.927840 7fd614c8f780  5 asok(0x1a20230) register_command git_version hook 0x1a0e0b0
   -27> 2015-08-07 12:55:12.927843 7fd614c8f780  5 asok(0x1a20230) register_command help hook 0x1a100b0
   -26> 2015-08-07 12:55:12.927845 7fd614c8f780  5 asok(0x1a20230) register_command get_command_descriptions hook 0x1a10150
   -25> 2015-08-07 12:55:12.927861 7fd61094c700  5 asok(0x1a20230) entry start
   -24> 2015-08-07 12:55:12.928614 7fd614c8f780  0 filestore(/var/lib/ceph/osd/ceph-31) mount detected xfs (libxfs)
   -23> 2015-08-07 12:55:12.928622 7fd614c8f780  1 filestore(/var/lib/ceph/osd/ceph-31)  disabling 'filestore replica fadvise' due to known issues with fadvise(DONTNEED) on xfs
   -22> 2015-08-07 12:55:12.931410 7fd614c8f780  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-31) detect_features: FIEMAP ioctl is supported and appears to work
   -21> 2015-08-07 12:55:12.931419 7fd614c8f780  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-31) detect_features: FIEMAP ioctl is disabled via 'filestore fiemap' config option
   -20> 2015-08-07 12:55:12.939290 7fd614c8f780  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-31) detect_features: syscall(SYS_syncfs, fd) fully supported
   -19> 2015-08-07 12:55:12.939326 7fd614c8f780  0 xfsfilestorebackend(/var/lib/ceph/osd/ceph-31) detect_feature: extsize is disabled by conf
   -18> 2015-08-07 12:55:16.785686 7fd61094c700  5 asok(0x1a20230) AdminSocket: request 'get_command_descriptions' '' to 0x1a10150 returned 1164 bytes
   -17> 2015-08-07 12:55:16.788515 7fd61094c700  1 do_command 'config get' 'format:json var:fsid
   -16> 2015-08-07 12:55:16.788546 7fd61094c700  1 do_command 'config get' 'format:json var:fsid result is 47 bytes
   -15> 2015-08-07 12:55:16.788549 7fd61094c700  5 asok(0x1a20230) AdminSocket: request 'config get' '' to 0x1a10010 returned 47 bytes
   -14> 2015-08-07 12:55:16.788748 7fd61094c700  5 asok(0x1a20230) AdminSocket: request 'get_command_descriptions' '' to 0x1a10150 returned 1164 bytes
   -13> 2015-08-07 12:55:16.790540 7fd61094c700  5 asok(0x1a20230) AdminSocket: request 'version' '' to 0x1a0e0b0 returned 21 bytes
   -12> 2015-08-07 12:55:26.022803 7fd61094c700  5 asok(0x1a20230) AdminSocket: request 'get_command_descriptions' '' to 0x1a10150 returned 1164 bytes
   -11> 2015-08-07 12:55:26.025710 7fd61094c700  1 do_command 'config get' 'format:json var:fsid
   -10> 2015-08-07 12:55:26.025725 7fd61094c700  1 do_command 'config get' 'format:json var:fsid result is 47 bytes
    -9> 2015-08-07 12:55:26.025727 7fd61094c700  5 asok(0x1a20230) AdminSocket: request 'config get' '' to 0x1a10010 returned 47 bytes
    -8> 2015-08-07 12:55:26.025883 7fd61094c700  5 asok(0x1a20230) AdminSocket: request 'get_command_descriptions' '' to 0x1a10150 returned 1164 bytes
    -7> 2015-08-07 12:55:26.027690 7fd61094c700  5 asok(0x1a20230) AdminSocket: request 'version' '' to 0x1a0e0b0 returned 21 bytes
    -6> 2015-08-07 12:55:36.291878 7fd61094c700  5 asok(0x1a20230) AdminSocket: request 'get_command_descriptions' '' to 0x1a10150 returned 1164 bytes
    -5> 2015-08-07 12:55:36.294711 7fd61094c700  1 do_command 'config get' 'format:json var:fsid
    -4> 2015-08-07 12:55:36.294729 7fd61094c700  1 do_command 'config get' 'format:json var:fsid result is 47 bytes
    -3> 2015-08-07 12:55:36.294732 7fd61094c700  5 asok(0x1a20230) AdminSocket: request 'config get' '' to 0x1a10010 returned 47 bytes
    -2> 2015-08-07 12:55:36.294936 7fd61094c700  5 asok(0x1a20230) AdminSocket: request 'get_command_descriptions' '' to 0x1a10150 returned 1164 bytes
    -1> 2015-08-07 12:55:36.296827 7fd61094c700  5 asok(0x1a20230) AdminSocket: request 'version' '' to 0x1a0e0b0 returned 21 bytes
     0> 2015-08-07 12:55:45.587019 7fd614c8f780 -1 *** Caught signal (Aborted) **
 in thread 7fd614c8f780

 ceph version 0.80.10 (ea6c958c38df1216bf95c927f143d8b13c4a9e70)
 1: /usr/bin/ceph-osd() [0xab7562]
 2: (()+0xf030) [0x7fd6141ce030]
 3: (gsignal()+0x35) [0x7fd612d41475]
 4: (abort()+0x180) [0x7fd612d446f0]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7fd61359689d]
 6: (()+0x63996) [0x7fd613594996]
 7: (()+0x639c3) [0x7fd6135949c3]
 8: (()+0x63bee) [0x7fd613594bee]
 9: (tc_new()+0x48e) [0x7fd614414aee]
 10: (std::string::_Rep::_S_create(unsigned long, unsigned long, std::allocator<char> const&)+0x59) [0x7fd6135f0999]
 11: (std::string::_Rep::_M_clone(std::allocator<char> const&, unsigned long)+0x28) [0x7fd6135f1708]
 12: (std::string::reserve(unsigned long)+0x30) [0x7fd6135f17f0]
 13: (std::string::append(char const*, unsigned long)+0xb5) [0x7fd6135f1ab5]
 14: (leveldb::log::Reader::ReadRecord(leveldb::Slice*, std::string*)+0x2a2) [0x7fd614670fa2]
 15: (leveldb::DBImpl::RecoverLogFile(unsigned long, leveldb::VersionEdit*, unsigned long*)+0x180) [0x7fd614669360]
 16: (leveldb::DBImpl::Recover(leveldb::VersionEdit*)+0x5c2) [0x7fd61466bdf2]
 17: (leveldb::DB::Open(leveldb::Options const&, std::string const&, leveldb::DB**)+0xff) [0x7fd61466c11f]
 18: (LevelDBStore::do_open(std::ostream&, bool)+0xd8) [0xa123a8]
 19: (FileStore::mount()+0x18e0) [0x9b7080]
 20: (OSD::do_convertfs(ObjectStore*)+0x1a) [0x78f52a]
 21: (main()+0x2234) [0x7331c4]
 22: (__libc_start_main()+0xfd) [0x7fd612d2dead]
 23: /usr/bin/ceph-osd() [0x736e99]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- logging levels ---
   0/ 5 none
   0/ 1 lockdep
   0/ 1 context
   1/ 1 crush
   1/ 5 mds
   1/ 5 mds_balancer
   1/ 5 mds_locker
   1/ 5 mds_log
   1/ 5 mds_log_expire
   1/ 5 mds_migrator
   0/ 1 buffer
   0/ 1 timer
   0/ 1 filer
   0/ 1 striper
   0/ 1 objecter
   0/ 5 rados
   0/ 5 rbd
   0/ 5 journaler
   0/ 5 objectcacher
   0/ 5 client
   0/ 5 osd
   0/ 5 optracker
   0/ 5 objclass
   1/ 3 filestore
   1/ 3 keyvaluestore
   1/ 3 journal
   0/ 5 ms
   1/ 5 mon
   0/10 monc
   1/ 5 paxos
   0/ 5 tp
   1/ 5 auth
   1/ 5 crypto
   1/ 1 finisher
   1/ 5 heartbeatmap
   1/ 5 perfcounter
   1/ 5 rgw
   1/10 civetweb
   1/ 5 javaclient
   1/ 5 asok
   1/ 1 throttle
  -2/-2 (syslog threshold)
  -1/-1 (stderr threshold)
  max_recent     10000
  max_new         1000
  log_file /var/log/ceph/ceph-osd.31.log
--- end dump of recent events ---
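
Frames 9 to 14 of the backtrace show leveldb::log::Reader::ReadRecord calling std::string::append, which ends in tc_new and then the terminate handler. My reading, which may well be wrong, is that a memory allocation failed while leveldb was reassembling a record from one of its .log files. To check whether one of those files holds an oversized or malformed record, something like the sketch below could be used. It relies only on the documented leveldb log layout (32 KiB blocks, 7-byte record headers made of a 4-byte checksum, a 2-byte little-endian length and a 1-byte type). The function name scan_leveldb_log is made up for illustration, the checksum is not verified, and the script only reads the data, it repairs nothing.

```python
import struct

BLOCK_SIZE = 32768            # leveldb log files are split into 32 KiB blocks
HEADER_SIZE = 7               # crc32c (4) + length (2) + type (1)
FULL, FIRST, MIDDLE, LAST = 1, 2, 3, 4


def scan_leveldb_log(data):
    """Walk the physical records of a leveldb .log file held in `data` (bytes).

    Returns (records, problems): records is a list of (start_offset, size)
    for each reassembled logical record, problems is a list of
    (offset, description). Checksums are NOT verified here.
    """
    records, problems = [], []
    pos, start, total = 0, None, 0
    while pos < len(data):
        left = BLOCK_SIZE - (pos % BLOCK_SIZE)
        if left < HEADER_SIZE:            # block trailer, skip the padding
            pos += left
            continue
        if pos + HEADER_SIZE > len(data):
            problems.append((pos, "truncated header"))
            break
        _crc, length, rtype = struct.unpack_from("<IHB", data, pos)
        if rtype == 0 and length == 0:    # zero-filled (preallocated) region
            pos += left
            continue
        if length > left - HEADER_SIZE:
            problems.append((pos, "length %d overruns block" % length))
            pos += left                   # resynchronise at the next block
            start = None
            continue
        if pos + HEADER_SIZE + length > len(data):
            problems.append((pos, "truncated payload"))
            break
        if rtype == FULL:
            if start is not None:
                problems.append((start, "fragmented record never finished"))
            records.append((pos, length))
            start = None
        elif rtype == FIRST:
            if start is not None:
                problems.append((start, "fragmented record never finished"))
            start, total = pos, length
        elif rtype in (MIDDLE, LAST):
            if start is None:
                problems.append((pos, "fragment without a FIRST record"))
            else:
                total += length
                if rtype == LAST:
                    records.append((start, total))
                    start = None
        else:
            problems.append((pos, "unknown record type %d" % rtype))
        pos += HEADER_SIZE + length
    if start is not None:
        problems.append((start, "fragmented record unfinished at end of file"))
    return records, problems
```

To use it, read a copy of one .log file with open(path, "rb").read(), pass the bytes to scan_leveldb_log, and look at max(size for _, size in records) and at the problems list. I am assuming the files in question sit under /var/lib/ceph/osd/ceph-31/current/omap, and I would only run this against a copy taken while the OSD is stopped.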

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
