Re: Fwd: OSD crashes after upgrade to 0.80.10

It looks like a LevelDB problem. Could you first remove that OSD and add a
new one, to get the cluster healthy again?
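
For reference, here is a minimal sketch of what removing the OSD usually
involves. It assumes the standard manual OSD removal sequence and the
sysvinit "service ceph" wrapper seen in the quoted log; OSD id 7 is taken
from that log. The helper name is made up for illustration, and it only
prints the commands so they can be reviewed before anything is run.

```shell
#!/bin/sh
# Dry-run sketch: print (do not execute) the usual manual-removal
# commands for a single OSD. print_osd_removal_cmds is a hypothetical
# helper, not part of Ceph.
print_osd_removal_cmds() {
    id="$1"
    echo "ceph osd out ${id}"               # stop mapping data to the OSD
    echo "service ceph stop osd.${id}"      # make sure the daemon is down
    echo "ceph osd crush remove osd.${id}"  # drop it from the CRUSH map
    echo "ceph auth del osd.${id}"          # delete its auth key
    echo "ceph osd rm ${id}"                # remove it from the OSD map
}

print_osd_removal_cmds 7
```

Since the data still exists on the crashed OSDs, it is probably safest to
leave /var/lib/ceph/osd/ceph-7 untouched until the incomplete PG is sorted
out.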

On Wed, Aug 12, 2015 at 1:31 AM, Gerd Jakobovitsch <gerd@xxxxxxxxxxxxx> wrote:
>
>
> Dear all,
>
> I run a Ceph cluster with 4 nodes and ~80 OSDs on XFS, currently at 75%
> usage, running Firefly. On Friday I upgraded it from 0.80.8 to 0.80.10, and
> since then several OSDs have crashed and never recovered: every attempt to
> start one ends in the crash shown below.
>
> Is this a known problem? Is there any configuration that should be checked?
> Is there any way to recover these OSDs without losing all their data?
>
> After setting the OSD to lost, I was left with one incomplete, inactive PG.
> Is there any way to recover it? The data still exists on the crashed OSDs.
>
> Regards.
>
> [(12:58:13) root@spcsnp3 ~]# service ceph start osd.7
> === osd.7 ===
> 2015-08-11 12:58:21.003876 7f17ed52b700  1 monclient(hunting): found
> mon.spcsmp2
> 2015-08-11 12:58:21.003915 7f17ef493700  5 monclient: authenticate success,
> global_id 206010466
> create-or-move updated item name 'osd.7' weight 3.64 at location
> {host=spcsnp3,root=default} to crush map
> Starting Ceph osd.7 on spcsnp3...
> 2015-08-11 12:58:21.279878 7f200fa8f780  0 ceph version 0.80.10
> (ea6c958c38df1216bf95c927f143d8b13c4a9e70), process ceph-osd, pid 31918
> starting osd.7 at :/0 osd_data /var/lib/ceph/osd/ceph-7
> /var/lib/ceph/osd/ceph-7/journal
> [(12:58:21) root@spcsnp3 ~]# 2015-08-11 12:58:21.348094 7f200fa8f780 10
> filestore(/var/lib/ceph/osd/ceph-7) dump_stop
> 2015-08-11 12:58:21.348291 7f200fa8f780  5
> filestore(/var/lib/ceph/osd/ceph-7) basedir /var/lib/ceph/osd/ceph-7 journal
> /var/lib/ceph/osd/ceph-7/journal
> 2015-08-11 12:58:21.348326 7f200fa8f780 10
> filestore(/var/lib/ceph/osd/ceph-7) mount fsid is
> 54c136da-c51c-4799-b2dc-b7988982ee00
> 2015-08-11 12:58:21.349010 7f200fa8f780  0
> filestore(/var/lib/ceph/osd/ceph-7) mount detected xfs (libxfs)
> 2015-08-11 12:58:21.349026 7f200fa8f780  1
> filestore(/var/lib/ceph/osd/ceph-7)  disabling 'filestore replica fadvise'
> due to known issues with fadvise(DONTNEED) on xfs
> 2015-08-11 12:58:21.353277 7f200fa8f780  0
> genericfilestorebackend(/var/lib/ceph/osd/ceph-7) detect_features: FIEMAP
> ioctl is supported and appears to work
> 2015-08-11 12:58:21.353302 7f200fa8f780  0
> genericfilestorebackend(/var/lib/ceph/osd/ceph-7) detect_features: FIEMAP
> ioctl is disabled via 'filestore fiemap' config option
> 2015-08-11 12:58:21.362106 7f200fa8f780  0
> genericfilestorebackend(/var/lib/ceph/osd/ceph-7) detect_features:
> syscall(SYS_syncfs, fd) fully supported
> 2015-08-11 12:58:21.362195 7f200fa8f780  0
> xfsfilestorebackend(/var/lib/ceph/osd/ceph-7) detect_feature: extsize is
> disabled by conf
> 2015-08-11 12:58:21.362701 7f200fa8f780  5
> filestore(/var/lib/ceph/osd/ceph-7) mount op_seq is 35490995
> 2015-08-11 12:58:59.383179 7f200fa8f780 -1 *** Caught signal (Aborted) **
>  in thread 7f200fa8f780
>
>  ceph version 0.80.10 (ea6c958c38df1216bf95c927f143d8b13c4a9e70)
>  1: /usr/bin/ceph-osd() [0xab7562]
>  2: (()+0xf0a0) [0x7f200efcd0a0]
>  3: (gsignal()+0x35) [0x7f200db3f165]
>  4: (abort()+0x180) [0x7f200db423e0]
>  5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7f200e39589d]
>  6: (()+0x63996) [0x7f200e393996]
>  7: (()+0x639c3) [0x7f200e3939c3]
>  8: (()+0x63bee) [0x7f200e393bee]
>  9: (tc_new()+0x48e) [0x7f200f213aee]
>  10: (std::string::_Rep::_S_create(unsigned long, unsigned long,
> std::allocator<char> const&)+0x59) [0x7f200e3ef999]
>  11: (std::string::_Rep::_M_clone(std::allocator<char> const&, unsigned
> long)+0x28) [0x7f200e3f0708]
>  12: (std::string::reserve(unsigned long)+0x30) [0x7f200e3f07f0]
>  13: (std::string::append(char const*, unsigned long)+0xb5) [0x7f200e3f0ab5]
>  14: (leveldb::log::Reader::ReadRecord(leveldb::Slice*, std::string*)+0x2a2)
> [0x7f200f46ffa2]
>  15: (leveldb::DBImpl::RecoverLogFile(unsigned long, leveldb::VersionEdit*,
> unsigned long*)+0x180) [0x7f200f468360]
>  16: (leveldb::DBImpl::Recover(leveldb::VersionEdit*)+0x5c2)
> [0x7f200f46adf2]
>  17: (leveldb::DB::Open(leveldb::Options const&, std::string const&,
> leveldb::DB**)+0xff) [0x7f200f46b11f]
>  18: (LevelDBStore::do_open(std::ostream&, bool)+0xd8) [0xa123a8]
>  19: (FileStore::mount()+0x18e0) [0x9b7080]
>  20: (OSD::do_convertfs(ObjectStore*)+0x1a) [0x78f52a]
>  21: (main()+0x2234) [0x7331c4]
>  22: (__libc_start_main()+0xfd) [0x7f200db2bead]
>  23: /usr/bin/ceph-osd() [0x736e99]
>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to
> interpret this.
>
> --- begin dump of recent events ---
>    -66> 2015-08-11 12:58:21.277524 7f200fa8f780  5 asok(0x2800230)
> register_command perfcounters_dump hook 0x27f0010
>    -65> 2015-08-11 12:58:21.277552 7f200fa8f780  5 asok(0x2800230)
> register_command 1 hook 0x27f0010
>    -64> 2015-08-11 12:58:21.277556 7f200fa8f780  5 asok(0x2800230)
> register_command perf dump hook 0x27f0010
>    -63> 2015-08-11 12:58:21.277561 7f200fa8f780  5 asok(0x2800230)
> register_command perfcounters_schema hook 0x27f0010
>    -62> 2015-08-11 12:58:21.277564 7f200fa8f780  5 asok(0x2800230)
> register_command 2 hook 0x27f0010
>    -61> 2015-08-11 12:58:21.277566 7f200fa8f780  5 asok(0x2800230)
> register_command perf schema hook 0x27f0010
>    -60> 2015-08-11 12:58:21.277569 7f200fa8f780  5 asok(0x2800230)
> register_command config show hook 0x27f0010
>    -59> 2015-08-11 12:58:21.277573 7f200fa8f780  5 asok(0x2800230)
> register_command config set hook 0x27f0010
>    -58> 2015-08-11 12:58:21.277575 7f200fa8f780  5 asok(0x2800230)
> register_command config get hook 0x27f0010
>    -57> 2015-08-11 12:58:21.277578 7f200fa8f780  5 asok(0x2800230)
> register_command log flush hook 0x27f0010
>    -56> 2015-08-11 12:58:21.277581 7f200fa8f780  5 asok(0x2800230)
> register_command log dump hook 0x27f0010
>    -55> 2015-08-11 12:58:21.277583 7f200fa8f780  5 asok(0x2800230)
> register_command log reopen hook 0x27f0010
>    -54> 2015-08-11 12:58:21.279878 7f200fa8f780  0 ceph version 0.80.10
> (ea6c958c38df1216bf95c927f143d8b13c4a9e70), process ceph-osd, pid 31918
>    -53> 2015-08-11 12:58:21.345764 7f200fa8f780  1 -- 10.17.0.7:0/0 learned
> my addr 10.17.0.7:0/0
>    -52> 2015-08-11 12:58:21.345778 7f200fa8f780  1 accepter.accepter.bind
> my_inst.addr is 10.17.0.7:6813/31918 need_addr=0
>    -51> 2015-08-11 12:58:21.345792 7f200fa8f780  1 -- 10.18.0.7:0/0 learned
> my addr 10.18.0.7:0/0
>    -50> 2015-08-11 12:58:21.345795 7f200fa8f780  1 accepter.accepter.bind
> my_inst.addr is 10.18.0.7:6808/31918 need_addr=0
>    -49> 2015-08-11 12:58:21.345805 7f200fa8f780  1 -- 10.18.0.7:0/0 learned
> my addr 10.18.0.7:0/0
>    -48> 2015-08-11 12:58:21.345809 7f200fa8f780  1 accepter.accepter.bind
> my_inst.addr is 10.18.0.7:6809/31918 need_addr=0
>    -47> 2015-08-11 12:58:21.345827 7f200fa8f780  1 -- 10.17.0.7:0/0 learned
> my addr 10.17.0.7:0/0
>    -46> 2015-08-11 12:58:21.345830 7f200fa8f780  1 accepter.accepter.bind
> my_inst.addr is 10.17.0.7:6824/31918 need_addr=0
>    -45> 2015-08-11 12:58:21.345847 7f200fa8f780  1 -- 10.17.0.7:0/0 learned
> my addr 10.17.0.7:0/0
>    -44> 2015-08-11 12:58:21.345851 7f200fa8f780  1 accepter.accepter.bind
> my_inst.addr is 10.17.0.7:6825/31918 need_addr=0
>    -43> 2015-08-11 12:58:21.346156 7f200fa8f780  1 finished
> global_init_daemonize
>    -42> 2015-08-11 12:58:21.348094 7f200fa8f780 10
> filestore(/var/lib/ceph/osd/ceph-7) dump_stop
>    -41> 2015-08-11 12:58:21.348119 7f200fa8f780  5 asok(0x2800230) init
> /var/run/ceph/ceph-osd.7.asok
>    -40> 2015-08-11 12:58:21.348134 7f200fa8f780  5 asok(0x2800230)
> bind_and_listen /var/run/ceph/ceph-osd.7.asok
>    -39> 2015-08-11 12:58:21.348232 7f200fa8f780  5 asok(0x2800230)
> register_command 0 hook 0x27ee0b0
>    -38> 2015-08-11 12:58:21.348242 7f200fa8f780  5 asok(0x2800230)
> register_command version hook 0x27ee0b0
>    -37> 2015-08-11 12:58:21.348246 7f200fa8f780  5 asok(0x2800230)
> register_command git_version hook 0x27ee0b0
>    -36> 2015-08-11 12:58:21.348250 7f200fa8f780  5 asok(0x2800230)
> register_command help hook 0x27f00b0
>    -35> 2015-08-11 12:58:21.348254 7f200fa8f780  5 asok(0x2800230)
> register_command get_command_descriptions hook 0x27f0150
>    -34> 2015-08-11 12:58:21.348278 7f200b749700  5 asok(0x2800230) entry
> start
>    -33> 2015-08-11 12:58:21.348291 7f200fa8f780  5
> filestore(/var/lib/ceph/osd/ceph-7) basedir /var/lib/ceph/osd/ceph-7 journal
> /var/lib/ceph/osd/ceph-7/journal
>    -32> 2015-08-11 12:58:21.348326 7f200fa8f780 10
> filestore(/var/lib/ceph/osd/ceph-7) mount fsid is
> 54c136da-c51c-4799-b2dc-b7988982ee00
>    -31> 2015-08-11 12:58:21.349010 7f200fa8f780  0
> filestore(/var/lib/ceph/osd/ceph-7) mount detected xfs (libxfs)
>    -30> 2015-08-11 12:58:21.349026 7f200fa8f780  1
> filestore(/var/lib/ceph/osd/ceph-7)  disabling 'filestore replica fadvise'
> due to known issues with fadvise(DONTNEED) on xfs
>    -29> 2015-08-11 12:58:21.353277 7f200fa8f780  0
> genericfilestorebackend(/var/lib/ceph/osd/ceph-7) detect_features: FIEMAP
> ioctl is supported and appears to work
>    -28> 2015-08-11 12:58:21.353302 7f200fa8f780  0
> genericfilestorebackend(/var/lib/ceph/osd/ceph-7) detect_features: FIEMAP
> ioctl is disabled via 'filestore fiemap' config option
>    -27> 2015-08-11 12:58:21.362106 7f200fa8f780  0
> genericfilestorebackend(/var/lib/ceph/osd/ceph-7) detect_features:
> syscall(SYS_syncfs, fd) fully supported
>    -26> 2015-08-11 12:58:21.362195 7f200fa8f780  0
> xfsfilestorebackend(/var/lib/ceph/osd/ceph-7) detect_feature: extsize is
> disabled by conf
>    -25> 2015-08-11 12:58:21.362701 7f200fa8f780  5
> filestore(/var/lib/ceph/osd/ceph-7) mount op_seq is 35490995
>    -24> 2015-08-11 12:58:24.458593 7f200b749700  5 asok(0x2800230)
> AdminSocket: request 'get_command_descriptions' '' to 0x27f0150 returned
> 1164 bytes
>    -23> 2015-08-11 12:58:24.462824 7f200b749700  1 do_command 'config get'
> 'format:json var:fsid
>    -22> 2015-08-11 12:58:24.462850 7f200b749700  1 do_command 'config get'
> 'format:json var:fsid result is 47 bytes
>    -21> 2015-08-11 12:58:24.462853 7f200b749700  5 asok(0x2800230)
> AdminSocket: request 'config get' '' to 0x27f0010 returned 47 bytes
>    -20> 2015-08-11 12:58:24.463194 7f200b749700  5 asok(0x2800230)
> AdminSocket: request 'get_command_descriptions' '' to 0x27f0150 returned
> 1164 bytes
>    -19> 2015-08-11 12:58:24.467886 7f200b749700  5 asok(0x2800230)
> AdminSocket: request 'version' '' to 0x27ee0b0 returned 21 bytes
>    -18> 2015-08-11 12:58:34.118231 7f200b749700  5 asok(0x2800230)
> AdminSocket: request 'get_command_descriptions' '' to 0x27f0150 returned
> 1164 bytes
>    -17> 2015-08-11 12:58:34.122484 7f200b749700  1 do_command 'config get'
> 'format:json var:fsid
>    -16> 2015-08-11 12:58:34.122503 7f200b749700  1 do_command 'config get'
> 'format:json var:fsid result is 47 bytes
>    -15> 2015-08-11 12:58:34.122506 7f200b749700  5 asok(0x2800230)
> AdminSocket: request 'config get' '' to 0x27f0010 returned 47 bytes
>    -14> 2015-08-11 12:58:34.122739 7f200b749700  5 asok(0x2800230)
> AdminSocket: request 'get_command_descriptions' '' to 0x27f0150 returned
> 1164 bytes
>    -13> 2015-08-11 12:58:34.125503 7f200b749700  5 asok(0x2800230)
> AdminSocket: request 'version' '' to 0x27ee0b0 returned 21 bytes
>    -12> 2015-08-11 12:58:44.136424 7f200b749700  5 asok(0x2800230)
> AdminSocket: request 'get_command_descriptions' '' to 0x27f0150 returned
> 1164 bytes
>    -11> 2015-08-11 12:58:44.140286 7f200b749700  1 do_command 'config get'
> 'format:json var:fsid
>    -10> 2015-08-11 12:58:44.140304 7f200b749700  1 do_command 'config get'
> 'format:json var:fsid result is 47 bytes
>     -9> 2015-08-11 12:58:44.140309 7f200b749700  5 asok(0x2800230)
> AdminSocket: request 'config get' '' to 0x27f0010 returned 47 bytes
>     -8> 2015-08-11 12:58:44.140530 7f200b749700  5 asok(0x2800230)
> AdminSocket: request 'get_command_descriptions' '' to 0x27f0150 returned
> 1164 bytes
>     -7> 2015-08-11 12:58:44.143236 7f200b749700  5 asok(0x2800230)
> AdminSocket: request 'version' '' to 0x27ee0b0 returned 21 bytes
>     -6> 2015-08-11 12:58:54.493800 7f200b749700  5 asok(0x2800230)
> AdminSocket: request 'get_command_descriptions' '' to 0x27f0150 returned
> 1164 bytes
>     -5> 2015-08-11 12:58:54.497564 7f200b749700  1 do_command 'config get'
> 'format:json var:fsid
>     -4> 2015-08-11 12:58:54.497586 7f200b749700  1 do_command 'config get'
> 'format:json var:fsid result is 47 bytes
>     -3> 2015-08-11 12:58:54.497591 7f200b749700  5 asok(0x2800230)
> AdminSocket: request 'config get' '' to 0x27f0010 returned 47 bytes
>     -2> 2015-08-11 12:58:54.497905 7f200b749700  5 asok(0x2800230)
> AdminSocket: request 'get_command_descriptions' '' to 0x27f0150 returned
> 1164 bytes
>     -1> 2015-08-11 12:58:54.500762 7f200b749700  5 asok(0x2800230)
> AdminSocket: request 'version' '' to 0x27ee0b0 returned 21 bytes
>      0> 2015-08-11 12:58:59.383179 7f200fa8f780 -1 *** Caught signal
> (Aborted) **
>  in thread 7f200fa8f780
>
>  ceph version 0.80.10 (ea6c958c38df1216bf95c927f143d8b13c4a9e70)
>  1: /usr/bin/ceph-osd() [0xab7562]
>  2: (()+0xf0a0) [0x7f200efcd0a0]
>  3: (gsignal()+0x35) [0x7f200db3f165]
>  4: (abort()+0x180) [0x7f200db423e0]
>  5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7f200e39589d]
>  6: (()+0x63996) [0x7f200e393996]
>  7: (()+0x639c3) [0x7f200e3939c3]
>  8: (()+0x63bee) [0x7f200e393bee]
>  9: (tc_new()+0x48e) [0x7f200f213aee]
>  10: (std::string::_Rep::_S_create(unsigned long, unsigned long,
> std::allocator<char> const&)+0x59) [0x7f200e3ef999]
>  11: (std::string::_Rep::_M_clone(std::allocator<char> const&, unsigned
> long)+0x28) [0x7f200e3f0708]
>  12: (std::string::reserve(unsigned long)+0x30) [0x7f200e3f07f0]
>  13: (std::string::append(char const*, unsigned long)+0xb5) [0x7f200e3f0ab5]
>  14: (leveldb::log::Reader::ReadRecord(leveldb::Slice*, std::string*)+0x2a2)
> [0x7f200f46ffa2]
>  15: (leveldb::DBImpl::RecoverLogFile(unsigned long, leveldb::VersionEdit*,
> unsigned long*)+0x180) [0x7f200f468360]
>  16: (leveldb::DBImpl::Recover(leveldb::VersionEdit*)+0x5c2)
> [0x7f200f46adf2]
>  17: (leveldb::DB::Open(leveldb::Options const&, std::string const&,
> leveldb::DB**)+0xff) [0x7f200f46b11f]
>  18: (LevelDBStore::do_open(std::ostream&, bool)+0xd8) [0xa123a8]
>  19: (FileStore::mount()+0x18e0) [0x9b7080]
>  20: (OSD::do_convertfs(ObjectStore*)+0x1a) [0x78f52a]
>  21: (main()+0x2234) [0x7331c4]
>  22: (__libc_start_main()+0xfd) [0x7f200db2bead]
>  23: /usr/bin/ceph-osd() [0x736e99]
>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to
> interpret this.
>
> --- logging levels ---
>    0/ 5 none
>    0/ 1 lockdep
>    0/ 1 context
>    1/ 1 crush
>    1/ 5 mds
>    1/ 5 mds_balancer
>    1/ 5 mds_locker
>    1/ 5 mds_log
>    1/ 5 mds_log_expire
>    1/ 5 mds_migrator
>    0/ 1 buffer
>    0/ 1 timer
>    0/ 1 filer
>    0/ 1 striper
>    0/ 1 objecter
>    0/ 5 rados
>    0/ 5 rbd
>    0/ 5 journaler
>    0/ 5 objectcacher
>    0/ 5 client
>   20/ 5 osd
>    0/ 5 optracker
>    0/ 5 objclass
>   20/20 filestore
>    1/ 3 keyvaluestore
>   20/20 journal
>    0/ 5 ms
>    1/ 5 mon
>    5/20 monc
>    1/ 5 paxos
>    0/ 5 tp
>    1/ 5 auth
>    1/ 5 crypto
>    1/ 1 finisher
>    1/ 5 heartbeatmap
>   20/20 perfcounter
>    1/ 5 rgw
>    1/10 civetweb
>    1/ 5 javaclient
>    1/ 5 asok
>    1/ 1 throttle
>   -2/-2 (syslog threshold)
>   -1/-1 (stderr threshold)
>   max_recent     10000
>   max_new         1000
>   log_file /var/log/ceph/ceph-osd.7.log
> --- end dump of recent events ---
>
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>



-- 
Best Regards,

Wheat