After jewel 10.2.2 -> 10.2.7 upgrade, one OSD crashes in OSDMap::decode

Hi,

I've upgraded a tiny jewel cluster from 10.2.2 to 10.2.7, and now
one of the OSDs fails to start.

Here's the (hopefully) important part of the log and backtrace:
2017-05-01 19:54:17.627262 7fb2bbf78800 10 filestore(/var/lib/ceph/osd/ceph-1) stat meta/#-1:c0371625:::snapmapper:0# = 0 (size 0)
2017-05-01 19:54:17.627440 7fb2bbf78800  0 <cls> cls/hello/cls_hello.cc:305: loading cls_hello
2017-05-01 19:54:17.629044 7fb2bbf78800  0 <cls> cls/cephfs/cls_cephfs.cc:202: loading cephfs_size_scan
2017-05-01 19:54:17.630656 7fb2bbf78800 15 filestore(/var/lib/ceph/osd/ceph-1) read meta/#-1:3294e826:::osdmap.53:0# 0~0
2017-05-01 19:54:17.630674 7fb2bbf78800 10 filestore(/var/lib/ceph/osd/ceph-1) FileStore::read meta/#-1:3294e826:::osdmap.53:0# 0~0/0
terminate called after throwing an instance of 'ceph::buffer::end_of_buffer'
  what():  buffer::end_of_buffer
*** Caught signal (Aborted) **
 in thread 7fb2bbf78800 thread_name:ceph-osd
 ceph version 10.2.7 (50e863e0f4bc8f4b9e31156de690d765af245185)
 1: (()+0x91d8ea) [0x5609e9f938ea]
 2: (()+0xf370) [0x7fb2ba6ca370]
 3: (gsignal()+0x37) [0x7fb2b8c8b1d7]
 4: (abort()+0x148) [0x7fb2b8c8c8c8]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x165) [0x7fb2b958f9d5]
 6: (()+0x5e946) [0x7fb2b958d946]
 7: (()+0x5e973) [0x7fb2b958d973]
 8: (()+0x5eb93) [0x7fb2b958db93]
 9: (ceph::buffer::list::iterator_impl<false>::copy(unsigned int, char*)+0xa5) [0x5609ea09e425]
 10: (OSDMap::decode(ceph::buffer::list::iterator&)+0x6d) [0x5609ea055a9d]
 11: (OSDMap::decode(ceph::buffer::list&)+0x2e) [0x5609ea056d9e]
 12: (OSDService::try_get_map(unsigned int)+0x4ac) [0x5609e9a0882c]
 13: (OSDService::get_map(unsigned int)+0xe) [0x5609e9a6b5fe]
 14: (OSD::init()+0x1fe2) [0x5609e9a1e782]
 15: (main()+0x2c55) [0x5609e9981dc5]
 16: (__libc_start_main()+0xf5) [0x7fb2b8c77b35]
 17: (()+0x3561e7) [0x5609e99cc1e7]
2017-05-01 19:54:17.632871 7fb2bbf78800 -1 *** Caught signal (Aborted) **
 in thread 7fb2bbf78800 thread_name:ceph-osd
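
The last filestore line before the crash reads osdmap.53 as `0~0/0`. If I read that debug format correctly (offset~got/len, which is an assumption on my part), the stored map object is empty, and that would fit the end_of_buffer thrown from OSDMap::decode. A minimal Python sketch for pulling such reads out of an OSD log at debug filestore 10 or higher; the helper name `empty_reads` is made up for illustration:

```python
import re

# Matches lines such as:
#   ... FileStore::read meta/#-1:3294e826:::osdmap.53:0# 0~0/0
# The three numbers are assumed to be offset~got/len.
READ_RE = re.compile(r"FileStore::read\s+(\S+)\s+(\d+)~(\d+)/(\d+)")

def empty_reads(lines):
    """Return the object names whose FileStore::read line reports 0~0/0."""
    hits = []
    for line in lines:
        m = READ_RE.search(line)
        if m and m.group(2, 3, 4) == ("0", "0", "0"):
            hits.append(m.group(1))
    return hits

# Two lines copied from the log above; only the second one matches.
SAMPLE = [
    "2017-05-01 19:54:17.630656 7fb2bbf78800 15 filestore(/var/lib/ceph/osd/ceph-1) "
    "read meta/#-1:3294e826:::osdmap.53:0# 0~0",
    "2017-05-01 19:54:17.630674 7fb2bbf78800 10 filestore(/var/lib/ceph/osd/ceph-1) "
    "FileStore::read meta/#-1:3294e826:::osdmap.53:0# 0~0/0",
]

if __name__ == "__main__":
    for name in empty_reads(SAMPLE):
        print(name)
```

To run it against a real log, pass an open file object to `empty_reads` instead of `SAMPLE`.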

The full OSD log is here:

http://nik.lbox.cz/download/osd-crash.txt

I've found some older discussions and reports of similar problems, but
none for current versions, especially 10.2.7.

The cluster is very small (just 2+2 OSDs, 3 mons, no MDS) and was installed as
10.2.2, so it was never upgraded from hammer or earlier. The OS is CentOS 7
based, with a 4.4.52 x86_64 kernel.

If anyone is interested, I can provide more info; otherwise
I'll reformat the OSD to get it back into an OK state.
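
Before reformatting, it may be worth checking whether other stored maps on this OSD are empty too. A rough Python sketch, assuming the usual filestore layout where map objects live as files under `current/meta` and have "osdmap" somewhere in their file name (that match also catches incremental maps); the function name is hypothetical, and the OSD should be stopped while looking:

```python
import os

def zero_length_osdmaps(meta_dir):
    """Return sorted paths of zero-byte files under meta_dir whose
    name contains 'osdmap' (assumed filestore naming)."""
    hits = []
    for root, _dirs, files in os.walk(meta_dir):
        for name in files:
            if "osdmap" not in name:
                continue
            path = os.path.join(root, name)
            if os.path.getsize(path) == 0:
                hits.append(path)
    return sorted(hits)

if __name__ == "__main__":
    # Path assumed from the log above plus the filestore 'current/meta' layout.
    meta = "/var/lib/ceph/osd/ceph-1/current/meta"
    if os.path.isdir(meta):
        for path in zero_length_osdmaps(meta):
            print(path)
```

This only reads file sizes and changes nothing on disk.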

BR

nik


-- 
-------------------------------------
Ing. Nikola CIPRICH
LinuxBox.cz, s.r.o.
28.rijna 168, 709 00 Ostrava

tel.:   +420 591 166 214
fax:    +420 596 621 273
mobil:  +420 777 093 799
www.linuxbox.cz

mobil servis: +420 737 238 656
email servis: servis@xxxxxxxxxxx
-------------------------------------