Re: Core dump when running OSD service

Hi David,

Thank you for your suggestion. Unfortunately, I did not understand what was involved, and in the process of trying to figure it out I think I made it worse. Thankfully it's just a test environment, so I rebuilt all the Ceph servers involved and now it's working.

Regards,
James

On Fri, Oct 23, 2015 at 11:18 AM, David Zafman <dzafman@xxxxxxxxxx> wrote:
I was focused on fixing the OSD, but you need to determine if some misconfiguration or hardware issue caused a filesystem corruption.

David

On 10/22/15 3:08 PM, David Zafman wrote:
There is a corruption of the osdmaps on this particular OSD. You need to determine which maps are bad, probably by bumping the osd debug level to 20. Then transfer them from a working OSD. The newest ceph-objectstore-tool has features to write the maps, but you'll need to build a version based on a v0.94.4 source tree. I don't know if you can just copy files with names like "current/meta/osdmap.8__0_FD6E4D61__none" (map for epoch 8) between OSDs.

David

On 10/21/15 8:54 PM, James O'Neill wrote:
I have an OSD that didn't come up after a reboot. I was getting the error shown below. It was running 0.94.3, so I reinstalled all packages. I then upgraded everything to 0.94.4 hoping that would fix it, but it hasn't. There are three OSDs; this is the only one having problems (it also contains the inconsistent pgs). Can anyone tell me what the problem might be?

root@dbp-ceph03:/srv/data# ceph status
    cluster 4f6fb784-bd17-4105-a689-e8d1b4bc5643
     health HEALTH_ERR
            53 pgs inconsistent
            542 pgs stale
            542 pgs stuck stale
            5 requests are blocked > 32 sec
            85 scrub errors
            too many PGs per OSD (544 > max 300)
            noout flag(s) set
     monmap e3: 3 mons at {dbp-ceph01=172.17.241.161:6789/0,dbp-ceph02=172.17.241.162:6789/0,dbp-ceph03=172.17.241.163:6789/0}
            election epoch 52, quorum 0,1,2 dbp-ceph01,dbp-ceph02,dbp-ceph03
     osdmap e107: 2 osds: 2 up, 2 in
            flags noout
      pgmap v65678: 1088 pgs, 9 pools, 55199 kB data, 173 objects
            2265 MB used, 16580 MB / 19901 MB avail
                 546 active+clean
                 489 stale+active+clean
                  53 stale+active+clean+inconsistent

root@dbp-ceph02:~# /usr/bin/ceph-osd --cluster=ceph -i 1 -d
2015-10-22 14:15:48.312507 7f4edabec900 0 ceph version 0.94.4 (95292699291242794510b39ffde3f4df67898d3a), process ceph-osd, pid 31215
starting osd.1 at :/0 osd_data /var/lib/ceph/osd/ceph-1 /var/lib/ceph/osd/ceph-1/journal
2015-10-22 14:15:48.352013 7f4edabec900 0 filestore(/var/lib/ceph/osd/ceph-1) backend generic (magic 0xef53)
2015-10-22 14:15:48.355621 7f4edabec900 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-1) detect_features: FIEMAP ioctl is supported and appears to work
2015-10-22 14:15:48.355655 7f4edabec900 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-1) detect_features: FIEMAP ioctl is disabled via 'filestore fiemap' config option
2015-10-22 14:15:48.362016 7f4edabec900 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-1) detect_features: syncfs(2) syscall fully supported (by glibc and kernel)
2015-10-22 14:15:48.372819 7f4edabec900 0 filestore(/var/lib/ceph/osd/ceph-1) limited size xattrs
2015-10-22 14:15:48.387002 7f4edabec900 0 filestore(/var/lib/ceph/osd/ceph-1) mount: enabling WRITEAHEAD journal mode: checkpoint is not enabled
2015-10-22 14:15:48.394002 7f4edabec900 -1 journal FileJournal::_open: disabling aio for non-block journal. Use journal_force_aio to force use of aio anyway
2015-10-22 14:15:48.397803 7f4edabec900 0 <cls> cls/hello/cls_hello.cc:271: loading cls_hello
terminate called after throwing an instance of 'ceph::buffer::end_of_buffer'
what(): buffer::end_of_buffer
*** Caught signal (Aborted) **
in thread 7f4edabec900
ceph version 0.94.4 (95292699291242794510b39ffde3f4df67898d3a)
1: /usr/bin/ceph-osd() [0xacd94a]
2: (()+0x10340) [0x7f4ed98a1340]
3: (gsignal()+0x39) [0x7f4ed7d3fcc9]
4: (abort()+0x148) [0x7f4ed7d430d8]
5: (__gnu_cxx::__verbose_terminate_handler()+0x155) [0x7f4ed864b6b5]
6: (()+0x5e836) [0x7f4ed8649836]
7: (()+0x5e863) [0x7f4ed8649863]
8: (()+0x5eaa2) [0x7f4ed8649aa2]
9: (ceph::buffer::list::iterator::copy(unsigned int, char*)+0x137) [0xc35ef7]
10: (OSDMap::decode(ceph::buffer::list::iterator&)+0x6d) [0xb834ed]
11: (OSDMap::decode(ceph::buffer::list&)+0x3f) [0xb8560f]
12: (OSDService::try_get_map(unsigned int)+0x530) [0x6ac2c0]
13: (OSDService::get_map(unsigned int)+0xe) [0x70ad2e]
14: (OSD::init()+0x6ad) [0x6c5e0d]
15: (main()+0x2860) [0x6527e0]
16: (__libc_start_main()+0xf5) [0x7f4ed7d2aec5]
17: /usr/bin/ceph-osd() [0x66b887]
2015-10-22 14:15:48.412520 7f4edabec900 -1 *** Caught signal (Aborted) **
in thread 7f4edabec900
ceph version 0.94.4 (95292699291242794510b39ffde3f4df67898d3a)
1: /usr/bin/ceph-osd() [0xacd94a]
2: (()+0x10340) [0x7f4ed98a1340]
3: (gsignal()+0x39) [0x7f4ed7d3fcc9]
4: (abort()+0x148) [0x7f4ed7d430d8]
5: (__gnu_cxx::__verbose_terminate_handler()+0x155) [0x7f4ed864b6b5]
6: (()+0x5e836) [0x7f4ed8649836]
7: (()+0x5e863) [0x7f4ed8649863]
8: (()+0x5eaa2) [0x7f4ed8649aa2]
9: (ceph::buffer::list::iterator::copy(unsigned int, char*)+0x137) [0xc35ef7]
10: (OSDMap::decode(ceph::buffer::list::iterator&)+0x6d) [0xb834ed]
11: (OSDMap::decode(ceph::buffer::list&)+0x3f) [0xb8560f]
12: (OSDService::try_get_map(unsigned int)+0x530) [0x6ac2c0]
13: (OSDService::get_map(unsigned int)+0xe) [0x70ad2e]
14: (OSD::init()+0x6ad) [0x6c5e0d]
15: (main()+0x2860) [0x6527e0]
16: (__libc_start_main()+0xf5) [0x7f4ed7d2aec5]
17: /usr/bin/ceph-osd() [0x66b887]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- begin dump of recent events ---
-61> 2015-10-22 14:15:48.308047 7f4edabec900 5 asok(0x5648000) register_command perfcounters_dump hook 0x55e8050
-60> 2015-10-22 14:15:48.308138 7f4edabec900 5 asok(0x5648000) register_command 1 hook 0x55e8050
-59> 2015-10-22 14:15:48.308164 7f4edabec900 5 asok(0x5648000) register_command perf dump hook 0x55e8050
-58> 2015-10-22 14:15:48.308181 7f4edabec900 5 asok(0x5648000) register_command perfcounters_schema hook 0x55e8050
-57> 2015-10-22 14:15:48.308192 7f4edabec900 5 asok(0x5648000) register_command 2 hook 0x55e8050
-56> 2015-10-22 14:15:48.308198 7f4edabec900 5 asok(0x5648000) register_command perf schema hook 0x55e8050
-55> 2015-10-22 14:15:48.308223 7f4edabec900 5 asok(0x5648000) register_command perf reset hook 0x55e8050
-54> 2015-10-22 14:15:48.308242 7f4edabec900 5 asok(0x5648000) register_command config show hook 0x55e8050
-53> 2015-10-22 14:15:48.308249 7f4edabec900 5 asok(0x5648000) register_command config set hook 0x55e8050
-52> 2015-10-22 14:15:48.308254 7f4edabec900 5 asok(0x5648000) register_command config get hook 0x55e8050
-51> 2015-10-22 14:15:48.308259 7f4edabec900 5 asok(0x5648000) register_command config diff hook 0x55e8050
-50> 2015-10-22 14:15:48.308263 7f4edabec900 5 asok(0x5648000) register_command log flush hook 0x55e8050
-49> 2015-10-22 14:15:48.308268 7f4edabec900 5 asok(0x5648000) register_command log dump hook 0x55e8050
-48> 2015-10-22 14:15:48.308274 7f4edabec900 5 asok(0x5648000) register_command log reopen hook 0x55e8050
-47> 2015-10-22 14:15:48.312507 7f4edabec900 0 ceph version 0.94.4 (95292699291242794510b39ffde3f4df67898d3a), process ceph-osd, pid 31215
-46> 2015-10-22 14:15:48.313730 7f4edabec900 1 -- 172.17.241.162:0/0 learned my addr 172.17.241.162:0/0
-45> 2015-10-22 14:15:48.313762 7f4edabec900 1 accepter.accepter.bind my_inst.addr is 172.17.241.162:6800/31215 need_addr=0
-44> 2015-10-22 14:15:48.313795 7f4edabec900 1 -- 172.17.241.162:0/0 learned my addr 172.17.241.162:0/0
-43> 2015-10-22 14:15:48.313803 7f4edabec900 1 accepter.accepter.bind my_inst.addr is 172.17.241.162:6801/31215 need_addr=0
-42> 2015-10-22 14:15:48.313825 7f4edabec900 1 -- 172.17.241.162:0/0 learned my addr 172.17.241.162:0/0
-41> 2015-10-22 14:15:48.313832 7f4edabec900 1 accepter.accepter.bind my_inst.addr is 172.17.241.162:6802/31215 need_addr=0
-40> 2015-10-22 14:15:48.313855 7f4edabec900 1 -- 172.17.241.162:0/0 learned my addr 172.17.241.162:0/0
-39> 2015-10-22 14:15:48.313863 7f4edabec900 1 accepter.accepter.bind my_inst.addr is 172.17.241.162:6803/31215 need_addr=0
-38> 2015-10-22 14:15:48.317379 7f4edabec900 5 asok(0x5648000) init /var/run/ceph/ceph-osd.1.asok
-37> 2015-10-22 14:15:48.317419 7f4edabec900 5 asok(0x5648000) bind_and_listen /var/run/ceph/ceph-osd.1.asok
-36> 2015-10-22 14:15:48.317480 7f4edabec900 5 asok(0x5648000) register_command 0 hook 0x55e40a8
-35> 2015-10-22 14:15:48.317502 7f4edabec900 5 asok(0x5648000) register_command version hook 0x55e40a8
-34> 2015-10-22 14:15:48.317508 7f4edabec900 5 asok(0x5648000) register_command git_version hook 0x55e40a8
-33> 2015-10-22 14:15:48.317515 7f4edabec900 5 asok(0x5648000) register_command help hook 0x55e8140
-32> 2015-10-22 14:15:48.317520 7f4edabec900 5 asok(0x5648000) register_command get_command_descriptions hook 0x55e8130
-31> 2015-10-22 14:15:48.317624 7f4edabec900 10 monclient(hunting): build_initial_monmap
-30> 2015-10-22 14:15:48.317654 7f4ed44f2700 5 asok(0x5648000) entry start
-29> 2015-10-22 14:15:48.350458 7f4edabec900 5 adding auth protocol: none
-28> 2015-10-22 14:15:48.350522 7f4edabec900 5 adding auth protocol: none
-27> 2015-10-22 14:15:48.350815 7f4edabec900 5 asok(0x5648000) register_command objecter_requests hook 0x55e8230
-26> 2015-10-22 14:15:48.351004 7f4edabec900 5 filestore(/var/lib/ceph/osd/ceph-1) test_mount basedir /var/lib/ceph/osd/ceph-1 journal /var/lib/ceph/osd/ceph-1/journal
-25> 2015-10-22 14:15:48.351200 7f4edabec900 1 -- 172.17.241.162:6800/31215 messenger.start
-24> 2015-10-22 14:15:48.351333 7f4edabec900 1 -- :/0 messenger.start
-23> 2015-10-22 14:15:48.351404 7f4edabec900 1 -- 172.17.241.162:6803/31215 messenger.start
-22> 2015-10-22 14:15:48.351473 7f4edabec900 1 -- 172.17.241.162:6802/31215 messenger.start
-21> 2015-10-22 14:15:48.351537 7f4edabec900 1 -- 172.17.241.162:6801/31215 messenger.start
-20> 2015-10-22 14:15:48.351599 7f4edabec900 1 -- :/0 messenger.start
-19> 2015-10-22 14:15:48.351832 7f4edabec900 2 osd.1 0 mounting /var/lib/ceph/osd/ceph-1 /var/lib/ceph/osd/ceph-1/journal
-18> 2015-10-22 14:15:48.351874 7f4edabec900 5 filestore(/var/lib/ceph/osd/ceph-1) basedir /var/lib/ceph/osd/ceph-1 journal /var/lib/ceph/osd/ceph-1/journal
-17> 2015-10-22 14:15:48.352013 7f4edabec900 0 filestore(/var/lib/ceph/osd/ceph-1) backend generic (magic 0xef53)
-16> 2015-10-22 14:15:48.355621 7f4edabec900 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-1) detect_features: FIEMAP ioctl is supported and appears to work
-15> 2015-10-22 14:15:48.355655 7f4edabec900 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-1) detect_features: FIEMAP ioctl is disabled via 'filestore fiemap' config option
-14> 2015-10-22 14:15:48.362016 7f4edabec900 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-1) detect_features: syncfs(2) syscall fully supported (by glibc and kernel)
-13> 2015-10-22 14:15:48.372819 7f4edabec900 0 filestore(/var/lib/ceph/osd/ceph-1) limited size xattrs
-12> 2015-10-22 14:15:48.373025 7f4edabec900 5 filestore(/var/lib/ceph/osd/ceph-1) mount op_seq is 128790
-11> 2015-10-22 14:15:48.387002 7f4edabec900 0 filestore(/var/lib/ceph/osd/ceph-1) mount: enabling WRITEAHEAD journal mode: checkpoint is not enabled
-10> 2015-10-22 14:15:48.394002 7f4edabec900 -1 journal FileJournal::_open: disabling aio for non-block journal. Use journal_force_aio to force use of aio anyway
-9> 2015-10-22 14:15:48.395535 7f4edabec900 2 osd.1 0 boot
-8> 2015-10-22 14:15:48.397803 7f4edabec900 0 <cls> cls/hello/cls_hello.cc:271: loading cls_hello
-7> 2015-10-22 14:15:48.398072 7f4edabec900 1 <cls> cls/statelog/cls_statelog.cc:306: Loaded log class!
-6> 2015-10-22 14:15:48.398603 7f4edabec900 1 <cls> cls/user/cls_user.cc:367: Loaded user class!
-5> 2015-10-22 14:15:48.398855 7f4edabec900 1 <cls> cls/replica_log/cls_replica_log.cc:141: Loaded replica log class!
-4> 2015-10-22 14:15:48.399120 7f4edabec900 1 <cls> cls/log/cls_log.cc:312: Loaded log class!
-3> 2015-10-22 14:15:48.404859 7f4edabec900 1 <cls> cls/refcount/cls_refcount.cc:231: Loaded refcount class!
-2> 2015-10-22 14:15:48.408976 7f4edabec900 1 <cls> cls/rgw/cls_rgw.cc:3047: Loaded rgw class!
-1> 2015-10-22 14:15:48.409169 7f4edabec900 1 <cls> cls/version/cls_version.cc:227: Loaded version class!
0> 2015-10-22 14:15:48.412520 7f4edabec900 -1 *** Caught signal (Aborted) **
in thread 7f4edabec900
ceph version 0.94.4 (95292699291242794510b39ffde3f4df67898d3a)
1: /usr/bin/ceph-osd() [0xacd94a]
2: (()+0x10340) [0x7f4ed98a1340]
3: (gsignal()+0x39) [0x7f4ed7d3fcc9]
4: (abort()+0x148) [0x7f4ed7d430d8]
5: (__gnu_cxx::__verbose_terminate_handler()+0x155) [0x7f4ed864b6b5]
6: (()+0x5e836) [0x7f4ed8649836]
7: (()+0x5e863) [0x7f4ed8649863]
8: (()+0x5eaa2) [0x7f4ed8649aa2]
9: (ceph::buffer::list::iterator::copy(unsigned int, char*)+0x137) [0xc35ef7]
10: (OSDMap::decode(ceph::buffer::list::iterator&)+0x6d) [0xb834ed]
11: (OSDMap::decode(ceph::buffer::list&)+0x3f) [0xb8560f]
12: (OSDService::try_get_map(unsigned int)+0x530) [0x6ac2c0]
13: (OSDService::get_map(unsigned int)+0xe) [0x70ad2e]
14: (OSD::init()+0x6ad) [0x6c5e0d]
15: (main()+0x2860) [0x6527e0]
16: (__libc_start_main()+0xf5) [0x7f4ed7d2aec5]
17: /usr/bin/ceph-osd() [0x66b887]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- logging levels ---
0/ 5 none
0/ 1 lockdep
0/ 1 context
1/ 1 crush
1/ 5 mds
1/ 5 mds_balancer
1/ 5 mds_locker
1/ 5 mds_log
1/ 5 mds_log_expire
1/ 5 mds_migrator
0/ 1 buffer
0/ 1 timer
0/ 1 filer
0/ 1 striper
0/ 1 objecter
0/ 5 rados
0/ 5 rbd
0/ 5 rbd_replay
0/ 5 journaler
0/ 5 objectcacher
0/ 5 client
0/ 5 osd
0/ 5 optracker
0/ 5 objclass
0/ 5 filestore
1/ 3 keyvaluestore
0/ 0 journal
0/ 5 ms
1/ 5 mon
0/20 monc
1/ 5 paxos
0/ 5 tp
1/ 5 auth
1/ 5 crypto
1/ 1 finisher
1/ 5 heartbeatmap
1/ 5 perfcounter
1/ 5 rgw
1/10 civetweb
1/ 5 javaclient
1/ 5 asok
1/ 1 throttle
0/ 0 refs
1/ 5 xio
-2/-2 (syslog threshold)
99/99 (stderr threshold)
max_recent 10000
max_new 1000
log_file
--- end dump of recent events ---
Aborted (core dumped)

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
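
For anyone landing on this thread with the same backtrace: David's advice in the quoted message was to work out which osdmaps on the broken OSD are bad before transferring them from a working one. As a rough, read-only sketch of that first step, the Python below compares full-map files between two OSD data directories. Everything in it is an assumption rather than a Ceph tool: the file-name pattern is taken from the single example quoted in the thread ("osdmap.8__0_FD6E4D61__none"), the function names are made up for illustration, and it assumes copies of the same epoch are byte-identical across OSDs, which may not hold on every cluster. It does not copy or modify anything.

```python
import hashlib
import os
import re

# Hypothetical pattern, based only on the file name quoted in the thread,
# e.g. "osdmap.8__0_FD6E4D61__none" for epoch 8.
OSDMAP_RE = re.compile(r"^osdmap\.(\d+)__")


def collect_osdmaps(meta_dir):
    """Return {epoch: (size, sha256 hex digest)} for full-map files under meta_dir."""
    found = {}
    # Walk recursively in case the maps sit in subdirectories of meta_dir.
    for root, _dirs, files in os.walk(meta_dir):
        for name in files:
            match = OSDMAP_RE.match(name)
            if not match:
                continue
            with open(os.path.join(root, name), "rb") as handle:
                data = handle.read()
            found[int(match.group(1))] = (len(data), hashlib.sha256(data).hexdigest())
    return found


def suspect_epochs(bad_meta_dir, good_meta_dir):
    """List (epoch, reason) pairs where the suspect OSD's copy looks wrong."""
    bad = collect_osdmaps(bad_meta_dir)
    good = collect_osdmaps(good_meta_dir)
    report = []
    for epoch in sorted(good):
        if epoch not in bad:
            report.append((epoch, "missing"))
        elif bad[epoch][0] == 0:
            report.append((epoch, "empty"))
        elif bad[epoch] != good[epoch]:
            report.append((epoch, "differs"))
    return report
```

With both OSDs stopped, it could be called as suspect_epochs("/var/lib/ceph/osd/ceph-1/current/meta", "/mnt/good-osd/current/meta"), where the second path is a hypothetical mount of a healthy OSD's data. A "missing" result is not necessarily a fault, since OSDs may keep different ranges of epochs; "empty" and "differs" are the ones to look at first, given that the crash is an end_of_buffer exception inside OSDMap::decode.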
