I’ve moved a whole set of OSDs from one server to another after a failure. All the other OSDs have come up without problems and Ceph is busy rebalancing itself after the change, but one of my OSDs is being difficult. Below is the output from the log file when I use ‘service ceph restart osd.35’ to try to bring it online. I can’t see any useful errors here to start chasing down, can you?

-32> 2016-04-04 16:07:13.425771 7f28d457f880 2 journal read_entry 4048896000 : seq 283263182 137 bytes
-31> 2016-04-04 16:07:13.425780 7f28d457f880 2 journal read_entry 4048900096 : seq 283263183 137 bytes
-30> 2016-04-04 16:07:13.425791 7f28d457f880 2 journal No further valid entries found, journal is most likely valid
-29> 2016-04-04 16:07:13.425802 7f28d457f880 2 journal No further valid entries found, journal is most likely valid
-28> 2016-04-04 16:07:13.425809 7f28d457f880 3 journal journal_replay: end of journal, done.
-27> 2016-04-04 16:07:13.431977 7f28d457f880 1 journal _open /var/lib/ceph/osd/ceph-35/journal fd 20: 5367660544 bytes, block size 4096 bytes, directio = 1, aio = 1
-26> 2016-04-04 16:07:13.432317 7f28d457f880 2 osd.35 0 boot
-25> 2016-04-04 16:07:13.462245 7f28d457f880 0 <cls> cls/hello/cls_hello.cc:271: loading cls_hello
-24> 2016-04-04 16:07:13.463479 7f28d457f880 1 <cls> cls/log/cls_log.cc:312: Loaded log class!
-23> 2016-04-04 16:07:13.468017 7f28d457f880 1 <cls> cls/refcount/cls_refcount.cc:231: Loaded refcount class!
-22> 2016-04-04 16:07:13.468228 7f28d457f880 1 <cls> cls/replica_log/cls_replica_log.cc:141: Loaded replica log class!
-21> 2016-04-04 16:07:13.472035 7f28d457f880 1 <cls> cls/rgw/cls_rgw.cc:3047: Loaded rgw class!
-20> 2016-04-04 16:07:13.472480 7f28d457f880 1 <cls> cls/statelog/cls_statelog.cc:306: Loaded log class!
-19> 2016-04-04 16:07:13.472890 7f28d457f880 1 <cls> cls/user/cls_user.cc:367: Loaded user class!
-18> 2016-04-04 16:07:13.473231 7f28d457f880 1 <cls> cls/version/cls_version.cc:227: Loaded version class!
-17> 2016-04-04 16:07:13.565117 7f28d457f880 0 osd.35 3612 crush map has features 1107558400, adjusting msgr requires for clients
-16> 2016-04-04 16:07:13.565139 7f28d457f880 0 osd.35 3612 crush map has features 1107558400 was 8705, adjusting msgr requires for mons
-15> 2016-04-04 16:07:13.565149 7f28d457f880 0 osd.35 3612 crush map has features 1107558400, adjusting msgr requires for osds
-14> 2016-04-04 16:07:13.565171 7f28d457f880 0 osd.35 3612 load_pgs
-13> 2016-04-04 16:07:36.965045 7f28d457f880 5 osd.35 pg_epoch: 3604 pg[34.2(unlocked)] enter Initial
-12> 2016-04-04 16:07:36.965642 7f28d457f880 5 osd.35 pg_epoch: 3604 pg[34.2( empty local-les=0 n=0 ec=709 les/c 3578/3582 3588/3588/3542) [38,35] r=1 lpr=0 pi=3542-3587/1 crt=0'0 inactive NOTIFY] exit Initial 0.000598 0 0.000000
-11> 2016-04-04 16:07:36.965691 7f28d457f880 5 osd.35 pg_epoch: 3604 pg[34.2( empty local-les=0 n=0 ec=709 les/c 3578/3582 3588/3588/3542) [38,35] r=1 lpr=0 pi=3542-3587/1 crt=0'0 inactive NOTIFY] enter Reset
-10> 2016-04-04 16:07:36.987602 7f28d457f880 5 osd.35 pg_epoch: 3587 pg[34.3(unlocked)] enter Initial
-9> 2016-04-04 16:07:37.011658 7f28d457f880 5 osd.35 pg_epoch: 3587 pg[34.3( v 3334'2172 (0'0,3334'2172] local-les=3562 n=0 ec=709 les/c 3562/3587 3516/3516/3459) [35,34] r=0 lpr=0 crt=3334'2170 lcod 0'0 mlcod 0'0 inactive] exit Initial 0.024056 0 0.000000
-8> 2016-04-04 16:07:37.011696 7f28d457f880 5 osd.35 pg_epoch: 3587 pg[34.3( v 3334'2172 (0'0,3334'2172] local-les=3562 n=0 ec=709 les/c 3562/3587 3516/3516/3459) [35,34] r=0 lpr=0 crt=3334'2170 lcod 0'0 mlcod 0'0 inactive] enter Reset
-7> 2016-04-04 16:07:37.067070 7f28d457f880 5 osd.35 pg_epoch: 3481 pg[34.7(unlocked)] enter Initial
-6> 2016-04-04 16:07:37.097469 7f28d457f880 5 osd.35 pg_epoch: 3481 pg[34.7( v 3334'2635 (0'0,3334'2635] local-les=3465 n=0 ec=709 les/c 3465/3479 3459/3459/3369) [34,35] r=1 lpr=0 pi=3448-3458/1 crt=3334'2635 lcod 0'0 inactive NOTIFY] exit Initial 0.030398 0 0.000000
-5> 2016-04-04 16:07:37.097531 7f28d457f880 5 osd.35 pg_epoch: 3481 pg[34.7( v 3334'2635 (0'0,3334'2635] local-les=3465 n=0 ec=709 les/c 3465/3479 3459/3459/3369) [34,35] r=1 lpr=0 pi=3448-3458/1 crt=3334'2635 lcod 0'0 inactive NOTIFY] enter Reset
-4> 2016-04-04 16:07:37.146741 7f28d457f880 5 osd.35 pg_epoch: 3582 pg[34.8(unlocked)] enter Initial
-3> 2016-04-04 16:07:37.172346 7f28d457f880 5 osd.35 pg_epoch: 3582 pg[34.8( v 3334'2042 (0'0,3334'2042] lb 0//0//-1 local-les=3494 n=0 ec=709 les/c 3494/3502 3542/3542/3368) [33,38] r=-1 lpr=0 pi=3368-3541/4 crt=3334'2042 lcod 0'0 inactive NOTIFY] exit Initial 0.025604 0 0.000000
-2> 2016-04-04 16:07:37.172387 7f28d457f880 5 osd.35 pg_epoch: 3582 pg[34.8( v 3334'2042 (0'0,3334'2042] lb 0//0//-1 local-les=3494 n=0 ec=709 les/c 3494/3502 3542/3542/3368) [33,38] r=-1 lpr=0 pi=3368-3541/4 crt=3334'2042 lcod 0'0 inactive NOTIFY] enter Reset
-1> 2016-04-04 16:07:37.189150 7f28d457f880 5 osd.35 pg_epoch: 3585 pg[34.9(unlocked)] enter Initial
0> 2016-04-04 16:07:37.268571 7f28d457f880 -1 *** Caught signal (Aborted) **
 in thread 7f28d457f880

 ceph version 0.94.3 (95cefea9fd9ab740263bf8bb4796fd864d9afe2b)
 1: /usr/bin/ceph-osd() [0xac5c32]
 2: (()+0xf130) [0x7f28d2f0d130]
 3: (gsignal()+0x37) [0x7f28d19275d7]
 4: (abort()+0x148) [0x7f28d1928cc8]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x165) [0x7f28d222b9b5]
 6: (()+0x5e926) [0x7f28d2229926]
 7: (()+0x5e953) [0x7f28d2229953]
 8: (()+0x5eb73) [0x7f28d2229b73]
 9: (pg_log_entry_t::decode_with_checksum(ceph::buffer::list::iterator&)+0x230) [0x792510]
 10: (PGLog::read_log(ObjectStore*, coll_t, coll_t, ghobject_t, pg_info_t const&, std::map<eversion_t, hobject_t, std::less<eversion_t>, std::allocator<std::pair<eversion_t const, hobject_t> > >&, PGLog::IndexedLog&, pg_missing_t&, std::basic_ostringstream<char, std::char_traits<char>, std::allocator<char> >&, std::set<std::string, std::less<std::string>, std::allocator<std::string> >*)+0xa2f) [0x76b9bf]
 11: (PG::read_state(ObjectStore*, ceph::buffer::list&)+0x34f) [0x7edebf]
 12: (OSD::load_pgs()+0xa7a) [0x6b6a8a]
 13: (OSD::init()+0x729) [0x6b9319]
 14: (main()+0x27f3) [0x643ed3]
 15: (__libc_start_main()+0xf5) [0x7f28d1913af5]
 16: /usr/bin/ceph-osd() [0x65d139]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- logging levels ---
   0/ 5 none
   0/ 1 lockdep
   0/ 1 context
   1/ 1 crush
   1/ 5 mds
   1/ 5 mds_balancer
   1/ 5 mds_locker
   1/ 5 mds_log
   1/ 5 mds_log_expire
   1/ 5 mds_migrator
   0/ 1 buffer
   0/ 1 timer
   0/ 1 filer
   0/ 1 striper
   0/ 1 objecter
   0/ 5 rados
   0/ 5 rbd
   0/ 5 rbd_replay
   0/ 5 journaler
   0/ 5 objectcacher
   0/ 5 client
   0/ 5 osd
   0/ 5 optracker
   0/ 5 objclass
   1/ 3 filestore
   1/ 3 keyvaluestore
   1/ 3 journal
   0/ 5 ms
   1/ 5 mon
   0/10 monc
   1/ 5 paxos
   0/ 5 tp
   1/ 5 auth
   1/ 5 crypto
   1/ 1 finisher
   1/ 5 heartbeatmap
   1/ 5 perfcounter
   1/ 5 rgw
   1/10 civetweb
   1/ 5 javaclient
   1/ 5 asok
   1/ 1 throttle
   0/ 0 refs
   1/ 5 xio
  -2/-2 (syslog threshold)
  -1/-1 (stderr threshold)
  max_recent 10000
  max_new 1000
  log_file /var/log/ceph/ceph-osd.35.log
--- end dump of recent events ---
[root@cephosd4 ~]#