If you know which PGs are problematic and they are exportable, then
ceph-objectstore-tool is a viable solution. If not, then running the OSD under
gdb and/or with a higher debug osd log level may prove useful, either to
understand more about the problem or to collect info for asking for more help
on ceph-devel.

On 13 September 2016 at 17:26, Henrik Korkuc <lists@xxxxxxxxx> wrote:
> On 16-09-13 11:13, Ronny Aasen wrote:
>>
>> I suspect this must be a difficult question, since there have been no
>> replies on IRC or the mailing list.
>>
>> Assuming it's impossible to get these OSDs running again:
>>
>> Is there a way to recover objects from the disks? They are mounted and
>> the data is readable. I have PGs down since they want to probe these OSDs
>> that do not want to start.
>>
>> pg query claims it can continue if I mark the OSD as lost, but I would
>> prefer not to lose data, especially since the data is OK and readable on
>> the non-functioning OSD.
>>
>> Also let me know if there is other debug info I can extract in order to
>> troubleshoot the non-starting OSDs.
>>
>> kind regards
>> Ronny Aasen
>>
> I cannot help you with this, but you can try using
> http://ceph.com/community/incomplete-pgs-oh-my/ and
> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2015-April/000238.html
> (found this mail thread googling for the objectstore tool post). YMMV.
>
>>
>> On 12. sep. 2016 13:16, Ronny Aasen wrote:
>>>
>>> After adding more OSDs and having a big backfill running, 2 of my OSDs
>>> keep on stopping.
>>>
>>> We also recently upgraded from 0.94.7 to 0.94.9, but I do not know if
>>> that is related.
>>>
>>> The log says:
>>>
>>> 0> 2016-09-12 10:31:08.288858 7f8749125880 -1 osd/PGLog.cc: In
>>> function 'static void PGLog::read_log(ObjectStore*, coll_t, coll_t,
>>> ghobject_t, const pg_info_t&, std::map<eversion_t, hobject_t>&,
>>> PGLog::IndexedLog&, pg_missing_t&, std::ostringstream&,
>>> std::set<std::basic_string<char> >*)' thread 7f8749125880 time
>>> 2016-09-12 10:31:08.286337
>>> osd/PGLog.cc: 984: FAILED assert(oi.version == i->first)
>>>
>>> Googling led me to a bug that seems to be related to Infernalis only.
>>> dmesg does not show anything wrong with the hardware.
>>>
>>> This is Debian running Hammer 0.94.9, and the OSD is a software RAID5
>>> consisting of 5 3TB hard drives. The journal is a partition on an
>>> Intel 3500 SSD.
>>>
>>> Does anyone have a clue what can be wrong?
>>>
>>> kind regards
>>> Ronny Aasen
>>>
>>>
>>> -- log debug_filestore=10 --
>>> -19> 2016-09-12 10:31:08.070947 7f8749125880 10
>>> filestore(/var/lib/ceph/osd/ceph-8) getattr
>>> 1.fdd_head/1/1df4bfdd/rb.0.392c.238e1f29.0000002bd134/head '_' = 266
>>> -18> 2016-09-12 10:31:08.083111 7f8749125880 10
>>> filestore(/var/lib/ceph/osd/ceph-8) getattr
>>> 1.fdd_head/1/deb5bfdd/rb.0.392c.238e1f29.0000002bc596/head '_' = 266
>>> -17> 2016-09-12 10:31:08.096718 7f8749125880 10
>>> filestore(/var/lib/ceph/osd/ceph-8) getattr
>>> 1.fdd_head/1/9be5dfdd/rb.0.392c.238e1f29.0000002bc2bf/head '_' = 266
>>> -16> 2016-09-12 10:31:08.110048 7f8749125880 10
>>> filestore(/var/lib/ceph/osd/ceph-8) getattr
>>> 1.fdd_head/1/cbf8ffdd/rb.0.392c.238e1f29.0000002b9d89/head '_' = 266
>>> -15> 2016-09-12 10:31:08.126263 7f8749125880 10
>>> filestore(/var/lib/ceph/osd/ceph-8) getattr
>>> 1.fdd_head/1/e49d0fdd/rb.0.392c.238e1f29.0000002b078e/head '_' = 266
>>> -14> 2016-09-12 10:31:08.150199 7f8749125880 10
>>> filestore(/var/lib/ceph/osd/ceph-8) getattr
>>> 1.fdd_head/1/e49d0fdd/rb.0.392c.238e1f29.0000002b078e/22 '_' = 259
>>> -13> 2016-09-12 10:31:08.173223 7f8749125880 10
>>> filestore(/var/lib/ceph/osd/ceph-8)
>>> getattr
>>> 1.fdd_head/1/d0827fdd/rb.0.392c.238e1f29.0000002b0373/head '_' = 266
>>> -12> 2016-09-12 10:31:08.199192 7f8749125880 10
>>> filestore(/var/lib/ceph/osd/ceph-8) getattr
>>> 1.fdd_head/1/d0827fdd/rb.0.392c.238e1f29.0000002b0373/22 '_' = 259
>>> -11> 2016-09-12 10:31:08.232712 7f8749125880 10
>>> filestore(/var/lib/ceph/osd/ceph-8) getattr
>>> 1.fdd_head/1/bf4effdd/rb.0.392c.238e1f29.0000002ae882/head '_' = 266
>>> -10> 2016-09-12 10:31:08.265331 7f8749125880 10
>>> filestore(/var/lib/ceph/osd/ceph-8) getattr
>>> 1.fdd_head/1/bf4effdd/rb.0.392c.238e1f29.0000002ae882/22 '_' = 259
>>> -9> 2016-09-12 10:31:08.265456 7f8749125880 10
>>> filestore(/var/lib/ceph/osd/ceph-8) error opening file
>>> /var/lib/ceph/osd/ceph-8/current/1.fdd_head/DIR_D/DIR_D/DIR_F/DIR_0/DIR_2/rb.0.392c.238e1f29.000000b381ae__head_DB220FDD__1
>>> with flags=2: (2) No such file or directory
>>> -8> 2016-09-12 10:31:08.265475 7f8749125880 10
>>> filestore(/var/lib/ceph/osd/ceph-8) getattr
>>> 1.fdd_head/1/db220fdd/rb.0.392c.238e1f29.000000b381ae/head '_' = -2
>>> -7> 2016-09-12 10:31:08.265535 7f8749125880 10
>>> filestore(/var/lib/ceph/osd/ceph-8) error opening file
>>> /var/lib/ceph/osd/ceph-8/current/1.fdd_head/DIR_D/DIR_D/DIR_F/DIR_0/DIR_2/rb.0.392c.238e1f29.000000b381ae__21_DB220FDD__1
>>> with flags=2: (2) No such file or directory
>>> -6> 2016-09-12 10:31:08.265546 7f8749125880 10
>>> filestore(/var/lib/ceph/osd/ceph-8) getattr
>>> 1.fdd_head/1/db220fdd/rb.0.392c.238e1f29.000000b381ae/21 '_' = -2
>>> -5> 2016-09-12 10:31:08.265609 7f8749125880 10
>>> filestore(/var/lib/ceph/osd/ceph-8) error opening file
>>> /var/lib/ceph/osd/ceph-8/current/1.fdd_head/DIR_D/DIR_D/DIR_F/DIR_0/DIR_2/rb.0.392c.238e1f29.000000cf4057__head_12020FDD__1
>>> with flags=2: (2) No such file or directory
>>> -4> 2016-09-12 10:31:08.265628 7f8749125880 10
>>> filestore(/var/lib/ceph/osd/ceph-8) getattr
>>> 1.fdd_head/1/12020fdd/rb.0.392c.238e1f29.000000cf4057/head '_' = -2
>>> -3> 2016-09-12
>>> 10:31:08.265688 7f8749125880 10
>>> filestore(/var/lib/ceph/osd/ceph-8) error opening file
>>> /var/lib/ceph/osd/ceph-8/current/1.fdd_head/DIR_D/DIR_D/DIR_F/DIR_0/DIR_2/rb.0.392c.238e1f29.000000cf4057__21_12020FDD__1
>>> with flags=2: (2) No such file or directory
>>> -2> 2016-09-12 10:31:08.265700 7f8749125880 10
>>> filestore(/var/lib/ceph/osd/ceph-8) getattr
>>> 1.fdd_head/1/12020fdd/rb.0.392c.238e1f29.000000cf4057/21 '_' = -2
>>> -1> 2016-09-12 10:31:08.286313 7f8749125880 10
>>> filestore(/var/lib/ceph/osd/ceph-8) getattr
>>> 1.fdd_head/1/882e0fdd/rb.0.392c.238e1f29.0000003c9802/21 '_' = 251
>>> 0> 2016-09-12 10:31:08.288858 7f8749125880 -1 osd/PGLog.cc: In
>>> function 'static void PGLog::read_log(ObjectStore*, coll_t, coll_t,
>>> ghobject_t, const pg_info_t&, std::map<eversion_t, hobject_t>&,
>>> PGLog::IndexedLog&, pg_missing_t&, std::ostringstream&,
>>> std::set<std::basic_string<char> >*)' thread 7f8749125880 time
>>> 2016-09-12 10:31:08.286337
>>> osd/PGLog.cc: 984: FAILED assert(oi.version == i->first)
>>>
>>> ceph version 0.94.9 (fe6d859066244b97b24f09d46552afc2071e6f90)
>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>>> const*)+0x76) [0xc0f196]
>>> 2: (PGLog::read_log(ObjectStore*, coll_t, coll_t, ghobject_t,
>>> pg_info_t const&, std::map<eversion_t, hobject_t, std::less<eversion_t>,
>>> std::allocator<std::pair<eversion_t const, hobject_t> > >&,
>>> PGLog::IndexedLog&, pg_missing_t&, std::basic_ostringstream<char,
>>> std::char_traits<char>, std::allocator<char> >&, std::set<std::string,
>>> std::less<std::string>, std::allocator<std::string> >*)+0x11ab)
>>> [0x76f9ab]
>>> 3: (PG::read_state(ObjectStore*, ceph::buffer::list&)+0x1e2) [0x7f6c72]
>>> 4: (OSD::load_pgs()+0xac0) [0x6abd00]
>>> 5: (OSD::init()+0x14da) [0x6af54a]
>>> 6: (main()+0x2848) [0x6339f8]
>>> 7: (__libc_start_main()+0xf5) [0x7f8746443b45]
>>> 8: /usr/bin/ceph-osd() [0x64d687]
>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>>> needed
>>> to interpret this.
>>>
>>> --- logging levels ---
>>> 0/ 5 none
>>> 0/ 1 lockdep
>>> 0/ 1 context
>>> 1/ 1 crush
>>> 1/ 5 mds
>>> 1/ 5 mds_balancer
>>> 1/ 5 mds_locker
>>> 1/ 5 mds_log
>>> 1/ 5 mds_log_expire
>>> 1/ 5 mds_migrator
>>> 0/ 1 buffer
>>> 0/ 1 timer
>>> 0/ 1 filer
>>> 0/ 1 striper
>>> 0/ 1 objecter
>>> 0/ 5 rados
>>> 0/ 5 rbd
>>> 0/ 5 rbd_replay
>>> 0/ 5 journaler
>>> 0/ 5 objectcacher
>>> 0/ 5 client
>>> 0/ 5 osd
>>> 0/ 5 optracker
>>> 0/ 5 objclass
>>> 10/10 filestore
>>> 1/ 3 keyvaluestore
>>> 1/ 3 journal
>>> 0/ 5 ms
>>> 1/ 5 mon
>>> 0/10 monc
>>> 1/ 5 paxos
>>> 0/ 5 tp
>>> 1/ 5 auth
>>> 1/ 5 crypto
>>> 1/ 1 finisher
>>> 1/ 5 heartbeatmap
>>> 1/ 5 perfcounter
>>> 1/ 5 rgw
>>> 1/10 civetweb
>>> 1/ 5 javaclient
>>> 1/ 5 asok
>>> 1/ 1 throttle
>>> 0/ 0 refs
>>> 1/ 5 xio
>>> -2/-2 (syslog threshold)
>>> -1/-1 (stderr threshold)
>>> max_recent 10000
>>> max_new 1000
>>> log_file /var/log/ceph/ceph-osd.8.log
>>> --- end dump of recent events ---
>>> 2016-09-12 10:31:08.376098 7f8749125880 -1 *** Caught signal (Aborted) **
>>> in thread 7f8749125880
>>>
>>> ceph version 0.94.9 (fe6d859066244b97b24f09d46552afc2071e6f90)
>>> 1: /usr/bin/ceph-osd() [0xb0c4d3]
>>> 2: (()+0xf8d0) [0x7f8747fb68d0]
>>> 3: (gsignal()+0x37) [0x7f8746457067]
>>> 4: (abort()+0x148) [0x7f8746458448]
>>> 5: (__gnu_cxx::__verbose_terminate_handler()+0x15d) [0x7f8746d44b3d]
>>> 6: (()+0x5ebb6) [0x7f8746d42bb6]
>>> 7: (()+0x5ec01) [0x7f8746d42c01]
>>> 8: (()+0x5ee19) [0x7f8746d42e19]
>>> 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>>> const*)+0x247) [0xc0f367]
>>> 10: (PGLog::read_log(ObjectStore*, coll_t, coll_t, ghobject_t,
>>> pg_info_t const&, std::map<eversion_t, hobject_t, std::less<eversion_t>,
>>> std::allocator<std::pair<eversion_t const, hobject_t> > >&,
>>> PGLog::IndexedLog&, pg_missing_t&, std::basic_ostringstream<char,
>>> std::char_traits<char>, std::allocator<char> >&, std::set<std::string,
>>> std::less<std::string>, std::allocator<std::string>
>>> >*)+0x11ab)
>>> [0x76f9ab]
>>> 11: (PG::read_state(ObjectStore*, ceph::buffer::list&)+0x1e2)
>>> [0x7f6c72]
>>> 12: (OSD::load_pgs()+0xac0) [0x6abd00]
>>> 13: (OSD::init()+0x14da) [0x6af54a]
>>> 14: (main()+0x2848) [0x6339f8]
>>> 15: (__libc_start_main()+0xf5) [0x7f8746443b45]
>>> 16: /usr/bin/ceph-osd() [0x64d687]
>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>>> needed to interpret this.
>>>
>>> --- begin dump of recent events ---
>>> 0> 2016-09-12 10:31:08.376098 7f8749125880 -1 *** Caught signal
>>> (Aborted) **
>>> in thread 7f8749125880
>>>
>>> ceph version 0.94.9 (fe6d859066244b97b24f09d46552afc2071e6f90)
>>> 1: /usr/bin/ceph-osd() [0xb0c4d3]
>>> 2: (()+0xf8d0) [0x7f8747fb68d0]
>>> 3: (gsignal()+0x37) [0x7f8746457067]
>>> 4: (abort()+0x148) [0x7f8746458448]
>>> 5: (__gnu_cxx::__verbose_terminate_handler()+0x15d) [0x7f8746d44b3d]
>>> 6: (()+0x5ebb6) [0x7f8746d42bb6]
>>> 7: (()+0x5ec01) [0x7f8746d42c01]
>>> 8: (()+0x5ee19) [0x7f8746d42e19]
>>> 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>>> const*)+0x247) [0xc0f367]
>>> 10: (PGLog::read_log(ObjectStore*, coll_t, coll_t, ghobject_t,
>>> pg_info_t const&, std::map<eversion_t, hobject_t, std::less<eversion_t>,
>>> std::allocator<std::pair<eversion_t const, hobject_t> > >&,
>>> PGLog::IndexedLog&, pg_missing_t&, std::basic_ostringstream<char,
>>> std::char_traits<char>, std::allocator<char> >&, std::set<std::string,
>>> std::less<std::string>, std::allocator<std::string> >*)+0x11ab)
>>> [0x76f9ab]
>>> 11: (PG::read_state(ObjectStore*, ceph::buffer::list&)+0x1e2)
>>> [0x7f6c72]
>>> 12: (OSD::load_pgs()+0xac0) [0x6abd00]
>>> 13: (OSD::init()+0x14da) [0x6af54a]
>>> 14: (main()+0x2848) [0x6339f8]
>>> 15: (__libc_start_main()+0xf5) [0x7f8746443b45]
>>> 16: /usr/bin/ceph-osd() [0x64d687]
>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>>> needed to interpret this.
>>>
>>> --- logging levels ---
>>> 0/ 5 none
>>> 0/ 1 lockdep
>>> 0/ 1 context
>>> 1/ 1 crush
>>> 1/ 5 mds
>>> 1/ 5 mds_balancer
>>> 1/ 5 mds_locker
>>> 1/ 5 mds_log
>>> 1/ 5 mds_log_expire
>>> 1/ 5 mds_migrator
>>> 0/ 1 buffer
>>> 0/ 1 timer
>>> 0/ 1 filer
>>> 0/ 1 striper
>>> 0/ 1 objecter
>>> 0/ 5 rados
>>> 0/ 5 rbd
>>> 0/ 5 rbd_replay
>>> 0/ 5 journaler
>>> 0/ 5 objectcacher
>>> 0/ 5 client
>>> 0/ 5 osd
>>> 0/ 5 optracker
>>> 0/ 5 objclass
>>> 10/10 filestore
>>> 1/ 3 keyvaluestore
>>> 1/ 3 journal
>>> 0/ 5 ms
>>> 1/ 5 mon
>>> 0/10 monc
>>> 1/ 5 paxos
>>> 0/ 5 tp
>>> 1/ 5 auth
>>> 1/ 5 crypto
>>> 1/ 1 finisher
>>> 1/ 5 heartbeatmap
>>> 1/ 5 perfcounter
>>> 1/ 5 rgw
>>> 1/10 civetweb
>>> 1/ 5 javaclient
>>> 1/ 5 asok
>>> 1/ 1 throttle
>>> 0/ 0 refs
>>> 1/ 5 xio
>>> -2/-2 (syslog threshold)
>>> -1/-1 (stderr threshold)
>>> max_recent 10000
>>> max_new 1000
>>> log_file /var/log/ceph/ceph-osd.8.log
>>> --- end dump of recent events ---
>>> root@ceph-osd5:~#
>>> _______________________________________________
>>> ceph-users mailing list
>>> ceph-users@xxxxxxxxxxxxxx
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
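
For the ceph-objectstore-tool route mentioned at the top of this reply, here is a minimal Python sketch that only assembles the command lines for exporting a PG from a stopped FileStore OSD and importing it into another stopped OSD. It assumes the default data path /var/lib/ceph/osd/ceph-<id> with the journal at <data-path>/journal. The helper names, the target OSD and the export file location are hypothetical, and by default the commands are printed rather than run. PG 1.fdd and osd.8 appear only because they show up in the log above; whether an export succeeds for a PG whose log trips the same assert is not established in this thread.

```python
import shlex
import subprocess
from typing import List

# Hypothetical helpers: they only build argument lists for ceph-objectstore-tool.
# Assumption: default FileStore layout /var/lib/ceph/osd/ceph-<id>, with the
# journal reachable at <data-path>/journal. Adjust for your deployment.


def osd_paths(osd_id: int) -> List[str]:
    """Return the --data-path/--journal-path arguments for a FileStore OSD."""
    data_path = f"/var/lib/ceph/osd/ceph-{osd_id}"
    return ["--data-path", data_path, "--journal-path", f"{data_path}/journal"]


def export_pg_cmd(osd_id: int, pgid: str, out_file: str) -> List[str]:
    """Build the command that exports one PG from a stopped OSD to a file."""
    return ["ceph-objectstore-tool", *osd_paths(osd_id),
            "--pgid", pgid, "--op", "export", "--file", out_file]


def import_pg_cmd(osd_id: int, in_file: str) -> List[str]:
    """Build the command that imports a previously exported PG into a stopped OSD."""
    return ["ceph-objectstore-tool", *osd_paths(osd_id),
            "--op", "import", "--file", in_file]


def run(cmd: List[str], dry_run: bool = True) -> None:
    """Print the command; only execute it when dry_run is False."""
    print(shlex.join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # osd.8 and PG 1.fdd come from the log above; the file path is a placeholder.
    run(export_pg_cmd(8, "1.fdd", "/root/1.fdd.export"))
```

Printing first (dry_run=True) leaves room to double-check the PG id and paths before anything touches the OSD's data directory.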
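
The "error opening file" lines in the log also hint at how FileStore lays objects out on disk, which matters if you want to read objects directly off the mounted OSD. Going by those paths, each DIR_ level is one hex digit of the object's 32-bit hash, taken from the least significant end. The file name looks like <name>_<key>_<snap>_<HASH>_<namespace>_<pool>. The Python sketch below rebuilds the logged paths under these assumptions:

- the locator key and namespace are empty (as with these RBD objects);
- object names need no escaping and are short enough to be stored as-is;
- the split depth is 5, as read off PG 1.fdd on this OSD (the real depth varies per directory as collections split).

The function names are hypothetical.

```python
import posixpath
from typing import Union

# Hypothetical helpers reconstructing FileStore object paths as seen in the log.
# Assumptions: empty locator key and namespace, names stored verbatim (no
# escaping or long-name hashing), and a caller-supplied split depth.


def hash_dirs(obj_hash: int, depth: int) -> str:
    """DIR_ components: hex digits of the 32-bit hash, least significant first."""
    nibbles = f"{obj_hash:08X}"[::-1]
    return "/".join(f"DIR_{n}" for n in nibbles[:depth])


def object_file_name(name: str, snap: Union[str, int], obj_hash: int, pool: int) -> str:
    """Build <name>_<key>_<snap>_<HASH>_<namespace>_<pool> with empty key/namespace."""
    # "head" is kept as-is; numeric snap ids are written as hex (assumption:
    # lowercase; 0x21 -> "21" matches the log).
    snap_str = snap if isinstance(snap, str) else f"{snap:x}"
    return f"{name}__{snap_str}_{obj_hash:08X}__{pool}"


def object_path(osd_root: str, pgid: str, name: str, snap: Union[str, int],
                obj_hash: int, pool: int, depth: int) -> str:
    """Full path of the object file under <osd_root>/current/<pgid>_head/."""
    return posixpath.join(osd_root, "current", f"{pgid}_head",
                          hash_dirs(obj_hash, depth),
                          object_file_name(name, snap, obj_hash, pool))


if __name__ == "__main__":
    # Object name and hash taken from the "-9>" log line above.
    print(object_path("/var/lib/ceph/osd/ceph-8", "1.fdd",
                      "rb.0.392c.238e1f29.000000b381ae", "head",
                      0xDB220FDD, 1, 5))
```

For DB220FDD the low-order digits are D, D, F, 0, 2, which is why that object lands in DIR_D/DIR_D/DIR_F/DIR_0/DIR_2. The object with hash 12020FDD shares those five low digits, so it sits in the same directory.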