Re: problem starting osd ; PGLog.cc: 984: FAILED assert hammer 0.94.9


 



I added debug journal = 20 and got some new lines in the log, which I have added to the end of this email.

Can any of you make something out of them?

kind regards
Ronny Aasen



On 18.09.2016 18:59, Kostis Fardelas wrote:
If you are aware of the problematic PGs and they are exportable, then
ceph-objectstore-tool is a viable solution. If not, then running gdb
and/or higher debug osd level logs may prove useful (to understand
more about the problem or collect info to ask for more in ceph-devel).
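
For reference, a minimal sketch of what such an export could look like, written as a small Python helper that only builds the command line and does not run it. The OSD id 106 and PG 1.3b6 come from the log at the end of this mail; the journal location and the output file /mnt/backup/1.3b6.export are assumptions for illustration, and the OSD daemon has to be stopped before ceph-objectstore-tool touches its store.

```python
import shlex


def export_pg_command(osd_id, pgid, out_file, osd_root="/var/lib/ceph/osd"):
    """Build (but do not run) a ceph-objectstore-tool export command.

    Assumes the default filestore layout, with the journal at
    <data path>/journal. Adjust if the journal lives elsewhere.
    """
    data_path = f"{osd_root}/ceph-{osd_id}"
    return [
        "ceph-objectstore-tool",
        "--data-path", data_path,
        "--journal-path", f"{data_path}/journal",
        "--op", "export",
        "--pgid", pgid,
        "--file", out_file,  # hypothetical destination, pick a disk with enough space
    ]


# Example: PG 1.3b6 on osd.106, as seen in the crash log.
cmd = export_pg_command(106, "1.3b6", "/mnt/backup/1.3b6.export")
print(" ".join(shlex.quote(arg) for arg in cmd))
```

The printed line can be reviewed and then run by hand on the OSD host. Whether the export succeeds on a PG whose log fails the read_log assert is not something this sketch can tell you.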

On 13 September 2016 at 17:26, Henrik Korkuc <lists@xxxxxxxxx> wrote:
On 16-09-13 11:13, Ronny Aasen wrote:
I suspect this must be a difficult question, since there have been no
replies on IRC or the mailing list.

Assuming it's impossible to get these OSDs running again:

Is there a way to recover objects from the disks? They are mounted and
the data is readable. I have PGs down since they want to probe these OSDs,
which do not want to start.

pg query claims it can continue if I mark the OSDs as lost, but I would
prefer not to lose data, especially since the data is OK and readable on
the nonfunctioning OSDs.

Also let me know if there is other debug output I can extract in order to
troubleshoot the non-starting OSDs.

kind regards
Ronny Aasen


I cannot help you with this, but you can try using
http://ceph.com/community/incomplete-pgs-oh-my/ and
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2015-April/000238.html
(I found this mail thread while googling for the objectstore-tool post). YMMV.




On 12. sep. 2016 13:16, Ronny Aasen wrote:
After adding more OSDs and with a big backfill running, 2 of my OSDs
keep on stopping.

We also recently upgraded from 0.94.7 to 0.94.9, but I do not know if
that is related.

The log says:


[snip old error log. ]

   -17> 2016-09-18 22:52:06.405881 7f878791b880 10 filestore(/var/lib/ceph/osd/ceph-106) getattr 1.3b6_head /1/578c53b6/rb.0.392c.238e1f29.0000000513d5/head '_' = 266
   -16> 2016-09-18 22:52:06.405915 7f878791b880 15 filestore(/var/lib/ceph/osd/ceph-106) getattr 1.3b6_head /1/578c53b6/rb.0.392c.238e1f29.0000000513d5/21 '_'
   -15> 2016-09-18 22:52:06.406049 7f878791b880 10 filestore(/var/lib/ceph/osd/ceph-106) getattr 1.3b6_head /1/578c53b6/rb.0.392c.238e1f29.0000000513d5/21 '_' = 251
   -14> 2016-09-18 22:52:06.406079 7f878791b880 15 filestore(/var/lib/ceph/osd/ceph-106) getattr 1.3b6_head /1/4ecf13b6/rb.0.392c.238e1f29.00000037c4cb/21 '_'
   -13> 2016-09-18 22:52:06.406166 7f878791b880 10 filestore(/var/lib/ceph/osd/ceph-106) error opening file /var/lib/ceph/osd/ceph-106/current/1.3b6_head/DIR_6/DIR_B/DIR_3/DIR_1/DIR_F/rb.0.392c.238e1f29.00000037c4cb__21_4ECF13B6__1 with flags=2: (2) No such file or directory
   -12> 2016-09-18 22:52:06.406187 7f878791b880 10 filestore(/var/lib/ceph/osd/ceph-106) getattr 1.3b6_head /1/4ecf13b6/rb.0.392c.238e1f29.00000037c4cb/21 '_' = -2
   -11> 2016-09-18 22:52:06.406190 7f878791b880 15 read_log missing 104661'46956,1/4ecf13b6/rb.0.392c.238e1f29.00000037c4cb/21
   -10> 2016-09-18 22:52:06.406195 7f878791b880 15 filestore(/var/lib/ceph/osd/ceph-106) getattr 1.3b6_head /1/e85f13b6/rb.0.392c.238e1f29.000000b5bb3b/head '_'
    -9> 2016-09-18 22:52:06.406279 7f878791b880 10 filestore(/var/lib/ceph/osd/ceph-106) error opening file /var/lib/ceph/osd/ceph-106/current/1.3b6_head/DIR_6/DIR_B/DIR_3/DIR_1/DIR_F/rb.0.392c.238e1f29.000000b5bb3b__head_E85F13B6__1 with flags=2: (2) No such file or directory
    -8> 2016-09-18 22:52:06.406293 7f878791b880 10 filestore(/var/lib/ceph/osd/ceph-106) getattr 1.3b6_head /1/e85f13b6/rb.0.392c.238e1f29.000000b5bb3b/head '_' = -2
    -7> 2016-09-18 22:52:06.406297 7f878791b880 15 read_log missing 104661'46955,1/e85f13b6/rb.0.392c.238e1f29.000000b5bb3b/head
    -6> 2016-09-18 22:52:06.406311 7f878791b880 15 filestore(/var/lib/ceph/osd/ceph-106) getattr 1.3b6_head /1/e85f13b6/rb.0.392c.238e1f29.000000b5bb3b/21 '_'
    -5> 2016-09-18 22:52:06.406363 7f878791b880 10 filestore(/var/lib/ceph/osd/ceph-106) error opening file /var/lib/ceph/osd/ceph-106/current/1.3b6_head/DIR_6/DIR_B/DIR_3/DIR_1/DIR_F/rb.0.392c.238e1f29.000000b5bb3b__21_E85F13B6__1 with flags=2: (2) No such file or directory
    -4> 2016-09-18 22:52:06.406369 7f878791b880 10 filestore(/var/lib/ceph/osd/ceph-106) getattr 1.3b6_head /1/e85f13b6/rb.0.392c.238e1f29.000000b5bb3b/21 '_' = -2
    -3> 2016-09-18 22:52:06.406372 7f878791b880 15 read_log missing 91332'39092,1/e85f13b6/rb.0.392c.238e1f29.000000b5bb3b/21
    -2> 2016-09-18 22:52:06.406375 7f878791b880 15 filestore(/var/lib/ceph/osd/ceph-106) getattr 1.3b6_head /1/d9c303b6/rb.0.392c.238e1f29.000000004943/head '_'
    -1> 2016-09-18 22:52:06.426875 7f878791b880 10 filestore(/var/lib/ceph/osd/ceph-106) getattr 1.3b6_head /1/d9c303b6/rb.0.392c.238e1f29.000000004943/head '_' = 266
     0> 2016-09-18 22:52:06.455911 7f878791b880 -1 osd/PGLog.cc: In function 'static void PGLog::read_log(ObjectStore*, coll_t, coll_t, ghobject_t, const pg_info_t&, std::map<eversion_t, hobject_t>&, PGLog::IndexedLog&, pg_missing_t&, std::ostringstream&, std::set<std::basic_string<char> >*)' thread 7f878791b880 time 2016-09-18 22:52:06.426909
osd/PGLog.cc: 984: FAILED assert(oi.version == i->first)

 ceph version 0.94.9 (fe6d859066244b97b24f09d46552afc2071e6f90)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x76) [0xc0f196]
 2: (PGLog::read_log(ObjectStore*, coll_t, coll_t, ghobject_t, pg_info_t const&, std::map<eversion_t, hobject_t, std::less<eversion_t>, std::allocator<std::pair<eversion_t const, hobject_t> > >&, PGLog::IndexedLog&, pg_missing_t&, std::basic_ostringstream<char, std::char_traits<char>, std::allocator<char> >&, std::set<std::string, std::less<std::string>, std::allocator<std::string> >*)+0x11ab) [0x76f9ab]
 3: (PG::read_state(ObjectStore*, ceph::buffer::list&)+0x1e2) [0x7f6c72]
 4: (OSD::load_pgs()+0xac0) [0x6abd00]
 5: (OSD::init()+0x14da) [0x6af54a]
 6: (main()+0x2848) [0x6339f8]
 7: (__libc_start_main()+0xf5) [0x7f8784c3ab45]
 8: /usr/bin/ceph-osd() [0x64d687]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- logging levels ---
   0/ 5 none
   0/ 1 lockdep
   0/ 1 context
   1/ 1 crush
   1/ 5 mds
   1/ 5 mds_balancer
   1/ 5 mds_locker
   1/ 5 mds_log
   1/ 5 mds_log_expire
   1/ 5 mds_migrator
   0/ 1 buffer
   0/ 1 timer
   0/ 1 filer
   0/ 1 striper
   0/ 1 objecter
   0/ 5 rados
   0/ 5 rbd
   0/ 5 rbd_replay
   0/ 5 journaler
   0/ 5 objectcacher
   0/ 5 client
  20/20 osd
   0/ 5 optracker
   0/ 5 objclass
  20/20 filestore
   1/ 3 keyvaluestore
  20/20 journal
   0/ 5 ms
   1/ 5 mon
   0/10 monc
   1/ 5 paxos
   0/ 5 tp
   1/ 5 auth
   1/ 5 crypto
   1/ 1 finisher
   1/ 5 heartbeatmap
   1/ 5 perfcounter
   1/ 5 rgw
   1/10 civetweb
   1/ 5 javaclient
   1/ 5 asok
   1/ 1 throttle
   0/ 0 refs
   1/ 5 xio
  -2/-2 (syslog threshold)
  -1/-1 (stderr threshold)
  max_recent     10000
  max_new         1000
  log_file /var/log/ceph/ceph-osd.106.log
--- end dump of recent events ---
2016-09-18 22:52:06.621664 7f878791b880 -1 *** Caught signal (Aborted) **
 in thread 7f878791b880

 ceph version 0.94.9 (fe6d859066244b97b24f09d46552afc2071e6f90)
 1: /usr/bin/ceph-osd() [0xb0c4d3]
 2: (()+0xf8d0) [0x7f87867ad8d0]
 3: (gsignal()+0x37) [0x7f8784c4e067]
 4: (abort()+0x148) [0x7f8784c4f448]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x15d) [0x7f878553bb3d]
 6: (()+0x5ebb6) [0x7f8785539bb6]
 7: (()+0x5ec01) [0x7f8785539c01]
 8: (()+0x5ee19) [0x7f8785539e19]
 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x247) [0xc0f367]
 10: (PGLog::read_log(ObjectStore*, coll_t, coll_t, ghobject_t, pg_info_t const&, std::map<eversion_t, hobject_t, std::less<eversion_t>, std::allocator<std::pair<eversion_t const, hobject_t> > >&, PGLog::IndexedLog&, pg_missing_t&, std::basic_ostringstream<char, std::char_traits<char>, std::allocator<char> >&, std::set<std::string, std::less<std::string>, std::allocator<std::string> >*)+0x11ab) [0x76f9ab]
 11: (PG::read_state(ObjectStore*, ceph::buffer::list&)+0x1e2) [0x7f6c72]
 12: (OSD::load_pgs()+0xac0) [0x6abd00]
 13: (OSD::init()+0x14da) [0x6af54a]
 14: (main()+0x2848) [0x6339f8]
 15: (__libc_start_main()+0xf5) [0x7f8784c3ab45]
 16: /usr/bin/ceph-osd() [0x64d687]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- begin dump of recent events ---
0> 2016-09-18 22:52:06.621664 7f878791b880 -1 *** Caught signal (Aborted) **
 in thread 7f878791b880

 ceph version 0.94.9 (fe6d859066244b97b24f09d46552afc2071e6f90)
 1: /usr/bin/ceph-osd() [0xb0c4d3]
 2: (()+0xf8d0) [0x7f87867ad8d0]
 3: (gsignal()+0x37) [0x7f8784c4e067]
 4: (abort()+0x148) [0x7f8784c4f448]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x15d) [0x7f878553bb3d]
 6: (()+0x5ebb6) [0x7f8785539bb6]
 7: (()+0x5ec01) [0x7f8785539c01]
 8: (()+0x5ee19) [0x7f8785539e19]
 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x247) [0xc0f367]
 10: (PGLog::read_log(ObjectStore*, coll_t, coll_t, ghobject_t, pg_info_t const&, std::map<eversion_t, hobject_t, std::less<eversion_t>, std::allocator<std::pair<eversion_t const, hobject_t> > >&, PGLog::IndexedLog&, pg_missing_t&, std::basic_ostringstream<char, std::char_traits<char>, std::allocator<char> >&, std::set<std::string, std::less<std::string>, std::allocator<std::string> >*)+0x11ab) [0x76f9ab]
 11: (PG::read_state(ObjectStore*, ceph::buffer::list&)+0x1e2) [0x7f6c72]
 12: (OSD::load_pgs()+0xac0) [0x6abd00]
 13: (OSD::init()+0x14da) [0x6af54a]
 14: (main()+0x2848) [0x6339f8]
 15: (__libc_start_main()+0xf5) [0x7f8784c3ab45]
 16: /usr/bin/ceph-osd() [0x64d687]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- logging levels ---
   0/ 5 none
   0/ 1 lockdep
   0/ 1 context
   1/ 1 crush
   1/ 5 mds
   1/ 5 mds_balancer
   1/ 5 mds_locker
   1/ 5 mds_log
   1/ 5 mds_log_expire
   1/ 5 mds_migrator
   0/ 1 buffer
   0/ 1 timer
   0/ 1 filer
   0/ 1 striper
   0/ 1 objecter
   0/ 5 rados
   0/ 5 rbd
   0/ 5 rbd_replay
   0/ 5 journaler
   0/ 5 objectcacher
   0/ 5 client
  20/20 osd
   0/ 5 optracker
   0/ 5 objclass
  20/20 filestore
   1/ 3 keyvaluestore
  20/20 journal
   0/ 5 ms
   1/ 5 mon
   0/10 monc
   1/ 5 paxos
   0/ 5 tp
   1/ 5 auth
   1/ 5 crypto
   1/ 1 finisher
   1/ 5 heartbeatmap
   1/ 5 perfcounter
   1/ 5 rgw
   1/10 civetweb
   1/ 5 javaclient
   1/ 5 asok
   1/ 1 throttle
   0/ 0 refs
   1/ 5 xio
  -2/-2 (syslog threshold)
  -1/-1 (stderr threshold)
  max_recent     10000
  max_new         1000
  log_file /var/log/ceph/ceph-osd.106.log
--- end dump of recent events ---
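
To make a dump like the one above easier to scan, here is a rough Python sketch that lists every filestore getattr call that came back with a negative return code (for example -2, "No such file or directory"). The regular expression is based only on the line format visible in this log; the function name and the SAMPLE text are made up for illustration, and other Ceph versions may log these lines differently.

```python
import re

# Matches completed filestore getattr calls such as:
#   getattr 1.3b6_head /1/4ecf13b6/rb.0.392c.238e1f29.00000037c4cb/21 '_' = -2
# Request lines (without "= <rc>") are deliberately not matched.
GETATTR_RESULT = re.compile(r"getattr (\S+) (\S+) '_' = (-?\d+)")


def failed_getattrs(log_text):
    """Return (collection, object, return_code) for each getattr with rc < 0."""
    failures = []
    for match in GETATTR_RESULT.finditer(log_text):
        collection, obj, rc = match.group(1), match.group(2), int(match.group(3))
        if rc < 0:
            failures.append((collection, obj, rc))
    return failures


# Two result lines copied from the dump: one successful, one failing.
SAMPLE = """\
-15> 2016-09-18 22:52:06.406049 7f878791b880 10 filestore(/var/lib/ceph/osd/ceph-106) getattr 1.3b6_head /1/578c53b6/rb.0.392c.238e1f29.0000000513d5/21 '_' = 251
-12> 2016-09-18 22:52:06.406187 7f878791b880 10 filestore(/var/lib/ceph/osd/ceph-106) getattr 1.3b6_head /1/4ecf13b6/rb.0.392c.238e1f29.00000037c4cb/21 '_' = -2
"""

for collection, obj, rc in failed_getattrs(SAMPLE):
    print(collection, obj, rc)
```

Running it over the full OSD log (read with open(...).read()) gives a list of the objects the OSD could not find on disk while reading the PG log. It says nothing about why the assert in PGLog.cc fires.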


_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


