Re: problem starting osd ; PGLog.cc: 984: FAILED assert hammer 0.94.9

If you are aware of the problematic PGs and they are exportable, then
ceph-objectstore-tool is a viable solution. If not, then running the OSD
under gdb and/or collecting logs with a higher osd debug level may prove
useful (to understand more about the problem, or to collect information
for a more detailed question on ceph-devel).
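
A minimal Python sketch of the export route. It assumes a FileStore OSD in the default /var/lib/ceph/osd/ceph-<id> location with its journal reachable through the `journal` symlink in that directory (both are assumptions; adjust them for your layout). It only builds and prints ceph-objectstore-tool command lines for review and runs nothing. The helper name `export_commands` and the output directory are hypothetical. The OSD must be stopped while the tool runs.

```python
"""Sketch: build ceph-objectstore-tool export commands for PGs on a stopped OSD.

Nothing is executed; the commands are printed so they can be reviewed first.
"""
import shlex
from pathlib import PurePosixPath


def export_commands(osd_id, pgids, out_dir="/root/pg-export"):
    """Return one argv list per PG id.

    Assumes the default FileStore layout /var/lib/ceph/osd/ceph-<id>,
    with the journal reachable via the 'journal' symlink inside it.
    out_dir is a hypothetical destination; it must have enough space.
    """
    data_path = PurePosixPath("/var/lib/ceph/osd") / f"ceph-{osd_id}"
    commands = []
    for pgid in pgids:
        commands.append([
            "ceph-objectstore-tool",
            "--data-path", str(data_path),
            "--journal-path", str(data_path / "journal"),
            "--pgid", pgid,
            "--op", "export",
            "--file", str(PurePosixPath(out_dir) / f"{pgid}.export"),
        ])
    return commands


if __name__ == "__main__":
    # 1.fdd is the PG whose log fails to load in the trace quoted below.
    for argv in export_commands(8, ["1.fdd"]):
        print(shlex.join(argv))
```

An exported file can later be loaded into another stopped OSD with `--op import --file <file>`. Because the tool reads the same on-disk PG metadata, exporting the PG whose log trips the assert (1.fdd here) may fail the same way; PGs that export cleanly can still be saved like this.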

On 13 September 2016 at 17:26, Henrik Korkuc <lists@xxxxxxxxx> wrote:
> On 16-09-13 11:13, Ronny Aasen wrote:
>>
>> I suspect this must be a difficult question, since there have been no
>> replies on IRC or the mailing list.
>>
>> Assuming it's impossible to get these OSDs running again:
>>
>> Is there a way to recover objects from the disks? They are mounted and
>> the data is readable. I have PGs down because they want to probe these
>> OSDs, which do not want to start.
>>
>> pg query claims it can continue if I mark the OSD as lost, but I would
>> prefer not to lose data, especially since the data is OK and readable on
>> the non-functioning OSD.
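
To see which OSDs a down PG is waiting for (and therefore which disks hold the data worth exporting), the JSON from `ceph pg <pgid> query` can be scanned. This is a sketch: the `recovery_state`, `down_osds_we_would_probe` and `peering_blocked_by` field names are assumed from hammer-era output and should be checked against your own JSON. `blocking_osds` is a hypothetical helper.

```python
import json


def blocking_osds(pg_query_json):
    """Collect the OSD ids that peering of one PG is waiting on.

    Assumes the hammer-era layout: a top-level "recovery_state" list
    whose Peering entry may carry "down_osds_we_would_probe" (ints)
    and "peering_blocked_by" (objects with an "osd" key).
    """
    state = json.loads(pg_query_json).get("recovery_state", [])
    osds = set()
    for entry in state:
        # OSDs that were down when peering wanted to probe them.
        osds.update(int(o) for o in entry.get("down_osds_we_would_probe", []))
        # OSDs the PG explicitly reports as blocking peering.
        for blocker in entry.get("peering_blocked_by", []):
            osds.add(int(blocker["osd"]))
    return sorted(osds)
```

For example, save the output of `ceph pg <pgid> query` for each down PG to a file and pass its contents to `blocking_osds`; the OSDs it returns are the ones whose data you would want to export before considering marking them lost.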
>>
>> Also let me know if there is other debug information I can extract in
>> order to troubleshoot the non-starting OSDs.
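
For collecting more debug output, one option is a one-off foreground start of the failing OSD with raised log levels, optionally under gdb so the abort can be inspected. The sketch below only builds the argument list. It assumes config options can be passed on the ceph-osd command line as `--debug-<subsystem> <level>`, and `debug_start_argv` is a hypothetical helper. Stop the normal service for that OSD first.

```python
import shlex


def debug_start_argv(osd_id, level=20, under_gdb=False):
    """Argv for a one-off foreground start of a failing OSD.

    -i selects the OSD id and -f keeps ceph-osd in the foreground;
    the --debug-* options raise log levels for this run only.
    Level 20 is very verbose, so the log file grows quickly.
    """
    argv = ["ceph-osd", "-i", str(osd_id), "-f"]
    for subsys in ("osd", "filestore"):
        argv += [f"--debug-{subsys}", str(level)]
    if under_gdb:
        # gdb --args treats everything after the program name as its arguments.
        argv = ["gdb", "--args"] + argv
    return argv


if __name__ == "__main__":
    print(shlex.join(debug_start_argv(8)))
    print(shlex.join(debug_start_argv(8, under_gdb=True)))
```

Under gdb, type `run`, wait for the assert, then `bt` (or `thread apply all bt`) for a backtrace; function names and line numbers are only resolved if debug symbols are installed (on Debian, typically the ceph-dbg package).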
>>
>> kind regards
>> Ronny Aasen
>>
>>
> I cannot help you with this, but you can try using
> http://ceph.com/community/incomplete-pgs-oh-my/ and
> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2015-April/000238.html
> (I found this mail thread while googling for the objectstore tool post). YMMV.
>
>
>>
>>
>>
>> On 12. sep. 2016 13:16, Ronny Aasen wrote:
>>>
>>> After adding more OSDs, and with a big backfill running, 2 of my OSDs
>>> keep on stopping.
>>>
>>> We also recently upgraded from 0.94.7 to 0.94.9, but I do not know if
>>> that is related.
>>>
>>> The log says:
>>>
>>>      0> 2016-09-12 10:31:08.288858 7f8749125880 -1 osd/PGLog.cc: In
>>> function 'static void PGLog::read_log(ObjectStore*, coll_t, coll_t,
>>> ghobject_t, const pg_info_t&, std::map<eversion_t, hobject_t>&,
>>> PGLog::IndexedLog&, pg_missing_t&, std::ostringstream&,
>>> std::set<std::basic_string<char> >*)' thread 7f8749125880 time
>>> 2016-09-12 10:31:08.286337
>>> osd/PGLog.cc: 984: FAILED assert(oi.version == i->first)
>>>
>>> Googling led me to a bug that seems to be related to Infernalis only.
>>> dmesg does not show anything wrong with the hardware.
>>>
>>> This is Debian running Hammer 0.94.9,
>>> and the OSD is a software RAID5 consisting of 5 3TB hard drives.
>>> The journal is a partition on an Intel 3500 SSD.
>>>
>>> Does anyone have a clue what could be wrong?
>>>
>>> kind regards
>>> Ronny Aasen
>>>
>>>
>>>
>>>
>>>
>>> -- log debug_filestore=10 --
>>>    -19> 2016-09-12 10:31:08.070947 7f8749125880 10
>>> filestore(/var/lib/ceph/osd/ceph-8) getattr
>>> 1.fdd_head/1/1df4bfdd/rb.0.392c.238e1f29.0000002bd134/head '_' = 266
>>>     -18> 2016-09-12 10:31:08.083111 7f8749125880 10
>>> filestore(/var/lib/ceph/osd/ceph-8) getattr
>>> 1.fdd_head/1/deb5bfdd/rb.0.392c.238e1f29.0000002bc596/head '_' = 266
>>>     -17> 2016-09-12 10:31:08.096718 7f8749125880 10
>>> filestore(/var/lib/ceph/osd/ceph-8) getattr
>>> 1.fdd_head/1/9be5dfdd/rb.0.392c.238e1f29.0000002bc2bf/head '_' = 266
>>>     -16> 2016-09-12 10:31:08.110048 7f8749125880 10
>>> filestore(/var/lib/ceph/osd/ceph-8) getattr
>>> 1.fdd_head/1/cbf8ffdd/rb.0.392c.238e1f29.0000002b9d89/head '_' = 266
>>>     -15> 2016-09-12 10:31:08.126263 7f8749125880 10
>>> filestore(/var/lib/ceph/osd/ceph-8) getattr
>>> 1.fdd_head/1/e49d0fdd/rb.0.392c.238e1f29.0000002b078e/head '_' = 266
>>>     -14> 2016-09-12 10:31:08.150199 7f8749125880 10
>>> filestore(/var/lib/ceph/osd/ceph-8) getattr
>>> 1.fdd_head/1/e49d0fdd/rb.0.392c.238e1f29.0000002b078e/22 '_' = 259
>>>     -13> 2016-09-12 10:31:08.173223 7f8749125880 10
>>> filestore(/var/lib/ceph/osd/ceph-8) getattr
>>> 1.fdd_head/1/d0827fdd/rb.0.392c.238e1f29.0000002b0373/head '_' = 266
>>>     -12> 2016-09-12 10:31:08.199192 7f8749125880 10
>>> filestore(/var/lib/ceph/osd/ceph-8) getattr
>>> 1.fdd_head/1/d0827fdd/rb.0.392c.238e1f29.0000002b0373/22 '_' = 259
>>>     -11> 2016-09-12 10:31:08.232712 7f8749125880 10
>>> filestore(/var/lib/ceph/osd/ceph-8) getattr
>>> 1.fdd_head/1/bf4effdd/rb.0.392c.238e1f29.0000002ae882/head '_' = 266
>>>     -10> 2016-09-12 10:31:08.265331 7f8749125880 10
>>> filestore(/var/lib/ceph/osd/ceph-8) getattr
>>> 1.fdd_head/1/bf4effdd/rb.0.392c.238e1f29.0000002ae882/22 '_' = 259
>>>      -9> 2016-09-12 10:31:08.265456 7f8749125880 10
>>> filestore(/var/lib/ceph/osd/ceph-8) error opening file
>>> /var/lib/ceph/osd/ceph-8/current/1.fdd_head/DIR_D/DIR_D/DIR_F/DIR_0/DIR_2/rb.0.392c.238e1f29.000000b381ae__head_DB220FDD__1
>>> with flags=2: (2) No such file or directory
>>>      -8> 2016-09-12 10:31:08.265475 7f8749125880 10
>>> filestore(/var/lib/ceph/osd/ceph-8) getattr
>>> 1.fdd_head/1/db220fdd/rb.0.392c.238e1f29.000000b381ae/head '_' = -2
>>>      -7> 2016-09-12 10:31:08.265535 7f8749125880 10
>>> filestore(/var/lib/ceph/osd/ceph-8) error opening file
>>> /var/lib/ceph/osd/ceph-8/current/1.fdd_head/DIR_D/DIR_D/DIR_F/DIR_0/DIR_2/rb.0.392c.238e1f29.000000b381ae__21_DB220FDD__1
>>> with flags=2: (2) No such file or directory
>>>      -6> 2016-09-12 10:31:08.265546 7f8749125880 10
>>> filestore(/var/lib/ceph/osd/ceph-8) getattr
>>> 1.fdd_head/1/db220fdd/rb.0.392c.238e1f29.000000b381ae/21 '_' = -2
>>>      -5> 2016-09-12 10:31:08.265609 7f8749125880 10
>>> filestore(/var/lib/ceph/osd/ceph-8) error opening file
>>> /var/lib/ceph/osd/ceph-8/current/1.fdd_head/DIR_D/DIR_D/DIR_F/DIR_0/DIR_2/rb.0.392c.238e1f29.000000cf4057__head_12020FDD__1
>>> with flags=2: (2) No such file or directory
>>>      -4> 2016-09-12 10:31:08.265628 7f8749125880 10
>>> filestore(/var/lib/ceph/osd/ceph-8) getattr
>>> 1.fdd_head/1/12020fdd/rb.0.392c.238e1f29.000000cf4057/head '_' = -2
>>>      -3> 2016-09-12 10:31:08.265688 7f8749125880 10
>>> filestore(/var/lib/ceph/osd/ceph-8) error opening file
>>> /var/lib/ceph/osd/ceph-8/current/1.fdd_head/DIR_D/DIR_D/DIR_F/DIR_0/DIR_2/rb.0.392c.238e1f29.000000cf4057__21_12020FDD__1
>>> with flags=2: (2) No such file or directory
>>>      -2> 2016-09-12 10:31:08.265700 7f8749125880 10
>>> filestore(/var/lib/ceph/osd/ceph-8) getattr
>>> 1.fdd_head/1/12020fdd/rb.0.392c.238e1f29.000000cf4057/21 '_' = -2
>>>      -1> 2016-09-12 10:31:08.286313 7f8749125880 10
>>> filestore(/var/lib/ceph/osd/ceph-8) getattr
>>> 1.fdd_head/1/882e0fdd/rb.0.392c.238e1f29.0000003c9802/21 '_' = 251
>>>       0> 2016-09-12 10:31:08.288858 7f8749125880 -1 osd/PGLog.cc: In
>>> function 'static void PGLog::read_log(ObjectStore*, coll_t, coll_t,
>>> ghobject_t, const pg_info_t&, std::map<eversion_t, hobject_t>&,
>>> PGLog::IndexedLog&, pg_missing_t&, std::ostringstream&,
>>> std::set<std::basic_string<char> >*)' thread 7f8749125880 time
>>> 2016-09-12 10:31:08.286337
>>> osd/PGLog.cc: 984: FAILED assert(oi.version == i->first)
>>>
>>>   ceph version 0.94.9 (fe6d859066244b97b24f09d46552afc2071e6f90)
>>>   1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>>> const*)+0x76) [0xc0f196]
>>>   2: (PGLog::read_log(ObjectStore*, coll_t, coll_t, ghobject_t,
>>> pg_info_t const&, std::map<eversion_t, hobject_t, std::less<eversion_t>,
>>> std::allocator<std::pair<eversion_t const, hobject_t> > >&,
>>> PGLog::IndexedLog&, pg_missing_t&, std::basic_ostringstream<char,
>>> std::char_traits<char>, std::allocator<char> >&, std::set<std::string,
>>> std::less<std::string>, std::allocator<std::string> >*)+0x11ab)
>>> [0x76f9ab]
>>>   3: (PG::read_state(ObjectStore*, ceph::buffer::list&)+0x1e2) [0x7f6c72]
>>>   4: (OSD::load_pgs()+0xac0) [0x6abd00]
>>>   5: (OSD::init()+0x14da) [0x6af54a]
>>>   6: (main()+0x2848) [0x6339f8]
>>>   7: (__libc_start_main()+0xf5) [0x7f8746443b45]
>>>   8: /usr/bin/ceph-osd() [0x64d687]
>>>   NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>>> needed to interpret this.
>>>
>>> --- logging levels ---
>>>     0/ 5 none
>>>     0/ 1 lockdep
>>>     0/ 1 context
>>>     1/ 1 crush
>>>     1/ 5 mds
>>>     1/ 5 mds_balancer
>>>     1/ 5 mds_locker
>>>     1/ 5 mds_log
>>>     1/ 5 mds_log_expire
>>>     1/ 5 mds_migrator
>>>     0/ 1 buffer
>>>     0/ 1 timer
>>>     0/ 1 filer
>>>     0/ 1 striper
>>>     0/ 1 objecter
>>>     0/ 5 rados
>>>     0/ 5 rbd
>>>     0/ 5 rbd_replay
>>>     0/ 5 journaler
>>>     0/ 5 objectcacher
>>>     0/ 5 client
>>>     0/ 5 osd
>>>     0/ 5 optracker
>>>     0/ 5 objclass
>>>    10/10 filestore
>>>     1/ 3 keyvaluestore
>>>     1/ 3 journal
>>>     0/ 5 ms
>>>     1/ 5 mon
>>>     0/10 monc
>>>     1/ 5 paxos
>>>     0/ 5 tp
>>>     1/ 5 auth
>>>     1/ 5 crypto
>>>     1/ 1 finisher
>>>     1/ 5 heartbeatmap
>>>     1/ 5 perfcounter
>>>     1/ 5 rgw
>>>     1/10 civetweb
>>>     1/ 5 javaclient
>>>     1/ 5 asok
>>>     1/ 1 throttle
>>>     0/ 0 refs
>>>     1/ 5 xio
>>>    -2/-2 (syslog threshold)
>>>    -1/-1 (stderr threshold)
>>>    max_recent     10000
>>>    max_new         1000
>>>    log_file /var/log/ceph/ceph-osd.8.log
>>> --- end dump of recent events ---
>>> 2016-09-12 10:31:08.376098 7f8749125880 -1 *** Caught signal (Aborted) **
>>>   in thread 7f8749125880
>>>
>>>   ceph version 0.94.9 (fe6d859066244b97b24f09d46552afc2071e6f90)
>>>   1: /usr/bin/ceph-osd() [0xb0c4d3]
>>>   2: (()+0xf8d0) [0x7f8747fb68d0]
>>>   3: (gsignal()+0x37) [0x7f8746457067]
>>>   4: (abort()+0x148) [0x7f8746458448]
>>>   5: (__gnu_cxx::__verbose_terminate_handler()+0x15d) [0x7f8746d44b3d]
>>>   6: (()+0x5ebb6) [0x7f8746d42bb6]
>>>   7: (()+0x5ec01) [0x7f8746d42c01]
>>>   8: (()+0x5ee19) [0x7f8746d42e19]
>>>   9: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>>> const*)+0x247) [0xc0f367]
>>>   10: (PGLog::read_log(ObjectStore*, coll_t, coll_t, ghobject_t,
>>> pg_info_t const&, std::map<eversion_t, hobject_t, std::less<eversion_t>,
>>> std::allocator<std::pair<eversion_t const, hobject_t> > >&,
>>> PGLog::IndexedLog&, pg_missing_t&, std::basic_ostringstream<char,
>>> std::char_traits<char>, std::allocator<char> >&, std::set<std::string,
>>> std::less<std::string>, std::allocator<std::string> >*)+0x11ab)
>>> [0x76f9ab]
>>>   11: (PG::read_state(ObjectStore*, ceph::buffer::list&)+0x1e2)
>>> [0x7f6c72]
>>>   12: (OSD::load_pgs()+0xac0) [0x6abd00]
>>>   13: (OSD::init()+0x14da) [0x6af54a]
>>>   14: (main()+0x2848) [0x6339f8]
>>>   15: (__libc_start_main()+0xf5) [0x7f8746443b45]
>>>   16: /usr/bin/ceph-osd() [0x64d687]
>>>   NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>>> needed to interpret this.
>>>
>>> root@ceph-osd5:~#
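
In the debug_filestore=10 log above, several getattr calls on the '_' attribute (which holds the OSD's object info) return -2, i.e. ENOENT, shortly before the assert in pg 1.fdd. A small Python sketch to list such entries from the OSD log file itself (not the mail-quoted copy, whose '>>>' prefixes would break the pattern). The regex is written against the line format shown here and is an assumption for other versions; `parse_getattrs` and `missing_objects` are hypothetical names.

```python
import re
from collections import namedtuple

# Matches FileStore debug lines such as:
#   ... filestore(/var/lib/ceph/osd/ceph-8) getattr
#   1.fdd_head/1/db220fdd/rb.0.392c.238e1f29.000000b381ae/head '_' = -2
GETATTR = re.compile(r"getattr (?P<obj>\S+) '(?P<attr>[^']*)' = (?P<rc>-?\d+)")

Getattr = namedtuple("Getattr", "collection name snap attr rc")


def parse_getattrs(log_text):
    """Yield one Getattr per filestore getattr result found in log_text.

    Whitespace is collapsed first, so an entry wrapped over several
    lines still matches.
    """
    flat = " ".join(log_text.split())
    for m in GETATTR.finditer(flat):
        parts = m.group("obj").split("/")
        # Assumed layout of the object path: collection/pool/hash/name/snap
        collection, name, snap = parts[0], parts[-2], parts[-1]
        yield Getattr(collection, name, snap, m.group("attr"), int(m.group("rc")))


def missing_objects(log_text):
    """Entries whose '_' (object info) lookup returned -2 (ENOENT)."""
    return [g for g in parse_getattrs(log_text) if g.attr == "_" and g.rc == -2]
```

The last getattr before the assert (the rb.0.392c.238e1f29.0000003c9802/21 entry above) is one natural candidate to inspect, although the log alone does not show which entry failed the version check.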
>>> _______________________________________________
>>> ceph-users mailing list
>>> ceph-users@xxxxxxxxxxxxxx
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com