After OSD Flap - FAILED assert(oi.version == i->first)

Hi,

I have two OSDs which are failing with an assert that looks related to missing objects. This happened after a large RBD snapshot
was deleted, which caused several OSDs to start flapping as they experienced high load. The cluster is fully recovered and I don't
need any help from a recovery perspective. I'm happy to zap and recreate the OSDs, which I will probably do in a couple of days'
time. Or if anybody looks at the error and sees an easy way to get the OSDs to start up, then bonus!

However, I thought I would post in case there is any interest in diagnosing why this happened. There were no power or
networking issues and no hard reboots, so this is purely contained within the Ceph OSD process.

The objects that it claims are missing are from the RBD that had the snapshot deleted. My guess is that the last thing the OSD did
before it died was delete those two objects, and the delete did actually happen, but for some reason the OSD died before it got
confirmation. Now it's trying to delete them again, but they no longer exist.

I have the full debug 20 log, but pretty much all the lines before the snippet below just show it deleting thousands of objects
without any problems.

Nick 

    -4> 2016-11-15 09:46:52.061643 7f728f9368c0 20 read_log 6 divergent_priors
    -3> 2016-11-15 09:46:52.061779 7f728f9368c0 10 read_log checking for missing items over interval (0'0,1607344'260104]
    -2> 2016-11-15 09:46:52.069987 7f728f9368c0 15 read_log  missing 1553246'255377,1:96e51ad6:::rbd_data.6fd18238e1f29.00000000002555c5:head
    -1> 2016-11-15 09:46:52.070007 7f728f9368c0 15 read_log  missing 1553190'255366,1:96e51ad6:::rbd_data.6fd18238e1f29.00000000002555c5:6c
     0> 2016-11-15 09:46:52.071471 7f728f9368c0 -1 osd/PGLog.cc: In function 'static void PGLog::read_log(ObjectStore*, coll_t, coll_t, ghobject_t, const pg_info_t&, std::map<eversion_t, hobject_t>&, PGLog::IndexedLog&, pg_missing_t&, std::ostringstream&, const DoutPrefixProvider*, std::set<std::__cxx11::basic_string<char> >*)' thread 7f728f9368c0 time 2016-11-15 09:46:52.070023
osd/PGLog.cc: 1047: FAILED assert(oi.version == i->first)

 ceph version 10.2.3 (ecc23778eb545d8dd55e2e4735b53cc93f92e65b)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x80) [0x5642d2734ea0]
 2: (PGLog::read_log(ObjectStore*, coll_t, coll_t, ghobject_t, pg_info_t const&, std::map<eversion_t, hobject_t, std::less<eversion_t>, std::allocator<std::pair<eversion_t const, hobject_t> > >&, PGLog::IndexedLog&, pg_missing_t&, std::__cxx11::basic_ostringstream<char, std::char_traits<char>, std::allocator<char> >&, DoutPrefixProvider const*, std::set<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >*)+0x719) [0x5642d22e2fd9]
 3: (PG::read_state(ObjectStore*, ceph::buffer::list&)+0x2f6) [0x5642d21172d6]
 4: (OSD::load_pgs()+0x87d) [0x5642d205345d]
 5: (OSD::init()+0x2026) [0x5642d205e7a6]
 6: (main()+0x2ea5) [0x5642d1fd08f5]
 7: (__libc_start_main()+0xf0) [0x7f728c77c830]
 8: (_start()+0x29) [0x5642d2011f89]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
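
In case it is useful for anyone sifting through a longer debug log, here is a minimal Python sketch that pulls the "read_log  missing" entries out of log text. It only assumes the line layout visible in the snippet above (an epoch'version pair, a comma, then the object name); the names `MissingEntry` and `parse_missing` are made up for this example and are not part of Ceph.

```python
import re
from typing import List, NamedTuple


class MissingEntry(NamedTuple):
    """One 'read_log  missing' entry (hypothetical helper type, not a Ceph type)."""
    epoch: int
    version: int
    obj: str


# Matches "read_log  missing <epoch>'<version>,<object>".
# \s+ also spans a newline, so entries wrapped by a mail client still match.
MISSING_RE = re.compile(r"read_log\s+missing\s+(\d+)'(\d+),(\S+)")


def parse_missing(log_text: str) -> List[MissingEntry]:
    """Return every missing entry found in the given log text, in order."""
    return [
        MissingEntry(int(epoch), int(version), obj)
        for epoch, version, obj in MISSING_RE.findall(log_text)
    ]


# The two entries from the snippet above, wrapped as in the original mail.
sample = """\
    -3> 2016-11-15 09:46:52.061779 7f728f9368c0 10 read_log checking for missing items over interval (0'0,1607344'260104]
    -2> 2016-11-15 09:46:52.069987 7f728f9368c0 15 read_log  missing
1553246'255377,1:96e51ad6:::rbd_data.6fd18238e1f29.00000000002555c5:head
    -1> 2016-11-15 09:46:52.070007 7f728f9368c0 15 read_log  missing
1553190'255366,1:96e51ad6:::rbd_data.6fd18238e1f29.00000000002555c5:6c
"""

entries = parse_missing(sample)
for entry in entries:
    print(entry.epoch, entry.version, entry.obj)
```

Run against the full log, this would list every object the OSD flagged as missing during `read_log`, which makes it easier to confirm whether they all belong to the RBD whose snapshot was deleted.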
