Hi,

I have two OSDs that are failing with an assert which looks related to missing objects. This happened after a large RBD snapshot was deleted, which caused several OSDs to start flapping under high load.

The cluster is fully recovered and I don't need any help from a recovery perspective. I'm happy to zap and recreate the OSDs, which I will probably do in a couple of days' time. Or if anybody looks at the error and sees an easy way to get the OSDs to start up, then bonus!

However, I thought I would post in case there is any interest in trying to diagnose why this happened. There were no power or networking issues and no hard reboots, so this is purely contained within the Ceph OSD process.

The objects that it claims are missing are from the RBD that had the snapshot deleted. I'm guessing that the last command before the OSD died was to delete those two objects, which did actually happen, but for some reason the OSD died before it got confirmation. And now it's trying to delete them, but they don't exist.

I have the full debug 20 log, but pretty much all the lines above the snippet below just show it deleting thousands of objects without any problems.
Nick

    -4> 2016-11-15 09:46:52.061643 7f728f9368c0 20 read_log 6 divergent_priors
    -3> 2016-11-15 09:46:52.061779 7f728f9368c0 10 read_log checking for missing items over interval (0'0,1607344'260104]
    -2> 2016-11-15 09:46:52.069987 7f728f9368c0 15 read_log missing 1553246'255377,1:96e51ad6:::rbd_data.6fd18238e1f29.00000000002555c5:head
    -1> 2016-11-15 09:46:52.070007 7f728f9368c0 15 read_log missing 1553190'255366,1:96e51ad6:::rbd_data.6fd18238e1f29.00000000002555c5:6c
     0> 2016-11-15 09:46:52.071471 7f728f9368c0 -1 osd/PGLog.cc: In function 'static void PGLog::read_log(ObjectStore*, coll_t, coll_t, ghobject_t, const pg_info_t&, std::map<eversion_t, hobject_t>&, PGLog::IndexedLog&, pg_missing_t&, std::ostringstream&, const DoutPrefixProvider*, std::set<std::__cxx11::basic_string<char> >*)' thread 7f728f9368c0 time 2016-11-15 09:46:52.070023
osd/PGLog.cc: 1047: FAILED assert(oi.version == i->first)

 ceph version 10.2.3 (ecc23778eb545d8dd55e2e4735b53cc93f92e65b)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x80) [0x5642d2734ea0]
 2: (PGLog::read_log(ObjectStore*, coll_t, coll_t, ghobject_t, pg_info_t const&, std::map<eversion_t, hobject_t, std::less<eversion_t>, std::allocator<std::pair<eversion_t const, hobject_t> > >&, PGLog::IndexedLog&, pg_missing_t&, std::__cxx11::basic_ostringstream<char, std::char_traits<char>, std::allocator<char> >&, DoutPrefixProvider const*, std::set<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >*)+0x719) [0x5642d22e2fd9]
 3: (PG::read_state(ObjectStore*, ceph::buffer::list&)+0x2f6) [0x5642d21172d6]
 4: (OSD::load_pgs()+0x87d) [0x5642d205345d]
 5: (OSD::init()+0x2026) [0x5642d205e7a6]
 6: (main()+0x2ea5) [0x5642d1fd08f5]
 7: (__libc_start_main()+0xf0) [0x7f728c77c830]
 8: (_start()+0x29) [0x5642d2011f89]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
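
In case it helps anyone digging through a similar debug 20 log, here is a minimal Python sketch that pulls the "read_log missing" entries out of the log and groups them by object. It is not part of Ceph; `parse_missing` and `SAMPLE` are names made up for illustration, and the assumption that the last ':'-separated field of the object string is the snapshot id ("head" or a hex id) is based only on the two lines above.

```python
import re
from collections import defaultdict

# Matches the tail of a line such as:
#   ... 15 read_log missing 1553246'255377,1:96e5...:head
# capturing the two numeric parts of the version and the object string.
MISSING_RE = re.compile(r"read_log missing (\d+)'(\d+),(\S+)")


def parse_missing(lines):
    """Group 'read_log missing' entries by object.

    Returns {object_without_snap: [(epoch, version, snap), ...]}.
    Hypothetical helper, not a Ceph API.
    """
    by_object = defaultdict(list)
    for line in lines:
        m = MISSING_RE.search(line)
        if not m:
            continue  # not a 'read_log missing' line
        epoch, version, obj = int(m.group(1)), int(m.group(2)), m.group(3)
        # Assumption: the final ':' field is the snapshot id.
        name, _, snap = obj.rpartition(":")
        by_object[name].append((epoch, version, snap))
    return dict(by_object)


# The two 'missing' lines from the snippet above.
SAMPLE = [
    "-2> 2016-11-15 09:46:52.069987 7f728f9368c0 15 read_log missing "
    "1553246'255377,1:96e51ad6:::rbd_data.6fd18238e1f29.00000000002555c5:head",
    "-1> 2016-11-15 09:46:52.070007 7f728f9368c0 15 read_log missing "
    "1553190'255366,1:96e51ad6:::rbd_data.6fd18238e1f29.00000000002555c5:6c",
]

if __name__ == "__main__":
    for name, entries in parse_missing(SAMPLE).items():
        print(name, entries)
```

Run against the full log (for example by passing an open file object instead of `SAMPLE`), this should make it easier to see whether every missing entry belongs to the RBD image whose snapshot was deleted.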