Assert when upgrading from Hammer to Jewel


 



We were upgrading from Ceph Hammer to Ceph Jewel. Before that, we updated our OS from CentOS 7.1 to CentOS 7.3 without issue. During the Ceph upgrade we ran into 2 issues:

 

  1. FAILED assert(0 == "Missing map in load_pgs")
    1. We found the following article fixed this issue:

https://www.mail-archive.com/search?l=ceph-users@xxxxxxxxxxxxxx&q=subject:%22%5C%5Bceph%5C-users%5C%5D+Bug+in+OSD+Maps%22&o=newest&f=1

 

  2. We had 3 other OSDs that went down and are asserting with the following:

 

 

0> 2018-12-04 04:20:06.793803 7f375174b700 -1 osd/PGLog.cc: In function 'static void PGLog::_merge_object_divergent_entries(const PGLog::IndexedLog&, const hobject_t&, const std::list<pg_log_entry_t>&, const pg_info_t&, eversion_t, pg_missing_t&, boost::optional<std::pair<eversion_t, hobject_t> >*, PGLog::LogEntryHandler*, const DoutPrefixProvider*)' thread 7f375174b700 time 2018-12-04 04:20:06.789747

osd/PGLog.cc: 391: FAILED assert(objiter->second->version > last_divergent_update)

 

ceph version 10.2.7.aq1 (b76d08dbcee5d59ac08004fda6976b64df3ff59b)

1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x85) [0x558c33ac9105]

2: (PGLog::_merge_object_divergent_entries(PGLog::IndexedLog const&, hobject_t const&, std::list<pg_log_entry_t, std::allocator<pg_log_entry_t> > const&, pg_info_t const&, eversion_t, pg_missing_t&, boost::optional<std::pair<eversion_t, hobject_t> >*, PGLog::LogEntryHandler*, DoutPrefixProvider const*)+0x20d4) [0x558c336b8224]

3: (PGLog::_merge_divergent_entries(PGLog::IndexedLog const&, std::list<pg_log_entry_t, std::allocator<pg_log_entry_t> >&, pg_info_t const&, eversion_t, pg_missing_t&, std::map<eversion_t, hobject_t, std::less<eversion_t>, std::allocator<std::pair<eversion_t const, hobject_t> > >*, PGLog::LogEntryHandler*, DoutPrefixProvider const*)+0x20b) [0x558c336beb5b]

4: (PGLog::merge_log(ObjectStore::Transaction&, pg_info_t&, pg_log_t&, pg_shard_t, pg_info_t&, PGLog::LogEntryHandler*, bool&, bool&)+0xdbc) [0x558c336bc4fc]

5: (PG::merge_log(ObjectStore::Transaction&, pg_info_t&, pg_log_t&, pg_shard_t)+0xbf) [0x558c334e9b8f]

6: (PG::RecoveryState::Stray::react(PG::MLogRec const&)+0x3d1) [0x558c33515ce1]

7: (boost::statechart::simple_state<PG::RecoveryState::Stray, PG::RecoveryState::Started, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base const&, void const*)+0x214) [0x558c335526e4]

8: (boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Initial, std::allocator<void>, boost::statechart::null_exception_translator>::send_event(boost::statechart::event_base const&)+0x6b) [0x558c3353bc5b]

9: (PG::handle_peering_event(std::shared_ptr<PG::CephPeeringEvt>, PG::RecoveryCtx*)+0x1f4) [0x558c33502a54]

10: (OSD::process_peering_events(std::list<PG*, std::allocator<PG*> > const&, ThreadPool::TPHandle&)+0x259) [0x558c3345b519]

11: (OSD::PeeringWQ::_process(std::list<PG*, std::allocator<PG*> > const&, ThreadPool::TPHandle&)+0x12) [0x558c334a5c82]

12: (ThreadPool::worker(ThreadPool::WorkThread*)+0xa7e) [0x558c33aba14e]

13: (ThreadPool::WorkThread::entry()+0x10) [0x558c33abb030]

14: (()+0x7e25) [0x7f377d691e25]

15: (clone()+0x6d) [0x7f377bd1b34d]

NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
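To check whether all three OSDs hit the same assertion, the FAILED assert lines can be pulled out of each OSD log and counted. The following is a minimal sketch, not a Ceph tool: it assumes the `<file>: <line>: FAILED assert(<condition>)` line format seen in the trace above, and the function names `find_failed_asserts` and `summarize`, as well as the log paths passed on the command line, are hypothetical.

```python
import re
import sys
from collections import Counter

# Assumed line format, taken from the trace above:
#   osd/PGLog.cc: 391: FAILED assert(objiter->second->version > last_divergent_update)
ASSERT_RE = re.compile(
    r"^(?P<file>\S+): (?P<line>\d+): FAILED assert\((?P<cond>.*)\)$"
)


def find_failed_asserts(log_text):
    """Return a list of (source_file, line_number, condition) tuples,
    one for every FAILED assert line found in log_text."""
    hits = []
    for raw in log_text.splitlines():
        match = ASSERT_RE.match(raw.strip())
        if match:
            hits.append(
                (match.group("file"), int(match.group("line")), match.group("cond"))
            )
    return hits


def summarize(log_texts):
    """Count how often each distinct assertion appears across several logs."""
    counts = Counter()
    for text in log_texts:
        counts.update(find_failed_asserts(text))
    return counts


if __name__ == "__main__":
    # Hypothetical usage: pass one or more OSD log files as arguments.
    texts = []
    for path in sys.argv[1:]:
        with open(path, errors="replace") as handle:
            texts.append(handle.read())
    for (src, lineno, cond), count in summarize(texts).most_common():
        print(f"{count:4d}  {src}:{lineno}  assert({cond})")
```

If every downed OSD reports the same source file, line, and condition, that points toward a common cause in the PG log data rather than three independent disk faults, though it does not rule out hardware on its own.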

 

Everything I’ve found regarding this seems to indicate a hardware problem. However, these disks mount just fine, there are no errors in dmesg or /var/log/messages, and xfs_repair returns no errors.

 

Any idea on where to start troubleshooting this?

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
