Luminous OSD startup errors

After upgrading to the latest Luminous RC (12.1.3), all our OSDs are crashing with the following assert:

     0> 2017-08-15 08:28:49.479238 7f9b7615cd00 -1 /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.1.3/rpm/el7/BUILD/ceph-12.1.3/src/osd/PGLog.h: In function 'static void PGLog::read_log_and_missing(ObjectStore*, coll_t, coll_t, ghobject_t, const pg_info_t&, PGLog::IndexedLog&, missing_type&, bool, std::ostringstream&, bool, bool*, const DoutPrefixProvider*, std::set<std::basic_string<char> >*, bool) [with missing_type = pg_missing_set<true>; std::ostringstream = std::basic_ostringstream<char>]' thread 7f9b7615cd00 time 2017-08-15 08:28:49.477367
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.1.3/rpm/el7/BUILD/ceph-12.1.3/src/osd/PGLog.h: 1301: FAILED assert(force_rebuild_missing)

 ceph version 12.1.3 (c56d9c07b342c08419bbc18dcf2a4c5fae62b9cf) luminous (rc)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x110) [0x55d0f2be3b50]
 2: (void PGLog::read_log_and_missing<pg_missing_set<true> >(ObjectStore*, coll_t, coll_t, ghobject_t, pg_info_t const&, PGLog::IndexedLog&, pg_missing_set<true>&, bool, std::basic_ostringstream<char, std::char_traits<char>, std::allocator<char> >&, bool, bool*, DoutPrefixProvider const*, std::set<std::string, std::less<std::string>, std::allocator<std::string> >*, bool)+0x773) [0x55d0f276f013]
 3: (PG::read_state(ObjectStore*, ceph::buffer::list&)+0x52b) [0x55d0f272739b]
 4: (OSD::load_pgs()+0x97a) [0x55d0f2673dea]
 5: (OSD::init()+0x2179) [0x55d0f268c319]
 6: (main()+0x2def) [0x55d0f2591ccf]
 7: (__libc_start_main()+0xf5) [0x7f9b727d6b35]
 8: (()+0x4ac006) [0x55d0f2630006]

Looking at the code in PGLog.h, the change from 12.1.2 to 12.1.3 (in read_log_and_missing) was:

        if (p->key() == "divergent_priors") {
          ::decode(divergent_priors, bp);
          ldpp_dout(dpp, 20) << "read_log_and_missing " << divergent_priors.size()
                             << " divergent_priors" << dendl;
          has_divergent_priors = true;
          debug_verify_stored_missing = false;

to

        if (p->key() == "divergent_priors") {
          ::decode(divergent_priors, bp);
          ldpp_dout(dpp, 20) << "read_log_and_missing " << divergent_priors.size()
                             << " divergent_priors" << dendl;
          assert(force_rebuild_missing);
          debug_verify_stored_missing = false;

and it seems like force_rebuild_missing is not being set.
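
To make the difference concrete, here is a minimal standalone sketch of the two behaviours as I read them from the snippets above. It is not Ceph code: the ReadOutcome enum, the two function names, and the use of a std::map in place of the PG log omap are all made up for illustration. It only models what happens when a "divergent_priors" key is present.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical outcome type for this sketch only; nothing like it exists in Ceph.
enum class ReadOutcome { Ok, SawDivergentPriors, AssertFailed };

// Stand-in for the omap keys stored with a PG log (assumption: a plain map).
using FakeOmap = std::map<std::string, std::string>;

// Models the 12.1.2 snippet: finding the key only records a flag
// (has_divergent_priors = true) and reading continues.
ReadOutcome read_keys_12_1_2(const FakeOmap& omap) {
  bool has_divergent_priors = omap.count("divergent_priors") > 0;
  return has_divergent_priors ? ReadOutcome::SawDivergentPriors
                              : ReadOutcome::Ok;
}

// Models the 12.1.3 snippet: finding the key now requires the caller to
// have passed force_rebuild_missing = true. The real code asserts; this
// sketch reports the failure as a return value so it can be exercised.
ReadOutcome read_keys_12_1_3(const FakeOmap& omap, bool force_rebuild_missing) {
  if (omap.count("divergent_priors") > 0) {
    if (!force_rebuild_missing) {
      return ReadOutcome::AssertFailed;  // corresponds to FAILED assert(...)
    }
    return ReadOutcome::SawDivergentPriors;
  }
  return ReadOutcome::Ok;
}
```

In this model, a PG that still carries a "divergent_priors" key reads fine under the 12.1.2 logic but hits the failure path under the 12.1.3 logic whenever force_rebuild_missing is false, which matches what we see on startup.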

This cluster was upgraded from Jewel to 12.1.1, then to 12.1.2, and now to 12.1.3, so it seems some step of the upgrade did not complete correctly. Any ideas on how to fix it?

Andras
