Hello all,

We had an inconsistent PG on our cluster. While performing a PG repair operation, the OSD crashed. The OSD is no longer able to start, and there is no hardware failure on the disk itself. This is the log output:

2017-10-17 17:48:55.771384 7f234930d700 -1 log_channel(cluster) log [ERR] : 2.2fc repair 1 missing, 0 inconsistent objects
2017-10-17 17:48:55.771417 7f234930d700 -1 log_channel(cluster) log [ERR] : 2.2fc repair 3 errors, 1 fixed
2017-10-17 17:48:56.047896 7f234930d700 -1 /build/ceph-12.2.1/src/osd/PrimaryLogPG.cc: In function 'virtual void PrimaryLogPG::on_local_recover(const hobject_t&, const ObjectRecoveryInfo&, ObjectContextRef, bool, ObjectStore::Transaction*)' thread 7f234930d700 time 2017-10-17 17:48:55.924115
/build/ceph-12.2.1/src/osd/PrimaryLogPG.cc: 358: FAILED assert(p != recovery_info.ss.clone_snaps.end())

 ceph version 12.2.1 (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x102) [0x56236c8ff3f2]
 2: (PrimaryLogPG::on_local_recover(hobject_t const&, ObjectRecoveryInfo const&, std::shared_ptr<ObjectContext>, bool, ObjectStore::Transaction*)+0xd63) [0x56236c476213]
 3: (ReplicatedBackend::handle_pull_response(pg_shard_t, PushOp const&, PullOp*, std::__cxx11::list<ReplicatedBackend::pull_complete_info, std::allocator<ReplicatedBackend::pull_complete_info> >*, ObjectStore::Transaction*)+0x693) [0x56236c60d4d3]
 4: (ReplicatedBackend::_do_pull_response(boost::intrusive_ptr<OpRequest>)+0x2b5) [0x56236c60dd75]
 5: (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x20c) [0x56236c61196c]
 6: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50) [0x56236c521aa0]
 7: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x55d) [0x56236c48662d]
 8: (OSD::dequeue_op(boost::intrusive_ptr<PG>, boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3a9) [0x56236c3091a9]
 9: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> const&)+0x57) [0x56236c5a2ae7]
 10: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x130e) [0x56236c3307de]
 11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x884) [0x56236c9041e4]
 12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x56236c907220]
 13: (()+0x76ba) [0x7f2366be96ba]
 14: (clone()+0x6d) [0x7f2365c603dd]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

Thanks!
Ana

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
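[Editorial note for readers of the archive: the failed assertion in PrimaryLogPG.cc guards a map lookup during clone recovery. The sketch below uses simplified, hypothetical stand-in types, not the actual Ceph code, just to show the shape of the check that aborts the OSD when a recovering clone object has no entry in the SnapSet's clone_snaps map.]

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical stand-ins for Ceph's snapid_t and SnapSet.
using snapid_t = uint64_t;

struct SnapSet {
    // clone id -> snapshot ids this clone belongs to
    std::map<snapid_t, std::vector<snapid_t>> clone_snaps;
};

// Sketch of the lookup pattern behind
//   FAILED assert(p != recovery_info.ss.clone_snaps.end())
// If the clone being recovered is missing from clone_snaps
// (e.g. inconsistent snapshot metadata), the assert fires
// and the process aborts, which matches the crash above.
std::vector<snapid_t> lookup_clone_snaps(const SnapSet& ss, snapid_t clone) {
    auto p = ss.clone_snaps.find(clone);
    assert(p != ss.clone_snaps.end());
    return p->second;
}
```

This is only an illustration of why the OSD dies at that line rather than recovering gracefully; the fix has to restore consistent clone/snapshot metadata, not the lookup itself.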