On 17.05.2018 at 00:12, Gregory Farnum wrote:

Yes. The ceph-osd process for osd.130 was not running during that phase.

Yes. pg 5.9b was active the whole time (on two other OSDs). I think osd.19 is the primary for that pg. "ceph pg 5.9b query" tells me:

.....
"up": [
    19,
    166
],
"acting": [
    19,
    166
],
"actingbackfill": [
    "19",
    "166"
],
....

Yes. It crashes again, with the following lines in the OSD log:

-2> 2018-05-16 11:11:59.639980 7fe812ffd700 5 -- 10.7.2.141:6800/173031 >> 10.7.2.49:6836/3920 conn(0x5619ed76c000 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=24047 cs=1 l=0). rx osd.19 seq 24 0x5619eebd6d00 pg_backfill(progress 5.9b e 505567/505567 lb 5:d97d84eb:::rbd_data.112913b238e1f29.0000000000000ba3:56c06) v3
-1> 2018-05-16 11:11:59.639995 7fe812ffd700 1 -- 10.7.2.141:6800/173031 <== osd.19 10.7.2.49:6836/3920 24 ==== pg_backfill(progress 5.9b e 505567/505567 lb 5:d97d84eb:::rbd_data.112913b238e1f29.0000000000000ba3:56c06) v3 ==== 955+0+0 (3741758263 0 0) 0x5619eebd6d00 con 0x5619ed76c000
0> 2018-05-16 11:11:59.645952 7fe7fe7eb700 -1 /build/ceph-12.2.5/src/osd/PrimaryLogPG.cc: In function 'virtual void PrimaryLogPG::on_local_recover(const hobject_t&, const ObjectRecoveryInfo&, ObjectContextRef, bool, ObjectStore::Transaction*)' thread 7fe7fe7eb700 time 2018-05-16 11:11:59.640238
/build/ceph-12.2.5/src/osd/PrimaryLogPG.cc: 358: FAILED assert(p != recovery_info.ss.clone_snaps.end())

ceph version 12.2.5 (cad919881333ac92274171586c827e01f554a70a) luminous (stable)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x102) [0x5619c11b1a02]
2: (PrimaryLogPG::on_local_recover(hobject_t const&, ObjectRecoveryInfo const&, std::shared_ptr<ObjectContext>, bool, ObjectStore::Transaction*)+0xd63) [0x5619c0d1f873]
3: (ReplicatedBackend::handle_push(pg_shard_t, PushOp const&, PushReplyOp*, ObjectStore::Transaction*)+0x2da) [0x5619c0eb15ca]
4: (ReplicatedBackend::_do_push(boost::intrusive_ptr<OpRequest>)+0x12e) [0x5619c0eb17fe]
5: (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2c1) [0x5619c0ec0d71]
6: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50) [0x5619c0dcc440]
7: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x543) [0x5619c0d30853]
8: (OSD::dequeue_op(boost::intrusive_ptr<PG>, boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3a9) [0x5619c0ba7539]
9: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> const&)+0x57) [0x5619c0e50f37]
10: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x1047) [0x5619c0bd5847]
11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x884) [0x5619c11b67f4]
12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x5619c11b9830]
13: (()+0x76ba) [0x7fe8173746ba]
14: (clone()+0x6d) [0x7fe8163eb41d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
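To double-check which OSD is primary for pg 5.9b, the JSON printed by "ceph pg 5.9b query" can be read programmatically. This is a minimal Python sketch, assuming the command output has been captured as text. The excerpt below is trimmed to the three fields quoted above, and the helper name summarize_pg is hypothetical. It treats the first OSD in the acting set as the likely primary. That is the usual convention for replicated pools, but a primary_temp mapping in the OSD map can override it, so the result is only a hint.

```python
import json

# Excerpt of "ceph pg 5.9b query" output, trimmed to the fields quoted in
# this thread (the real output contains many more keys).
PG_QUERY_JSON = """
{
  "up": [19, 166],
  "acting": [19, 166],
  "actingbackfill": ["19", "166"]
}
"""


def summarize_pg(raw: str) -> dict:
    """Hypothetical helper: pull up/acting sets and a best-guess primary from pg query JSON."""
    data = json.loads(raw)
    up = data.get("up", [])
    acting = data.get("acting", [])
    return {
        "up": up,
        "acting": acting,
        # Heuristic: the first OSD in the acting set is normally the primary;
        # a primary_temp entry in the OSD map could change that.
        "probable_primary": acting[0] if acting else None,
        # A difference between up and acting usually indicates a temporary
        # (pg_temp) remapping, e.g. while backfill is in progress.
        "up_equals_acting": up == acting,
        # The excerpt lists these entries as strings, so keep them unchanged.
        "actingbackfill": data.get("actingbackfill", []),
    }


if __name__ == "__main__":
    summary = summarize_pg(PG_QUERY_JSON)
    print("acting set:", summary["acting"])
    print("probable primary: osd.%s" % summary["probable_primary"])
```

For this excerpt the sketch reports osd.19 as the probable primary, which is consistent with the log above, where the crashing OSD receives the pg_backfill messages from osd.19.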
OK. I think we should try that next. Thank you!
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com