Dear all,

I can't start osd.555, and some PGs can't be repaired. My data pool uses 3x replication. I suspect that some objects in those PGs are damaged. I tried deleting some of the data related to those objects, but osd.555 still would not start. I then removed osd.555, but during object recovery other OSDs went down as well (for example, osd.532 is down and will not start).

In the log I found messages in which the copy_subset size looks far too large:

MOSDPGPull(4.193c 295912 [PullOp(6e17193c/rbd_data.7acbea7d555b97.0000000000006b69/head//4, recovery_info: ObjectRecoveryInfo(6e17193c/rbd_data.7acbea7d555b97.0000000000006b69/head//4@293776'14739967, copy_subset: [0~18446744073709551615], clone_subset: {}), recovery_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:false)),PullOp(f159993c/rbd_data.1bece115bba4049.00000000001fa2cc/head//4, recovery_info: ObjectRecoveryInfo(f159993c/rbd_data.1bece115bba4049.00000000001fa2cc/head//4@294153'14739969, copy_subset: [0~18446744073709551615], clone_subset: {}), recovery_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:false))]) v2 -- ?+0 0x34a72760 con 0x330b2c00

# grep 18446744073709551615 /var/log/ceph/ceph-osd.532.log --color |wc -l
122

The field is declared as:

interval_set<uint64_t> copy_subset;

Could this be triggered by an overflow in that data structure? Please guide me on how to debug it.

Thanks!
Relevant info below:

ceph version 0.80.5 (38b73c67d375a2552d8ed67843c8a65c2c0feba6)

    cluster ffa2090c-1bd0-4f8e-973d-b0d4ddf9c2d8
     health HEALTH_WARN 680 pgs degraded; 680 pgs stuck unclean; 42 requests are blocked > 32 sec; recovery 150458/29608490 objects degraded (0.508%); 1/690 in osds are down; noout,noscrub,nodeep-scrub flag(s) set
     monmap e11: 5 mons at {BJ-M1-Cloud71=XXX}, election epoch 39138, quorum 0,1,2,3,4 BJ-M1-Cloud71,BJ-M1-Cloud73,BJ-M2-Cloud80,BJ-M2-Cloud81,BJ-M3-Cloud85
     osdmap e296072: 719 osds: 689 up, 690 in
            flags noout,noscrub,nodeep-scrub
      pgmap v127916572: 71504 pgs, 8 pools, 71837 GB data, 14457 kobjects
            140 TB used, 721 TB / 862 TB avail
            150458/29608490 objects degraded (0.508%)
               70824 active+clean
                 680 active+degraded
  client io 100 kB/s rd, 4203 kB/s wr, 652 op/s

ceph.532-log:

-10> 2018-08-22 12:32:26.771645 7f451d916700 5 -- op tracker -- , seq: 2846, time: 2018-08-22 12:32:26.771591, event: reached_pg, request: MOSDPGPush(4.5d2e 295912 [PushOp(bfc1dd2e/rbd_data.183b85e601d1026.0000000000000212/head//4, version: 293767'23828530, data_included: [], data_size: 0, omap_header_size: 0, omap_entries_size: 0, attrset_size: 2, recovery_info: ObjectRecoveryInfo(bfc1dd2e/rbd_data.183b85e601d1026.0000000000000212/head//4@293767'23828530, copy_subset: [], clone_subset: {}), after_progress: ObjectRecoveryProgress(!first, data_recovered_to:0, data_complete:true, omap_recovered_to:, omap_complete:true), before_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:false))]) v2

-9> 2018-08-22 12:32:26.771672 7f451b112700 5 -- op tracker -- , seq: 2848, time: 2018-08-22 12:32:26.771645, event: reached_pg, request: MOSDPGPush(4.1fa 295912 [PushOp(66881fa/rbd_data.320a2da4111f069.00000000000074cd/head//4, version: 289249'30699865, data_included: [0~4194304], data_size: 4194304, omap_header_size: 0, omap_entries_size: 0, attrset_size: 2, recovery_info: ObjectRecoveryInfo(66881fa/rbd_data.320a2da4111f069.00000000000074cd/head//4@289249'30699865, copy_subset: [0~4194304], clone_subset: {}), after_progress: ObjectRecoveryProgress(!first, data_recovered_to:4194304, data_complete:true, omap_recovered_to:, omap_complete:true), before_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:false))]) v2

-8> 2018-08-22 12:32:26.771735 7f451d916700 5 -- op tracker -- , seq: 2846, time: 2018-08-22 12:32:26.771735, event: done, request: MOSDPGPush(4.5d2e 295912 [PushOp(bfc1dd2e/rbd_data.183b85e601d1026.0000000000000212/head//4, version: 293767'23828530, data_included: [], data_size: 0, omap_header_size: 0, omap_entries_size: 0, attrset_size: 2, recovery_info: ObjectRecoveryInfo(bfc1dd2e/rbd_data.183b85e601d1026.0000000000000212/head//4@293767'23828530, copy_subset: [], clone_subset: {}), after_progress: ObjectRecoveryProgress(!first, data_recovered_to:0, data_complete:true, omap_recovered_to:, omap_complete:true), before_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:false))]) v2

-7> 2018-08-22 12:32:26.771769 7f451ed18700 5 -- op tracker -- , seq: 2847, time: 2018-08-22 12:32:26.771740, event: reached_pg, request: MOSDPGPush(4.5d2e 295912 [PushOp(ff9f5d2e/rbd_data.183b85e601d1026.0000000000004a8d/head//4, version: 293767'23828531, data_included: [], data_size: 0, omap_header_size: 0, omap_entries_size: 0, attrset_size: 2, recovery_info: ObjectRecoveryInfo(ff9f5d2e/rbd_data.183b85e601d1026.0000000000004a8d/head//4@293767'23828531, copy_subset: [], clone_subset: {}), after_progress: ObjectRecoveryProgress(!first, data_recovered_to:0, data_complete:true, omap_recovered_to:, omap_complete:true), before_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:false))]) v2

-6> 2018-08-22 12:32:26.771798 7f451b112700 5 -- op tracker -- , seq: 2848, time: 2018-08-22 12:32:26.771774, event: done, request: MOSDPGPush(4.1fa 295912 [PushOp(66881fa/rbd_data.320a2da4111f069.00000000000074cd/head//4, version: 289249'30699865, data_included: [0~4194304], data_size: 0, omap_header_size: 0, omap_entries_size: 0, attrset_size: 2, recovery_info: ObjectRecoveryInfo(66881fa/rbd_data.320a2da4111f069.00000000000074cd/head//4@289249'30699865, copy_subset: [0~4194304], clone_subset: {}), after_progress: ObjectRecoveryProgress(!first, data_recovered_to:4194304, data_complete:true, omap_recovered_to:, omap_complete:true), before_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:false))]) v2

-5> 2018-08-22 12:32:26.771867 7f451ed18700 5 -- op tracker -- , seq: 2847, time: 2018-08-22 12:32:26.771867, event: done, request: MOSDPGPush(4.5d2e 295912 [PushOp(ff9f5d2e/rbd_data.183b85e601d1026.0000000000004a8d/head//4, version: 293767'23828531, data_included: [], data_size: 0, omap_header_size: 0, omap_entries_size: 0, attrset_size: 2, recovery_info: ObjectRecoveryInfo(ff9f5d2e/rbd_data.183b85e601d1026.0000000000004a8d/head//4@293767'23828531, copy_subset: [], clone_subset: {}), after_progress: ObjectRecoveryProgress(!first, data_recovered_to:0, data_complete:true, omap_recovered_to:, omap_complete:true), before_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:false))]) v2

-4> 2018-08-22 12:32:26.773676 7f4525b23700 1 -- xxx<== osd.495 XXX 12 ==== osd_ping(ping e295912 stamp 2018-08-22 12:32:26.762150) v2 ==== 47+0+0 (2232059652 0 0) 0x357ae040 con 0x34b05ee0

- 0> 2018-08-22 12:32:26.774276 7f451610a700 -1 *** Caught signal (Segmentation fault) **
 in thread 7f451610a700

 ceph version 0.80.5 (38b73c67d375a2552d8ed67843c8a65c2c0feba6)
 1: /usr/bin/ceph-osd() [0x9acb51]
 2: /lib64/libpthread.so.0() [0x3ee680f7e0]
 3: (ReplicatedPG::trim_object(hobject_t const&)+0x2e6) [0x854196]
 4: (ReplicatedPG::TrimmingObjects::react(ReplicatedPG::SnapTrim const&)+0x73c) [0x856b5c]
 5: (boost::statechart::simple_state<ReplicatedPG::TrimmingObjects, ReplicatedPG::SnapTrimmer, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base const&, void const*)+0xa8) [0x8b4f38]
 6: (boost::statechart::state_machine<ReplicatedPG::SnapTrimmer, ReplicatedPG::NotTrimming, std::allocator<void>, boost::statechart::null_exception_translator>::process_event(boost::statechart::event_base const&)+0x12f) [0x8a3ebf]
 7: (ReplicatedPG::snap_trimmer()+0x5b0) [0x80f6b0]
 8: (OSD::SnapTrimWQ::_process(PG*)+0x1d) [0x664dbd]
 9: (ThreadPool::worker(ThreadPool::WorkThread*)+0x551) [0x9beac1]
 10: (ThreadPool::WorkThread::entry()+0x10) [0x9c1b00]
 11: /lib64/libpthread.so.0() [0x3ee6807aa1]
 12: (clone()+0x6d) [0x3ee5ce893d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
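
A note on the number in the MOSDPGPull message: 18446744073709551615 is 2^64 - 1, the largest value a uint64_t can hold, and it is also what (uint64_t)-1 evaluates to in C++. A copy_subset of [0~18446744073709551615] may therefore be a "pull the whole object" marker rather than a corrupted length. That reading is an assumption and has not been checked against the 0.80.5 source. Below is a minimal Python sketch for listing which objects carry that value, so they can be compared with the PGs that fail to recover. The regular expression and the helper name whole_object_pulls are illustrative only and are written against the log format shown in this post.

```python
import re

# 2**64 - 1: the largest value a uint64_t can hold, i.e. (uint64_t)-1 in C++.
UINT64_MAX = 2**64 - 1

# Captures the object name and the copy_subset interval of each PullOp.
# Assumption: the log format matches the MOSDPGPull line quoted in this post.
PULL_OP = re.compile(
    r"PullOp\((?P<obj>[^,]+), recovery_info: ObjectRecoveryInfo\([^,]+, "
    r"copy_subset: \[(?P<off>\d+)~(?P<len>\d+)\]"
)


def whole_object_pulls(lines):
    """Yield (object, offset, length) for PullOps whose length is UINT64_MAX."""
    for line in lines:
        for m in PULL_OP.finditer(line):
            length = int(m.group("len"))
            if length == UINT64_MAX:
                yield m.group("obj"), int(m.group("off")), length


# Shortened copy of the first PullOp from the message quoted in this post.
SAMPLE = (
    "MOSDPGPull(4.193c 295912 [PullOp("
    "6e17193c/rbd_data.7acbea7d555b97.0000000000006b69/head//4, "
    "recovery_info: ObjectRecoveryInfo("
    "6e17193c/rbd_data.7acbea7d555b97.0000000000006b69/head//4"
    "@293776'14739967, copy_subset: [0~18446744073709551615], "
    "clone_subset: {})"
)

hits = list(whole_object_pulls([SAMPLE]))
for obj, off, length in hits:
    print(obj, off, length)
```

To run it against a real log, pass an open file instead of the sample, for example whole_object_pulls(open("/var/log/ceph/ceph-osd.532.log")). Note that the backtrace above crashes in ReplicatedPG::trim_object on the snap trimmer thread, not in the pull path, so the large copy_subset and the segmentation fault may be unrelated.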