We trashed one OSD and started backfilling it. After about 90 minutes it started crashing again (log below).

We'll disable snap trimming so it at least runs, but could someone suggest what the root cause is? Can an OSD get backfilled inconsistently from another (possibly also bad) OSD? But then I would expect both OSDs to crash at the same time when trimming the same snap?

2015-09-25 14:06:44.195039 2c9b65ed700 -1 osd.14 pg_epoch: 1216281 pg[4.3d85( v 1216271'13915838 (1209157'13912606,1216271'13915838] local-les=1216278 n=241 ec=3 les/c 1216278/1216278 1215709/1216277/1216277) [14,36,59] r=0 lpr=1216277 mlcod 0'0 active+clean snaptrimq=[857c0~1,857c3~1,857cc~1,857d0~1,857d3~1,857d8~1,857da~1,857dd~1,857e2~1,857e9~1,857ed~1,857f1~1,857f4~1,857fc~1,857fe~1,85800~1,85803~1,8580a~1,8580c~1,8580e~1,85811~1,85813~1,85817~1,8581a~1,8581c~1,8581f~1,85821~1,85823~1,8582a~1,8582c~1,8582f~1,85838~1,8583a~1,8583d~1,85843~1,85848~1,8584b~1,8584d~1,85851~1,85854~1,85856~1,85859~1,8585b~1,8585d~1,85862~1,85a2f~2,85a34~1,85a43~16,85a5a~16,85a71~1,85a73~2,85a76~6,85a7d~1,85a81~1,85a86~2,85a8a~1,85a8c~2,85a8f~2,85a93~f,85aa4~1,85aa9~2,85aac~1,85ab2~1,85ab5~3,85abc~1,85ac1~2]] trim_objectcould not find coid 783dbd85/rbd_data.1a785181f15746a.00000000000238df/857c0//4
2015-09-25 14:06:44.266974 2c9b65ed700 -1 osd/ReplicatedPG.cc: In function 'ReplicatedPG::RepGather* ReplicatedPG::trim_object(const hobject_t&)' thread 2c9b65ed700 time 2015-09-25 14:06:44.206977
osd/ReplicatedPG.cc: 1510: FAILED assert(0)

ceph version 0.67.11-82-ge5b6eea (e5b6eea91cc37434f78a987d2dd1d3edd4a23f3f)
1: (ReplicatedPG::trim_object(hobject_t const&)+0x150) [0x6e8bd0]
2: (ReplicatedPG::TrimmingObjects::react(ReplicatedPG::SnapTrim const&)+0x2e7) [0x6ee0d7]
3: (boost::statechart::detail::reaction_result boost::statechart::simple_state<ReplicatedPG::TrimmingObjects, ReplicatedPG::SnapTrimmer, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::local_react_impl_non_empty::local_react_impl<boost::mpl::list<boost::statechart::custom_reaction<ReplicatedPG::SnapTrim>, boost::statechart::transition<ReplicatedPG::Reset, ReplicatedPG::NotTrimming, boost::statechart::detail::no_context<ReplicatedPG::Reset>, &(boost::statechart::detail::no_context<ReplicatedPG::Reset>::no_function(ReplicatedPG::Reset const&))>, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, boost::statechart::simple_state<ReplicatedPG::TrimmingObjects, ReplicatedPG::SnapTrimmer, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0> >(boost::statechart::simple_state<ReplicatedPG::TrimmingObjects, ReplicatedPG::SnapTrimmer, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>&, boost::statechart::event_base const&, void const*)+0x96) [0x740fa6]
4: (boost::statechart::state_machine<ReplicatedPG::SnapTrimmer, ReplicatedPG::NotTrimming, std::allocator<void>, boost::statechart::null_exception_translator>::process_queued_events()+0x137) [0x71bdf7]
5: (boost::statechart::state_machine<ReplicatedPG::SnapTrimmer, ReplicatedPG::NotTrimming, std::allocator<void>, boost::statechart::null_exception_translator>::process_event(boost::statechart::event_base const&)+0x26) [0x71cfe6]
6: (ReplicatedPG::snap_trimmer()+0x4ed) [0x6b59ad]
7: (OSD::SnapTrimWQ::_process(PG*)+0x14) [0x790c54]
8: (ThreadPool::worker(ThreadPool::WorkThread*)+0x68c) [0x9a69cc]
9: (ThreadPool::WorkThread::entry()+0x10) [0x9a7c20]
10: (()+0x7e9a) [0x2ca012a8e9a]
11: (clone()+0x6d) [0x2c9ff81d38d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- begin dump of recent events ---
-47> 2015-09-25 13:30:40.023697 2ca018cd780 5 asok(0x3005650) register_command perfcounters_dump hook 0x3006d90
-46> 2015-09-25 13:30:40.023726 2ca018cd780 5 asok(0x3005650) register_command 1 hook 0x3006d90
-45> 2015-09-25 13:30:40.023731 2ca018cd780 5 asok(0x3005650) register_command perf dump hook 0x3006d90
-44> 2015-09-25 13:30:40.023743 2ca018cd780 5 asok(0x3005650) register_command perfcounters_schema hook 0x3006d90
-43> 2015-09-25 13:30:40.023749 2ca018cd780 5 asok(0x3005650) register_command 2 hook 0x3006d90
-42> 2015-09-25 13:30:40.023751 2ca018cd780 5 asok(0x3005650) register_command perf schema hook 0x3006d90
-41> 2015-09-25 13:30:40.023756 2ca018cd780 5 asok(0x3005650) register_command config show hook 0x3006d90
-40> 2015-09-25 13:30:40.023763 2ca018cd780 5 asok(0x3005650) register_command config set hook 0x3006d90
-39> 2015-09-25 13:30:40.023766 2ca018cd780 5 asok(0x3005650) register_command config get hook 0x3006d90
-38> 2015-09-25 13:30:40.023770 2ca018cd780 5 asok(0x3005650) register_command log flush hook 0x3006d90
-37> 2015-09-25 13:30:40.023773 2ca018cd780 5 asok(0x3005650) register_command log dump hook 0x3006d90
-36> 2015-09-25 13:30:40.023777 2ca018cd780 5 asok(0x3005650) register_command log reopen hook 0x3006d90
-35> 2015-09-25 13:30:40.025429 2ca018cd780 0 ceph version 0.67.11-82-ge5b6eea (e5b6eea91cc37434f78a987d2dd1d3edd4a23f3f), process ceph-osd, pid 3251242
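
For anyone reading the log above: the snap in the missing clone id (857c0) appears to be the first entry of snaptrimq, so the trimmer seems to die on the very first snap it tries. Below is a rough Python sketch for checking that from a pasted log line. It assumes the queue is printed as hex start~length intervals and that the coid is laid out as hash/name/snap//pool; both are my reading of the output, and the helper names are made up for illustration, not part of any Ceph tool.

```python
def parse_snaptrimq(text):
    """Parse '[start~len,start~len,...]' (hex, assumed format) into (start, length) tuples."""
    intervals = []
    for item in text.strip("[]").split(","):
        start, length = item.strip().split("~")
        intervals.append((int(start, 16), int(length, 16)))
    return intervals


def snap_of_coid(coid):
    """Return the snap id from a 'hash/name/snap//pool' string (assumed layout)."""
    return int(coid.split("/")[2], 16)


# Shortened queue and the coid copied from the log line above.
queue = parse_snaptrimq("[857c0~1,857c3~1,85a43~16,85a93~f]")
coid = "783dbd85/rbd_data.1a785181f15746a.00000000000238df/857c0//4"

snap = snap_of_coid(coid)
first_start, first_len = queue[0]
# True if the failing snap falls inside the first queued interval.
print(hex(snap), first_start <= snap < first_start + first_len)
```

Running the same check against the log from the other replicas (36 and 59 here) would show whether they have the same snap at the head of their queue.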
_______________________________________________ ceph-users mailing list ceph-users@xxxxxxxxxxxxxx http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com