Hi Sage,

On 24.01.2018 at 01:07, Sage Weil wrote:
> On Tue, 23 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>
>> On 22.01.2018 at 20:01, Sage Weil wrote:
>>> On Mon, 22 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>> Hi Sage,
>>>>
>>>> thanks! That one fixed it! Now I'm just waiting to see whether the log
>>>> ever stops telling me:
>>>> 2018-01-22 19:48:44.204004 osd.45 [ERR] osd.45 found snap mapper error
>>>> on pg 3.33e oid
>>>> 3:7cc2e79f:::rbd_data.52ab946b8b4567.00000000000043fb:b0d7c snaps in
>>>> mapper: , oi: b0d7c...repaired
>>>>
>>>> Currently it seems to run an endless repair that never finishes.
>>>
>>> On many objects or the same object?
>>
>> OK, no: it still repairs endlessly, but that is because it also repairs
>> "newly" created snapshots.
>>
>> Example:
>> 2018-01-23 21:46:47.609712 osd.32 [ERR] osd.32 found snap mapper error
>> on pg 3.c70 oid
>> 3:0e3ff478:::rbd_data.52ab946b8b4567.00000000000025b2:b0d7c snaps
>> missing in mapper, should be: b0d7c...repaired
>>
>> rbd image 'vm-251-disk-1':
>> size 170 GB in 43520 objects
>> order 22 (4096 kB objects)
>> block_name_prefix: rbd_data.52ab946b8b4567
>> format: 2
>> features: layering
>> flags:
>> SNAPID NAME SIZE TIMESTAMP
>> 726900 PVESNAP_cloud2-1394_vm_diff_20180123 170 GB Tue Jan 23 02:14:35 2018
>>
>> So this is a fresh snapshot which needs a repair?
>
> Aha! That's a helpful clue. Is there any chance you can crank up the
> osd debug log on one OSD and catch one of these cases? (We want to
> capture the clone being created, and also the scrub on the pg that follows
> and sees the mismatch).

You mean debug_osd? How high does it need to be? And do we really need all
the log messages, or would it be enough to change some of them to level 0
in the source code so that they always log?

Greets,
Stefan

> Thanks!
> sage
>
>
>> Greets,
>> Stefan
>>
>>>
>>> sage
>>>
>>>>
>>>> Greets,
>>>> Stefan
>>>> On 22.01.2018 at 15:30, Sage Weil wrote:
>>>>> On Mon, 22 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>>>> Hi Sage,
>>>>>>
>>>>>> get_snaps returned the error as there are no snaps in the snapmapper, so
>>>>>> it's OK to skip the assert.
>>>>>>
>>>>>> My biggest problem now is that I'm not able to run a scrub on all PGs to
>>>>>> fix all those errors, because in around 1-3% of cases the OSD seems to
>>>>>> deadlock and needs to be killed to proceed.
>>>>>
>>>>> I believe this will fix it:
>>>>>
>>>>> https://github.com/liewegas/ceph/commit/wip-snapmapper
>>>>>
>>>>> Cherry-pick that and let me know?
>>>>>
>>>>> Thanks!
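Regarding the debug_osd question above: one way to raise logging on just the
one OSD at runtime, assuming osd.32 is the one hosting an affected PG (level
20 is very verbose, so I'd revert it once a case has been caught):

    ceph tell osd.32 injectargs '--debug_osd 20 --debug_ms 1'
    # ...create a snapshot on that PG, let the scrub run and hit the mismatch...
    ceph tell osd.32 injectargs '--debug_osd 0/0 --debug_ms 0/0'

The same can be done on the OSD host via the admin socket, e.g.
`ceph daemon osd.32 config set debug_osd 20/20`.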
>>>>> sage >>>>> >>>>> >>>>>> >>>>>> The log output is always: >>>>>> -3> 2018-01-22 14:02:41.725678 7f10025fe700 0 log_channel(cluster) >>>>>> log [WRN] : slow request 141.951820 seconds old, received at >>>>>> 2018-01-22 14:00:19.773706: osd_repop(client.142132381.0:1310126 3.a65 >>>>>> e927921/927902 3:a65be829:::rbd_data.4fbc586b8b4567.00000000 >>>>>> 00000157:head v 927921'526559757) currently commit_sent >>>>>> -2> 2018-01-22 14:02:41.725679 7f10025fe700 0 log_channel(cluster) >>>>>> log [WRN] : slow request 141.643853 seconds old, received at >>>>>> 2018-01-22 14:00:20.081674: osd_repop(client.139148241.0:15454227 3.ff7 >>>>>> e927921/927798 3:eff12952:::rbd_data.2494d06b8b4567.0000000 >>>>>> 00000147d:head v 927921'29105383) currently commit_sent >>>>>> -1> 2018-01-22 14:02:41.725684 7f10025fe700 0 log_channel(cluster) >>>>>> log [WRN] : slow request 141.313220 seconds old, received at >>>>>> 2018-01-22 14:00:20.412307: osd_repop(client.138984198.0:82792023 3.de6 >>>>>> e927921/927725 3:67b9dc7d:::rbd_data.7750506b8b4567.0000000 >>>>>> 000000aff:head v 927921'164552063) currently commit_sent >>>>>> 0> 2018-01-22 14:02:42.426910 7f0fc23fe700 -1 *** Caught signal >>>>>> (Aborted) ** >>>>>> in thread 7f0fc23fe700 thread_name:tp_osd_tp >>>>>> >>>>>> ceph version 12.2.2-94-g92923ef >>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable) >>>>>> 1: (()+0xa4423c) [0x561d876ab23c] >>>>>> 2: (()+0xf890) [0x7f100b974890] >>>>>> 3: (pthread_cond_wait()+0xbf) [0x7f100b97104f] >>>>>> 4: (ObjectStore::apply_transactions(ObjectStore::Sequencer*, >>>>>> std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Tran >>>>>> saction> >&, Context*)+0x254) [0x561d8746ac54] >>>>>> 5: (ObjectStore::apply_transaction(ObjectStore::Sequencer*, >>>>>> ObjectStore::Transaction&&, Context*)+0x5c) [0x561d87199b2c] >>>>>> 6: (PG::_scan_snaps(ScrubMap&)+0x8f7) [0x561d871ebf77] >>>>>> 7: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool, >>>>>> unsigned int, ThreadPool::TPHandle&)+0x22f) [0x561d871ecc4f] >>>>>> 8: (PG::replica_scrub(boost::intrusive_ptr<OpRequest>, >>>>>> ThreadPool::TPHandle&)+0x624) [0x561d871ed574] >>>>>> 9: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, >>>>>> ThreadPool::TPHandle&)+0x8b6) [0x561d872ac756] >>>>>> 10: (OSD::dequeue_op(boost::intrusive_ptr<PG>, >>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) >>>>>> [0x561d87138bc7] >>>>>> 11: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> >>>>>> const&)+0x57) [0x561d873b0db7] >>>>>> 12: (OSD::ShardedOpWQ::_process(unsigned int, >>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x561d87167d1c] >>>>>> 13: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) >>>>>> [0x561d876f42bd] >>>>>> 14: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x561d876f6280] >>>>>> 15: (()+0x8064) [0x7f100b96d064] >>>>>> 16: (clone()+0x6d) [0x7f100aa6162d] >>>>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is >>>>>> needed to interpret this. 
>>>>>> >>>>>> --- logging levels --- >>>>>> 0/ 5 none >>>>>> 0/ 0 lockdep >>>>>> 0/ 0 context >>>>>> 0/ 0 crush >>>>>> 0/ 0 mds >>>>>> 0/ 0 mds_balancer >>>>>> 0/ 0 mds_locker >>>>>> 0/ 0 mds_log >>>>>> 0/ 0 mds_log_expire >>>>>> 0/ 0 mds_migrator >>>>>> 0/ 0 buffer >>>>>> 0/ 0 timer >>>>>> 0/ 0 filer >>>>>> 0/ 1 striper >>>>>> 0/ 0 objecter >>>>>> 0/ 0 rados >>>>>> 0/ 0 rbd >>>>>> 0/ 5 rbd_mirror >>>>>> 0/ 5 rbd_replay >>>>>> 0/ 0 journaler >>>>>> 0/ 0 objectcacher >>>>>> 0/ 0 client >>>>>> 0/ 0 osd >>>>>> 0/ 0 optracker >>>>>> 0/ 0 objclass >>>>>> 0/ 0 filestore >>>>>> 0/ 0 journal >>>>>> 0/ 0 ms >>>>>> 0/ 0 mon >>>>>> 0/ 0 monc >>>>>> 0/ 0 paxos >>>>>> 0/ 0 tp >>>>>> 0/ 0 auth >>>>>> 1/ 5 crypto >>>>>> 0/ 0 finisher >>>>>> 1/ 1 reserver >>>>>> 0/ 0 heartbeatmap >>>>>> 0/ 0 perfcounter >>>>>> 0/ 0 rgw >>>>>> 1/10 civetweb >>>>>> 1/ 5 javaclient >>>>>> 0/ 0 asok >>>>>> 0/ 0 throttle >>>>>> 0/ 0 refs >>>>>> 1/ 5 xio >>>>>> 1/ 5 compressor >>>>>> 1/ 5 bluestore >>>>>> 1/ 5 bluefs >>>>>> 1/ 3 bdev >>>>>> 1/ 5 kstore >>>>>> 4/ 5 rocksdb >>>>>> 4/ 5 leveldb >>>>>> 4/ 5 memdb >>>>>> 1/ 5 kinetic >>>>>> 1/ 5 fuse >>>>>> 1/ 5 mgr >>>>>> 1/ 5 mgrc >>>>>> 1/ 5 dpdk >>>>>> 1/ 5 eventtrace >>>>>> -2/-2 (syslog threshold) >>>>>> -1/-1 (stderr threshold) >>>>>> max_recent 10000 >>>>>> max_new 1000 >>>>>> log_file /var/log/ceph/ceph-osd.59.log >>>>>> --- end dump of recent events --- >>>>>> >>>>>> >>>>>> Greets, >>>>>> Stefan >>>>>> >>>>>> Am 21.01.2018 um 21:27 schrieb Sage Weil: >>>>>>> On Fri, 19 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>>>>> Hi Sage, >>>>>>>> >>>>>>>> thanks for your reply. I'll do this. What i still don't understand is >>>>>>>> the following and i hope you might have an idea: >>>>>>>> 1.) All the snaps mentioned here are fresh - i created them today >>>>>>>> running luminous? How can they be missing in snapmapper again? >>>>>>> >>>>>>> I'm not sure. It might help if you can set debug_osd=1/10 (or 1/20) so >>>>>>> that when you hit the crash we have some context as to what is going on. >>>>>>> >>>>>>> Did you track down which part of get_snaps() is returning the error? >>>>>>> >>>>>>>> 2.) How did this happen? The cluster was jewel and was updated to >>>>>>>> luminous when all this happened. >>>>>>> >>>>>>> Did this start right after the upgrade? >>>>>>> >>>>>>> I started a PR that relaxes some of these assertions so that they clean up >>>>>>> instead (unless the debug option is enabled, for our regression testing), >>>>>>> but I'm still unclear about what the inconsistency is... any loggin you >>>>>>> can provide would help! >>>>>>> >>>>>>> Thanks- >>>>>>> sage >>>>>>> >>>>>>> >>>>>>> >>>>>>>> >>>>>>>> Greets, >>>>>>>> Stefan >>>>>>>> >>>>>>>> >>>>>>>> Am 19.01.2018 um 21:19 schrieb Sage Weil: >>>>>>>>> I'm guessing it is coming from >>>>>>>>> >>>>>>>>> object_snaps out; >>>>>>>>> int r = get_snaps(oid, &out); >>>>>>>>> if (r < 0) >>>>>>>>> return r; >>>>>>>>> >>>>>>>>> in SnapMapper::update_snaps(). I would add a debug statement to to >>>>>>>>> confirm that, and possibly also add debug lines to the various return >>>>>>>>> points in get_snaps() so you can see why get_snaps is returning an error. >>>>>>>>> >>>>>>>>> I'm guessing the way to proceed is to warn but ignore the error, but it's >>>>>>>>> hard to say without seeing what the inconsistency there is and whether we >>>>>>>>> can tolerate it... 
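The return-point logging suggested there could look roughly like this; it is
only a sketch against the get_snaps() source quoted further down in the
thread, and the derr wording is mine, not an actual patch:

    int r = backend.get_keys(keys, &got);
    if (r < 0) {
      derr << __func__ << " " << oid << " get_keys returned " << r << dendl;
      return r;
    }
    if (got.empty()) {
      derr << __func__ << " " << oid << " has no key in the snapmapper (-ENOENT)"
           << dendl;
      return -ENOENT;
    }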
>>>>>>>>> >>>>>>>>> sage >>>>>>>>> >>>>>>>>> >>>>>>>>> On Fri, 19 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>>>>>> >>>>>>>>>> HI Sage, >>>>>>>>>> >>>>>>>>>> any ideas for this one sawing while doing a deep scrub: >>>>>>>>>> >>>>>>>>>> -1> 2018-01-19 21:14:20.474287 7fbcdeffe700 -1 osd.5 pg_epoch: >>>>>>>>>> 925542 pg[3.306( v 925542'78349255 (925334'78347749,925542'78349255] >>>>>>>>>> local-lis/les=925503/925504 n=1752 ec=15/15 lis/c 925503/925503 les/c/f >>>>>>>>>> 925504/925517/0 925503/925503/925503) [75,31,5] r=2 lpr=925503 luod=0'0 >>>>>>>>>> crt=925542'78349255 lcod 925542'78349254 active] _scan_snaps no head for >>>>>>>>>> 3:60c0267d:::rbd_data.103fdc6b8b4567.0000000000004c8e:b0ce7 (have MIN) >>>>>>>>>> >>>>>>>>>> 0> 2018-01-19 21:14:20.477650 7fbcd3bfe700 -1 >>>>>>>>>> /build/ceph/src/osd/PG.cc: In function 'void PG::update_snap_map(const >>>>>>>>>> std::vector<pg_log_entry_t>&, ObjectStore::Transaction&)' thread >>>>>>>>>> 7fbcd3bfe700 time 2018-01-19 21:14:20.474752 >>>>>>>>>> /build/ceph/src/osd/PG.cc: 3420: FAILED assert(r == 0) >>>>>>>>>> >>>>>>>>>> ceph version 12.2.2-94-g92923ef >>>>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable) >>>>>>>>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char >>>>>>>>>> const*)+0x102) [0x55c0100d0372] >>>>>>>>>> 2: (PG::update_snap_map(std::vector<pg_log_entry_t, >>>>>>>>>> std::allocator<pg_log_entry_t> > const&, >>>>>>>>>> anObjectStore::Transaction&)+0x3a5) [0x55c00fbcfe35] >>>>>>>>>> 3: (PG::append_log(std::vector<pg_log_entry_t, >>>>>>>>>> std::allocator<pg_log_entry_t> > const&, eversion_t, eversion_t, >>>>>>>>>> ObjectStore::Transaction&, bool)+0x540) [0x55c00fbf5fc0] >>>>>>>>>> 4: (PrimaryLogPG::log_operation(std::vector<pg_log_entry_t, >>>>>>>>>> std::allocator<pg_log_entry_t> > const&, >>>>>>>>>> boost::optional<pg_hit_set_history_t> const&, eversion_t const&, >>>>>>>>>> eversion_t const&, bool, ObjectStore::Transaction&)+0x64) [0x55c00fceaeb4] >>>>>>>>>> 5: (ReplicatedBackend::do_repop(boost::intrusive_ptr<OpRequest>)+0xa92) >>>>>>>>>> [0x55c00fdf6642] >>>>>>>>>> 6: >>>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2a4) >>>>>>>>>> [0x55c00fdfc684] >>>>>>>>>> 7: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50) >>>>>>>>>> [0x55c00fd22030] >>>>>>>>>> 8: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, >>>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x55c00fc8e33b] >>>>>>>>>> 9: (OSD::dequeue_op(boost::intrusive_ptr<PG>, >>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) >>>>>>>>>> [0x55c00fb1abc7] >>>>>>>>>> 10: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> >>>>>>>>>> const&)+0x57) [0x55c00fd92ad7] >>>>>>>>>> 11: (OSD::ShardedOpWQ::_process(unsigned int, >>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x55c00fb49d1c] >>>>>>>>>> 12: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) >>>>>>>>>> [0x55c0100d5ffd] >>>>>>>>>> 13: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55c0100d7fc0] >>>>>>>>>> 14: (()+0x8064) [0x7fbd284fb064] >>>>>>>>>> 15: (clone()+0x6d) [0x7fbd275ef62d] >>>>>>>>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is >>>>>>>>>> needed to interpret this. 
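On the debug_osd=1/10 (or 1/20) suggestion quoted above: the second number is
the in-memory gather level, and that in-memory log is what gets dumped with a
backtrace like the one above (the "0/ 0 osd" line in the logging-levels dump
is why the crash came with no context). A sketch of the corresponding
ceph.conf entry on the affected OSD; the exact levels are a judgment call:

    [osd]
        debug osd = 1/20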
>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Stefan >>>>>>>>>> >>>>>>>>>> Am 18.01.2018 um 15:24 schrieb Sage Weil: >>>>>>>>>>> On Thu, 18 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>>>>>>>>> Hi Sage, >>>>>>>>>>>> >>>>>>>>>>>> Am 18.01.2018 um 14:16 schrieb Sage Weil: >>>>>>>>>>>>> On Thu, 18 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>>>>>>>>>>> Hi, >>>>>>>>>>>>>> >>>>>>>>>>>>>> it also crashes in (marked with HERE): >>>>>>>>>>>>>> >>>>>>>>>>>>>> int SnapMapper::get_snaps( >>>>>>>>>>>>>> const hobject_t &oid, >>>>>>>>>>>>>> object_snaps *out) >>>>>>>>>>>>>> { >>>>>>>>>>>>>> assert(check(oid)); >>>>>>>>>>>>>> set<string> keys; >>>>>>>>>>>>>> map<string, bufferlist> got; >>>>>>>>>>>>>> keys.insert(to_object_key(oid)); >>>>>>>>>>>>>> int r = backend.get_keys(keys, &got); >>>>>>>>>>>>>> if (r < 0) >>>>>>>>>>>>>> return r; >>>>>>>>>>>>>> if (got.empty()) >>>>>>>>>>>>>> return -ENOENT; >>>>>>>>>>>>>> if (out) { >>>>>>>>>>>>>> bufferlist::iterator bp = got.begin()->second.begin(); >>>>>>>>>>>>>> ::decode(*out, bp); >>>>>>>>>>>>>> dout(20) << __func__ << " " << oid << " " << out->snaps << dendl; >>>>>>>>>>>>>> assert(!out->snaps.empty()); ########### HERE ########### >>>>>>>>>>>>>> } else { >>>>>>>>>>>>>> dout(20) << __func__ << " " << oid << " (out == NULL)" << dendl; >>>>>>>>>>>>>> } >>>>>>>>>>>>>> return 0; >>>>>>>>>>>>>> } >>>>>>>>>>>>>> >>>>>>>>>>>>>> is it save to comment that assert? >>>>>>>>>>>>> >>>>>>>>>>>>> I think so. All of these asserts are just about ensuring the snapmapper >>>>>>>>>>>>> state is consistent, and it's only purpose is to find snaps to trim. >>>>>>>>>>>>> Since your snapmapper metadata is clearly not consistent, there isn't a >>>>>>>>>>>>> whole lot of risk here. You might want to set nosnaptrim for the time >>>>>>>>>>>>> being so you don't get a surprise if trimming kicks in. The next step is >>>>>>>>>>>>> probably going to be to do a scrub and see what it finds, repairs, or >>>>>>>>>>>>> can't repair. >>>>>>>>>>>> >>>>>>>>>>>> snap trimming works fine i already trimmed and removed some snaps. >>>>>>>>>>>> >>>>>>>>>>>> I was able to get the cluster into a state where all is backfilled. But >>>>>>>>>>>> enabling / doing deep-scrubs results into this one: >>>>>>>>>>>> 0> 2018-01-18 13:00:54.843076 7fbd253ff700 -1 >>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: In function 'int >>>>>>>>>>>> SnapMapper::get_snaps(const h >>>>>>>>>>>> object_t&, SnapMapper::object_snaps*)' thread 7fbd253ff700 time >>>>>>>>>>>> 2018-01-18 13:00:54.840396 >>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: 154: FAILED assert(!out->snaps.empty()) >>>>>>>>>>> >>>>>>>>>>> I think if you switch that assert to a warning it will repair... 
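Switching that assert to a warning might look like the following sketch (the
message text is mine), so that scrub can continue into its repair path instead
of aborting the OSD:

      if (out->snaps.empty()) {
        derr << __func__ << " " << oid << " has an empty snap set in the snapmapper"
             << dendl;
        // was: assert(!out->snaps.empty());
      }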
>>>>>>>>>>> >>>>>>>>>>> sage >>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> ceph version 12.2.2-94-g92923ef >>>>>>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable) >>>>>>>>>>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char >>>>>>>>>>>> const*)+0x102) [0x5561ce28d1f2] >>>>>>>>>>>> 2: (SnapMapper::get_snaps(hobject_t const&, >>>>>>>>>>>> SnapMapper::object_snaps*)+0x46b) [0x5561cdef755b] >>>>>>>>>>>> 3: (SnapMapper::get_snaps(hobject_t const&, std::set<snapid_t, >>>>>>>>>>>> std::less<snapid_t>, std::allocator<snapid_t> >*)+0xd7) [0x5561cdef7 >>>>>>>>>>>> 697] >>>>>>>>>>>> 4: (PG::_scan_snaps(ScrubMap&)+0x692) [0x5561cdd8ad22] >>>>>>>>>>>> 5: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool, >>>>>>>>>>>> unsigned int, ThreadPool::TPHandle&)+0x22f) [0x5561cdd8bc5f] >>>>>>>>>>>> 6: (PG::replica_scrub(boost::intrusive_ptr<OpRequest>, >>>>>>>>>>>> ThreadPool::TPHandle&)+0x624) [0x5561cdd8c584] >>>>>>>>>>>> 7: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, >>>>>>>>>>>> ThreadPool::TPHandle&)+0x8b6) [0x5561cde4b326] >>>>>>>>>>>> 8: (OSD::dequeue_op(boost::intrusive_ptr<PG>, >>>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) >>>>>>>>>>>> [0x5561cdcd7bc7] >>>>>>>>>>>> 9: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> >>>>>>>>>>>> const&)+0x57) [0x5561cdf4f957] >>>>>>>>>>>> 10: (OSD::ShardedOpWQ::_process(unsigned int, >>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x5561cdd06d1c] >>>>>>>>>>>> 11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) >>>>>>>>>>>> [0x5561ce292e7d] >>>>>>>>>>>> 12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x5561ce294e40] >>>>>>>>>>>> 13: (()+0x8064) [0x7fbd70d86064] >>>>>>>>>>>> 14: (clone()+0x6d) [0x7fbd6fe7a62d] >>>>>>>>>>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is >>>>>>>>>>>> needed to interpret this. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Greets, >>>>>>>>>>>> Stefan >>>>>>>>>>>> >>>>>>>>>>>>> sage >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>>> Stefan >>>>>>>>>>>>>> Am 17.01.2018 um 19:56 schrieb Sage Weil: >>>>>>>>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Am 17.01.2018 um 19:48 schrieb Sage Weil: >>>>>>>>>>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>>>>>>>>>>>>>>> Hi Sage, >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> this gives me another crash while that pg is recovering: >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> 0> 2018-01-17 19:25:09.328935 7f48f8fff700 -1 >>>>>>>>>>>>>>>>>> /build/ceph/src/osd/PrimaryLogPG.cc: In function 'virtual void >>>>>>>>>>>>>>>>>> PrimaryLogPG::on_l >>>>>>>>>>>>>>>>>> ocal_recover(const hobject_t&, const ObjectRecoveryInfo&, >>>>>>>>>>>>>>>>>> ObjectContextRef, bool, ObjectStore::Transaction*)' thread 7f48f8fff700 ti >>>>>>>>>>>>>>>>>> me 2018-01-17 19:25:09.322287 >>>>>>>>>>>>>>>>>> /build/ceph/src/osd/PrimaryLogPG.cc: 358: FAILED assert(p != >>>>>>>>>>>>>>>>>> recovery_info.ss.clone_snaps.end()) >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Is this a cache tiering pool? 
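(Stefan answers this just below; for reference, whether any cache tiering is
configured can be checked quickly with something like:

    ceph osd dump | grep -E 'tier_of|cache_mode' || echo "no cache tiering"

No output from the grep means no tiers are set up.)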
>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> no normal 3 replica but the pg is degraded: >>>>>>>>>>>>>>>> ceph pg dump | grep 3.80e >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> 3.80e 1709 0 1579 0 0 6183674368 >>>>>>>>>>>>>>>> 10014 10014 active+undersized+degraded+remapped+backfill_wait >>>>>>>>>>>>>>>> 2018-01-17 19:42:55.840884 918403'70041375 918403:77171331 [50,54,86] >>>>>>>>>>>>>>>> 50 [39] 39 907737'69776430 2018-01-14 >>>>>>>>>>>>>>>> 22:19:54.110864 907737'69776430 2018-01-14 22:19:54.110864 >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Hrm, no real clues on teh root cause then. Something like this will work >>>>>>>>>>>>>>> around the current assert: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> diff --git a/src/osd/PrimaryLogPG.cc b/src/osd/PrimaryLogPG.cc >>>>>>>>>>>>>>> index d42f3a401b..0f76134f74 100644 >>>>>>>>>>>>>>> --- a/src/osd/PrimaryLogPG.cc >>>>>>>>>>>>>>> +++ b/src/osd/PrimaryLogPG.cc >>>>>>>>>>>>>>> @@ -372,8 +372,11 @@ void PrimaryLogPG::on_local_recover( >>>>>>>>>>>>>>> set<snapid_t> snaps; >>>>>>>>>>>>>>> dout(20) << " snapset " << recovery_info.ss << dendl; >>>>>>>>>>>>>>> auto p = recovery_info.ss.clone_snaps.find(hoid.snap); >>>>>>>>>>>>>>> - assert(p != recovery_info.ss.clone_snaps.end()); // hmm, should we warn? >>>>>>>>>>>>>>> - snaps.insert(p->second.begin(), p->second.end()); >>>>>>>>>>>>>>> + if (p != recovery_info.ss.clone_snaps.end()) { >>>>>>>>>>>>>>> + derr << __func__ << " no clone_snaps for " << hoid << dendl; >>>>>>>>>>>>>>> + } else { >>>>>>>>>>>>>>> + snaps.insert(p->second.begin(), p->second.end()); >>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>> dout(20) << " snaps " << snaps << dendl; >>>>>>>>>>>>>>> snap_mapper.add_oid( >>>>>>>>>>>>>>> recovery_info.soid, >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Stefan >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> s >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> ceph version 12.2.2-94-g92923ef >>>>>>>>>>>>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable) >>>>>>>>>>>>>>>>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char >>>>>>>>>>>>>>>>>> const*)+0x102) [0x55addb5eb1f2] >>>>>>>>>>>>>>>>>> 2: (PrimaryLogPG::on_local_recover(hobject_t const&, ObjectRecoveryInfo >>>>>>>>>>>>>>>>>> const&, std::shared_ptr<ObjectContext>, bool, >>>>>>>>>>>>>>>>>> ObjectStore::Transaction*)+0x11f0) [0x55addb1957a0] >>>>>>>>>>>>>>>>>> 3: (ReplicatedBackend::handle_push(pg_shard_t, PushOp const&, >>>>>>>>>>>>>>>>>> PushReplyOp*, ObjectStore::Transaction*)+0x31d) [0x55addb3071ed] >>>>>>>>>>>>>>>>>> 4: (ReplicatedBackend::_do_push(boost::intrusive_ptr<OpRequest>)+0x18f) >>>>>>>>>>>>>>>>>> [0x55addb30748f] >>>>>>>>>>>>>>>>>> 5: >>>>>>>>>>>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2d1) >>>>>>>>>>>>>>>>>> [0x55addb317531] >>>>>>>>>>>>>>>>>> 6: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50) >>>>>>>>>>>>>>>>>> [0x55addb23cf10] >>>>>>>>>>>>>>>>>> 7: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, >>>>>>>>>>>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x55addb1a91eb] >>>>>>>>>>>>>>>>>> 8: (OSD::dequeue_op(boost::intrusive_ptr<PG>, >>>>>>>>>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) >>>>>>>>>>>>>>>>>> [0x55addb035bc7] >>>>>>>>>>>>>>>>>> 9: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> >>>>>>>>>>>>>>>>>> const&)+0x57) [0x55addb2ad957] >>>>>>>>>>>>>>>>>> 10: (OSD::ShardedOpWQ::_process(unsigned int, >>>>>>>>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x55addb064d1c] >>>>>>>>>>>>>>>>>> 11: 
(ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) >>>>>>>>>>>>>>>>>> [0x55addb5f0e7d] >>>>>>>>>>>>>>>>>> 12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55addb5f2e40] >>>>>>>>>>>>>>>>>> 13: (()+0x8064) [0x7f4955b68064] >>>>>>>>>>>>>>>>>> 14: (clone()+0x6d) [0x7f4954c5c62d] >>>>>>>>>>>>>>>>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is >>>>>>>>>>>>>>>>>> needed to interpret this. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Greets, >>>>>>>>>>>>>>>>>> Stefan >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Am 17.01.2018 um 15:28 schrieb Sage Weil: >>>>>>>>>>>>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>>>>>>>>>>>>>>>>> Hi, >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> i there any chance to fix this instead of removing manually all the clones? >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> I believe you can avoid the immediate problem and get the PG up by >>>>>>>>>>>>>>>>>>> commenting out the assert. set_snaps() will overwrite the object->snap >>>>>>>>>>>>>>>>>>> list mapping. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> The problem is you'll probably still a stray snapid -> object mapping, so >>>>>>>>>>>>>>>>>>> when snaptrimming runs you might end up with a PG in the snaptrim_error >>>>>>>>>>>>>>>>>>> state that won't trim (although from a quick look at the code it won't >>>>>>>>>>>>>>>>>>> crash). I'd probably remove the assert and deal with that if/when it >>>>>>>>>>>>>>>>>>> happens. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> I'm adding a ticket to relax these asserts for production but keep them >>>>>>>>>>>>>>>>>>> enabled for qa. This isn't something that needs to take down the OSD! >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> sage >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Stefan >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Am 16.01.2018 um 23:24 schrieb Gregory Farnum: >>>>>>>>>>>>>>>>>>>>> On Mon, Jan 15, 2018 at 5:23 PM, Stefan Priebe - Profihost AG >>>>>>>>>>>>>>>>>>>>> <s.priebe@xxxxxxxxxxxx> wrote: >>>>>>>>>>>>>>>>>>>>>> Hello, >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> currently one of my clusters is missing a whole pg due to all 3 osds >>>>>>>>>>>>>>>>>>>>>> being down. 
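A note on the on_local_recover() workaround diff quoted further up: as pasted,
the test looks inverted (the "no clone_snaps" message would fire when the
clone is found, and the else branch would dereference an end() iterator when
it is not). Presumably the intent was:

      auto p = recovery_info.ss.clone_snaps.find(hoid.snap);
      if (p == recovery_info.ss.clone_snaps.end()) {
        derr << __func__ << " no clone_snaps for " << hoid << dendl;
      } else {
        snaps.insert(p->second.begin(), p->second.end());
      }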
>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> All of them fail with: >>>>>>>>>>>>>>>>>>>>>> 0> 2018-01-16 02:05:33.353293 7f944dbfe700 -1 >>>>>>>>>>>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: In function 'void >>>>>>>>>>>>>>>>>>>>>> SnapMapper::add_oid(const hobject_t&, const std::set<snapid_t>&, >>>>>>>>>>>>>>>>>>>>>> MapCacher::Transaction<std::basic_string<char>, ceph::buffer::list>*)' >>>>>>>>>>>>>>>>>>>>>> thread 7f944dbfe700 time 2018-01-16 02:05:33.349946 >>>>>>>>>>>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: 246: FAILED assert(r == -2) >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> ceph version 12.2.2-93-gd6da8d7 >>>>>>>>>>>>>>>>>>>>>> (d6da8d77a4b2220e6bdd61e4bdd911a9cd91946c) luminous (stable) >>>>>>>>>>>>>>>>>>>>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char >>>>>>>>>>>>>>>>>>>>>> const*)+0x102) [0x561f9ff0b1e2] >>>>>>>>>>>>>>>>>>>>>> 2: (SnapMapper::add_oid(hobject_t const&, std::set<snapid_t, >>>>>>>>>>>>>>>>>>>>>> std::less<snapid_t>, std::allocator<snapid_t> > const&, >>>>>>>>>>>>>>>>>>>>>> MapCacher::Transaction<std::string, ceph::buffer::list>*)+0x64b) >>>>>>>>>>>>>>>>>>>>>> [0x561f9fb76f3b] >>>>>>>>>>>>>>>>>>>>>> 3: (PG::update_snap_map(std::vector<pg_log_entry_t, >>>>>>>>>>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, >>>>>>>>>>>>>>>>>>>>>> ObjectStore::Transaction&)+0x38f) [0x561f9fa0ae3f] >>>>>>>>>>>>>>>>>>>>>> 4: (PG::append_log(std::vector<pg_log_entry_t, >>>>>>>>>>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, eversion_t, eversion_t, >>>>>>>>>>>>>>>>>>>>>> ObjectStore::Transaction&, bool)+0x538) [0x561f9fa31018] >>>>>>>>>>>>>>>>>>>>>> 5: (PrimaryLogPG::log_operation(std::vector<pg_log_entry_t, >>>>>>>>>>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, >>>>>>>>>>>>>>>>>>>>>> boost::optional<pg_hit_set_history_t> const&, eversion_t const&, >>>>>>>>>>>>>>>>>>>>>> eversion_t const&, bool, ObjectStore::Transaction&)+0x64) [0x561f9fb25d64] >>>>>>>>>>>>>>>>>>>>>> 6: (ReplicatedBackend::do_repop(boost::intrusive_ptr<OpRequest>)+0xa92) >>>>>>>>>>>>>>>>>>>>>> [0x561f9fc314b2] >>>>>>>>>>>>>>>>>>>>>> 7: >>>>>>>>>>>>>>>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2a4) >>>>>>>>>>>>>>>>>>>>>> [0x561f9fc374f4] >>>>>>>>>>>>>>>>>>>>>> 8: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50) >>>>>>>>>>>>>>>>>>>>>> [0x561f9fb5cf10] >>>>>>>>>>>>>>>>>>>>>> 9: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, >>>>>>>>>>>>>>>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x561f9fac91eb] >>>>>>>>>>>>>>>>>>>>>> 10: (OSD::dequeue_op(boost::intrusive_ptr<PG>, >>>>>>>>>>>>>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) >>>>>>>>>>>>>>>>>>>>>> [0x561f9f955bc7] >>>>>>>>>>>>>>>>>>>>>> 11: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> >>>>>>>>>>>>>>>>>>>>>> const&)+0x57) [0x561f9fbcd947] >>>>>>>>>>>>>>>>>>>>>> 12: (OSD::ShardedOpWQ::_process(unsigned int, >>>>>>>>>>>>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x561f9f984d1c] >>>>>>>>>>>>>>>>>>>>>> 13: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) >>>>>>>>>>>>>>>>>>>>>> [0x561f9ff10e6d] >>>>>>>>>>>>>>>>>>>>>> 14: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x561f9ff12e30] >>>>>>>>>>>>>>>>>>>>>> 15: (()+0x8064) [0x7f949afcb064] >>>>>>>>>>>>>>>>>>>>>> 16: (clone()+0x6d) [0x7f949a0bf62d] >>>>>>>>>>>>>>>>>>>>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is >>>>>>>>>>>>>>>>>>>>>> needed to interpret this. 
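For that original SnapMapper::add_oid() abort, the "comment out the assert"
approach discussed above could be a warning instead. A sketch only; as noted
in the thread, a stray snapid->object mapping may still be left behind for
snap trimming to stumble over later:

      {
        object_snaps out;
        int r = get_snaps(oid, &out);
        if (r != -ENOENT) {
          derr << __func__ << " " << oid << " already has a snapmapper entry"
               << " (r=" << r << "), overwriting" << dendl;
          // was: assert(r == -2), i.e. assert(r == -ENOENT)
        }
      }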
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> By the time it gets there, something else has gone wrong. The OSD is
>>>>>>>>>>>>>>>>>>>>> adding a snapid/object pair to its "SnapMapper", and discovering that
>>>>>>>>>>>>>>>>>>>>> there are already entries (which it thinks there shouldn't be).
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> You'll need to post more of a log, along with background, if anybody's
>>>>>>>>>>>>>>>>>>>>> going to diagnose it: is there cache tiering on the cluster? What is
>>>>>>>>>>>>>>>>>>>>> this pool used for? Were there other errors on this PG in the past?
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> I also notice a separate email about deleting the data; I don't have
>>>>>>>>>>>>>>>>>>>>> any experience with this but you'd probably have to export the PG
>>>>>>>>>>>>>>>>>>>>> using ceph-objectstore-tool and then find a way to delete the object
>>>>>>>>>>>>>>>>>>>>> out of it. I see options to remove both an object and
>>>>>>>>>>>>>>>>>>>>> "remove-clone-metadata" on a particular ID, but I've not used any of
>>>>>>>>>>>>>>>>>>>>> them myself.
>>>>>>>>>>>>>>>>>>>>> -Greg
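For the export / clone-metadata route Greg mentions, a rough sketch of the
commands involved (the OSD id, data path, PG id, and object spec below are
placeholders taken from earlier in the thread; the OSD has to be stopped
first, and I'd keep the export as a backup before removing anything):

    systemctl stop ceph-osd@39
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-39 \
        --pgid 3.80e --op export --file /root/pg-3.80e.export
    # find the exact JSON identifier of the damaged object/clone:
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-39 \
        --pgid 3.80e --op list > /root/pg-3.80e.objects
    # then, per Greg's note, remove the clone metadata for one clone id:
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-39 \
        '<object-json-from-list>' remove-clone-metadata <cloneid>

Filestore OSDs may also need --journal-path, and flags can differ slightly
between versions, so check `ceph-objectstore-tool --help` on the node first.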