Hi, see below. Am 12.02.2018 um 20:31 schrieb Sage Weil: > On Mon, 12 Feb 2018, Stefan Priebe - Profihost AG wrote: >> Hi Sage, >> >> did you have any time to look at the logs? > > Sorry, I thought I replied earlier. See below > >> Am 05.02.2018 um 14:39 schrieb Stefan Priebe - Profihost AG: >>> Hi Sage, >>> >>> Am 02.02.2018 um 22:05 schrieb Sage Weil: >>>> On Fri, 2 Feb 2018, Stefan Priebe - Profihost AG wrote: >>>>> Am 02.02.2018 um 20:28 schrieb Sage Weil: >>>>>> On Fri, 2 Feb 2018, Stefan Priebe - Profihost AG wrote: >>>>>>> Hi Sage, >>>>>>> >>>>>>> some days ago i reverted to stock luminous with >>>>>>> https://github.com/liewegas/ceph/commit/e6a04446d21689ae131b0b683b4d76c7c30d8e6a >>>>>>> + >>>>>>> https://github.com/liewegas/ceph/commit/e17493bdffd7b9a349f377e7877fac741221edfa >>>>>>> on top. I scrubbed and deep-scrubbed all pgs before and no errors were >>>>>>> found. >>>>>>> >>>>>>> this worked fine until the ceph balancer ran and started remapping >>>>>>> some pgs. >>>>>>> >>>>>>> This again triggered this assert: >>>>>>> >>>>>>> /build/ceph/src/osd/SnapMapper.cc: In function 'int >>>>>>> SnapMapper::get_snaps(const hobject_t&, SnapMapper::object_snaps*)' >>>>>>> thread 7f4289bff700 time 2018-02-02 17:46:33.249579 >>>>>>> /build/ceph/src/osd/SnapMapper.cc: 154: FAILED assert(!out->snaps.empty()) >>>>>>> >>>>>>> Any idea how to fix it? >>>>>> >>>>>> Cheap workaround is to comment out that assert. I'm still trying to sort >>>>> >>>>> Yes i did but i feel uncomfortable with it as the snapmapper value is >>>>> still wrong. >>>>> >>>>>> out how this happened, though. Not sure if you shared this already, but: >>>>>> - bluestore or filestore? >>>>> >>>>> As i first reported it, only filestore. Now also some bluestore OSDs. >>>>> >>>>>> - no cache tiering, right? nowhere on the cluster? >>>>> Yes and Yes. Never used cache tiering at all. >>>>> >>>>>> It looks like the SnapMapper entry with an empty snaps vector came in via >>>>>> add_oid() (update_oid() asserts if the new_snaps is empty; add_oid() >>>>>> doesn't have that check). There are a few places it could come from, >>>>>> including the on_local_recover() call and the update_snap_map() call. >>>>>> That last one is interesting because it pulls the snaps for a new clone >>>>>> out of the pg log entry.. maybe those aren't getting populated properly. >>>>> >>>>> Is it also possible that they got corrupted in the past and now ceph is >>>>> unable to detect it and repair it with a scrub or deep scrub? >>>>> >>>>>> Can you confirm that all of the OSDs on the cluster are running the >>>>>> latest? (ceph versions). >>>>> >>>>> Yes they do - i always check this after each update and i rechecked it >>>>> again. >>>> >>>> Ok thanks! >>>> >>>> I pushed a branch wip-snapmapper-debug to github.com/liewegas/ceph that >>>> has those commits above plus another one that adds debug messages in the >>>> various places that the empty snap vectors might be coming from. >>>> >>>> This won't help it get past the rebalance issue you hit above, but >>>> hopefully it will trigger when you create a new clone so we can see how >>>> this is happening. >>>> >>>> If we need to, I think we can ignore the assert (by commenting it out), >>>> but one downside there is that we'll see my warnings pop up when >>>> recovering those objects so we won't be able to tell where they really >>>> came from. Since you were seeing this on new snaps/clones, though, I'm >>>> hoping we won't end up in that situation!
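For reference, a minimal standalone sketch of the add_oid() guard Sage describes above (update_oid() refuses an empty snap set, while add_oid() currently does not). The types and the name add_oid_checked below are stand-ins for illustration only, not the real SnapMapper code:

#include <iostream>
#include <map>
#include <set>
#include <string>

using snapid_t = unsigned long long;

// Stand-in for the snapmapper backend: object key -> set of snaps.
static std::map<std::string, std::set<snapid_t>> mapper;

// Hypothetical guard: refuse to record a clone with an empty snap set
// instead of letting the bad entry in and asserting later in get_snaps().
int add_oid_checked(const std::string& oid, const std::set<snapid_t>& snaps) {
  if (snaps.empty()) {
    std::cerr << "add_oid: refusing empty snap set for " << oid << "\n";
    return -22;  // -EINVAL
  }
  if (mapper.count(oid))
    return -17;  // -EEXIST: add_oid expects no existing entry for the object
  mapper[oid] = snaps;
  return 0;
}

int main() {
  std::cout << add_oid_checked("rbd_data.deadbeef:b0d77", {0xb0d77}) << "\n";  // 0
  std::cout << add_oid_checked("rbd_data.deadbeef:b0d78", {}) << "\n";         // -22, caught at the source
  return 0;
}

The wip-snapmapper-debug branch mentioned above presumably just logs at the same call sites rather than rejecting the insert; the point is only that the empty snap set becomes visible at insertion time instead of at the later assert.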
>>> >>> the biggest part of the logs is now this stuff: >>> 2018-02-05 07:56:32.712366 7fb6893ff700 0 log_channel(cluster) log >>> [DBG] : 3.740 scrub ok >>> 2018-02-05 08:00:16.033843 7fb68a7fe700 -1 osd.0 pg_epoch: 945191 >>> pg[3.bbf( v 945191'63020848 (945191'63019283,945191'63020848] >>> local-lis/les=944620/944621 n=1612 ec=15/15 lis/c 944620/944620 les/c/f >>> 944621/944621/937357 944620/944620/944507) [51,54,0] r=2 lpr=944620 >>> luod=0'0 crt=945191'63020848 lcod 945191'63020847 active] _scan_snaps no >>> head for 3:fdd87083:::rbd_data.67b0ee6b8b4567.00000000000103a1:b0d77 >>> (have MIN) >>> 2018-02-05 08:00:42.123804 7fb6833ff700 -1 osd.0 pg_epoch: 945191 >>> pg[3.bbf( v 945191'63020901 (945191'63019383,945191'63020901] >>> local-lis/les=944620/944621 n=1612 ec=15/15 lis/c 944620/944620 les/c/f >>> 944621/944621/937357 944620/944620/944507) [51,54,0] r=2 lpr=944620 >>> luod=0'0 crt=945191'63020901 lcod 945191'63020900 active] _scan_snaps no >>> head for 3:fddb41cc:::rbd_data.9772fb6b8b4567.0000000000000380:b0d9f >>> (have MIN) >>> 2018-02-05 08:03:49.583574 7fb687bfe700 -1 osd.0 pg_epoch: 945191 >>> pg[3.caa( v 945191'52090546 (945191'52088948,945191'52090546] >>> local-lis/les=944694/944695 n=1709 ec=15/15 lis/c 944694/944694 les/c/f >>> 944695/944697/937357 944694/944694/944653) [70,0,93] r=1 lpr=944694 >>> luod=0'0 crt=945191'52090546 lcod 945191'52090545 active] _scan_snaps no >>> head for 3:5530360d:::rbd_data.52a6596b8b4567.0000000000000929:b0d55 >>> (have MIN) >>> 2018-02-05 08:03:50.432394 7fb687bfe700 -1 osd.0 pg_epoch: 945191 >>> pg[3.caa( v 945191'52090549 (945191'52089048,945191'52090549] >>> local-lis/les=944694/944695 n=1709 ec=15/15 lis/c 944694/944694 les/c/f >>> 944695/944697/937357 944694/944694/944653) [70,0,93] r=1 lpr=944694 >>> luod=0'0 crt=945191'52090549 lcod 945191'52090548 active] _scan_snaps no >>> head for 3:553243fa:::rbd_data.67b0ee6b8b4567.0000000000009956:b0d77 >>> (have MIN) >>> 2018-02-05 08:03:52.523643 7fb687bfe700 -1 osd.0 pg_epoch: 945191 >>> pg[3.caa( v 945191'52090549 (945191'52089048,945191'52090549] >>> local-lis/les=944694/944695 n=1709 ec=15/15 lis/c 944694/944694 les/c/f >>> 944695/944697/937357 944694/944694/944653) [70,0,93] r=1 lpr=944694 >>> luod=0'0 crt=945191'52090549 lcod 945191'52090548 active] _scan_snaps no >>> head for 3:5536f170:::rbd_data.c1a3d66b8b4567.000000000000b729:b0dc5 >>> (have MIN) >>> 2018-02-05 08:03:55.471369 7fb687bfe700 -1 osd.0 pg_epoch: 945191 >>> pg[3.caa( v 945191'52090549 (945191'52089048,945191'52090549] >>> local-lis/les=944694/944695 n=1709 ec=15/15 lis/c 944694/944694 les/c/f >>> 944695/944697/937357 944694/944694/944653) [70,0,93] r=1 lpr=944694 >>> luod=0'0 crt=945191'52090549 lcod 945191'52090548 active] _scan_snaps no >>> head for 3:553d2114:::rbd_data.154a9e6b8b4567.0000000000001ea1:b0e4e >>> (have MIN) >>> 2018-02-05 08:08:55.440715 7fb687bfe700 -1 osd.0 pg_epoch: 945191 >>> pg[3.7aa( v 945191'144763900 (945191'144762311,945191'144763900] >>> local-lis/les=944522/944523 n=1761 ec=15/15 lis/c 944522/944522 les/c/f >>> 944523/944551/937357 944522/944522/944522) [39,37,0] r=2 lpr=944522 >>> luod=0'0 crt=945191'144763900 lcod 945191'144763899 active] _scan_snaps >>> no head for 3:55e14d0f:::rbd_data.67b0ee6b8b4567.00000000000025f2:b0d77 >>> (have MIN) >>> 2018-02-05 08:09:09.295844 7fb68efff700 -1 osd.0 pg_epoch: 945191 >>> pg[3.7aa( v 945191'144763970 (945191'144762411,945191'144763970] >>> local-lis/les=944522/944523 n=1761 ec=15/15 lis/c 944522/944522 les/c/f >>> 944523/944551/937357 
944522/944522/944522) [39,37,0] r=2 lpr=944522 >>> luod=0'0 crt=945191'144763970 lcod 945191'144763969 active] _scan_snaps >>> no head for 3:55e2d7a9:::rbd_data.e40a416b8b4567.000000000000576f:b0df3 >>> (have MIN) >>> 2018-02-05 08:09:26.285493 7fb68efff700 -1 osd.0 pg_epoch: 945191 >>> pg[3.7aa( v 945191'144764043 (945191'144762511,945191'144764043] >>> local-lis/les=944522/944523 n=1762 ec=15/15 lis/c 944522/944522 les/c/f >>> 944523/944551/937357 944522/944522/944522) [39,37,0] r=2 lpr=944522 >>> luod=0'0 crt=945191'144764043 lcod 945191'144764042 active] _scan_snaps >>> no head for 3:55e4cbad:::rbd_data.163e9c96b8b4567.000000000000202d:b0d42 >>> (have MIN) >>> 2018-02-05 08:10:36.490452 7fb687bfe700 -1 osd.0 pg_epoch: 945191 >>> pg[3.7aa( v 945191'144764243 (945191'144762711,945191'144764243] >>> local-lis/les=944522/944523 n=1762 ec=15/15 lis/c 944522/944522 les/c/f >>> 944523/944551/937357 944522/944522/944522) [39,37,0] r=2 lpr=944522 >>> luod=0'0 crt=945191'144764243 lcod 945191'144764242 active] _scan_snaps >>> no head for 3:55ece3c3:::rbd_data.e60f906b8b4567.00000000000000de:b0dd7 >>> (have MIN) >>> >>> Found nothing else in the looks until now. Should i search for more OSDs >>> or is this already something we should consider? > > Yeah, those messages aren't super informative. What would be useful is to > check for any of those (or, more importantly the other new messages that > branch adds) on *new* object clones for snapshots that have just been > created. (Earlier, you mentioned that the errors you were seeing referred > to snapshots that had just been created, right? The key question is how > those new clones are getting registered in snapmapper with an empty list > of snaps.) Yes i checked this twice - but i cannot find anything in the logs now. 
I checked the logs with: grep 'snap' /var/log/ceph/ceph-osd.[0-9]*.log | grep -v ' _scan_snaps no head for ' and the output is just: /var/log/ceph/ceph-osd.2.log:2018-02-12 07:44:29.169600 7f9f6abff700 -1 osd.2 pg_epoch: 949525 pg[3.7ae( v 949525'24972392 (949524'24970891,949525'24972392] local-lis/les=946216/946218 n=1608 ec=15/15 lis/c 946216/946216 les/c/f 946218/946223/937357 946216/946216/945695) [2,26,43] r=0 lpr=946216 luod=949524'24972391 lua=949524'24972391 crt=949525'24972392 lcod 949524'24972390 mlcod 949524'24972390 active+clean+snaptrim snaptrimq=[b3f1a~1,b4081~1]] removing snap head /var/log/ceph/ceph-osd.39.log:2018-02-12 07:44:28.829474 7fbaa5bff700 -1 osd.39 pg_epoch: 949525 pg[3.864( v 949525'60058835 (949524'60057284,949525'60058835] local-lis/les=944678/944679 n=1621 ec=15/15 lis/c 944678/944678 les/c/f 944679/944684/937357 944678/944678/944522) [39,80,31] r=0 lpr=944678 crt=949525'60058835 lcod 949524'60058834 mlcod 949524'60058834 active+clean+snaptrim snaptrimq=[b3f1a~1,b4081~1]] removing snap head /var/log/ceph/ceph-osd.19.log:2018-02-12 07:44:28.661329 7f5f9bfec700 -1 osd.19 pg_epoch: 949525 pg[3.d78( v 949525'32475960 (949495'32474451,949525'32475960] local-lis/les=944690/944691 n=1497 ec=15/15 lis/c 944690/944690 les/c/f 944691/944697/937357 944690/944690/944542) [19,75,78] r=0 lpr=944690 luod=949525'32475958 lua=949525'32475958 crt=949525'32475960 lcod 949524'32475957 mlcod 949524'32475957 active+clean+snaptrim snaptrimq=[b3f1a~1,b4081~1]] removing snap head /var/log/ceph/ceph-osd.33.log:2018-02-12 07:44:28.899062 7f0282fff700 -1 osd.33 pg_epoch: 949525 pg[3.298( v 949525'82343947 (949524'82342349,949525'82343947] local-lis/les=944603/944604 n=1654 ec=15/15 lis/c 944603/944603 les/c/f 944604/944612/937357 944603/944603/944603) [33,13,18] r=0 lpr=944603 crt=949525'82343947 lcod 949524'82343946 mlcod 949524'82343946 active+clean+snaptrim snaptrimq=[b3f1a~1,b4081~1]] removing snap head /var/log/ceph/ceph-osd.33.log:2018-02-12 07:44:29.313090 7f0281fff700 -1 osd.33 pg_epoch: 949525 pg[3.2c1( v 949525'58513621 (949524'58512073,949525'58513621] local-lis/les=944678/944679 n=1660 ec=15/15 lis/c 944678/944678 les/c/f 944679/944687/937357 944678/944678/944603) [33,80,56] r=0 lpr=944678 luod=949525'58513619 lua=949525'58513619 crt=949525'58513621 lcod 949525'58513618 mlcod 949525'58513618 active+clean+snaptrim snaptrimq=[b3f1a~1,b4081~1]] removing snap head /var/log/ceph/ceph-osd.45.log:2018-02-12 07:44:28.916617 7fea14ffd700 -1 osd.45 pg_epoch: 949525 pg[3.bf9( v 949525'71145615 (949524'71144094,949525'71145615] local-lis/les=944620/944621 n=1639 ec=15/15 lis/c 944620/944620 les/c/f 944621/944624/937357 944620/944620/944608) [45,28,58] r=0 lpr=944620 crt=949525'71145615 lcod 949525'71145614 mlcod 949525'71145614 active+clean+snaptrim snaptrimq=[b3f1a~1,b4081~1]] removing snap head /var/log/ceph/ceph-osd.62.log:2018-02-12 07:44:28.696833 7f13073fe700 -1 osd.62 pg_epoch: 949525 pg[3.3e8( v 949525'30577716 (949524'30576140,949525'30577716] local-lis/les=946273/946274 n=1609 ec=15/15 lis/c 946273/946273 les/c/f 946274/946277/937357 946273/946273/944634) [62,80,37] r=0 lpr=946273 luod=949524'30577712 lua=949524'30577712 crt=949525'30577716 lcod 949524'30577711 mlcod 949524'30577711 active+clean+snaptrim snaptrimq=[b3f1a~1,b4081~1]] removing snap head /var/log/ceph/ceph-osd.62.log:2018-02-12 07:44:28.801300 7f13073fe700 -1 osd.62 pg_epoch: 949525 pg[3.5d0( v 949525'90315972 (949524'90314417,949525'90315972] local-lis/les=944677/944678 n=1631 ec=15/15 lis/c 944677/944677 
les/c/f 944678/944684/937357 944677/944677/944634) [62,58,88] r=0 lpr=944677 crt=949525'90315972 lcod 949524'90315971 mlcod 949524'90315971 active+clean+snaptrim snaptrimq=[b3f1a~1,b4081~1]] removing snap head /var/log/ceph/ceph-osd.81.log:2018-02-12 07:44:29.465068 7f9f9c7ff700 -1 osd.81 pg_epoch: 949525 pg[3.3cc( v 949525'46127568 (949524'46125993,949525'46127568] local-lis/les=944676/944677 n=1547 ec=15/15 lis/c 944676/944676 les/c/f 944677/944684/937357 944676/944676/944676) [81,59,66] r=0 lpr=944676 crt=949525'46127568 lcod 949525'46127567 mlcod 949525'46127567 active+clean+snaptrim snaptrimq=[b3f1a~1,b4081~1]] removing snap head /var/log/ceph/ceph-osd.91.log:2018-02-12 07:44:29.258414 7f6eebffd700 -1 osd.91 pg_epoch: 949525 pg[3.efd( v 949525'24155822 (949524'24154262,949525'24155822] local-lis/les=945695/945696 n=1645 ec=15/15 lis/c 945695/945695 les/c/f 945696/945697/937357 945695/945695/944694) [91,2,41] r=0 lpr=945695 crt=949525'24155822 lcod 949525'24155821 mlcod 949525'24155821 active+clean+snaptrim snaptrimq=[b3f1a~1,b4081~1]] removing snap head Thanks! Stefan > > Thanks! > sage > > > >>> Stefan >>> >>>> Thanks! >>>> sage >>>> >>>> >>>> >>>> > >>>>> Stefan >>>>> >>>>> >>>>>>> >>>>>>> Greets, >>>>>>> Stefan >>>>>>> >>>>>>> Am 22.01.2018 um 20:01 schrieb Sage Weil: >>>>>>>> On Mon, 22 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>>>>>> Hi Sage, >>>>>>>>> >>>>>>>>> thanks! That one fixed it! Now i'm just waiting if the log ever stops >>>>>>>>> telling me: >>>>>>>>> 2018-01-22 19:48:44.204004 osd.45 [ERR] osd.45 found snap mapper error >>>>>>>>> on pg 3.33e oid >>>>>>>>> 3:7cc2e79f:::rbd_data.52ab946b8b4567.00000000000043fb:b0d7c snaps in >>>>>>>>> mapper: , oi: b0d7c...repaired >>>>>>>>> >>>>>>>>> Currently it seems it does an endless repair and never ends. >>>>>>>> >>>>>>>> On many objects or the same object? >>>>>>>> >>>>>>>> sage >>>>>>>> >>>>>>>>> >>>>>>>>> Greets, >>>>>>>>> Stefan >>>>>>>>> Am 22.01.2018 um 15:30 schrieb Sage Weil: >>>>>>>>>> On Mon, 22 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>>>>>>>> Hi Sage, >>>>>>>>>>> >>>>>>>>>>> get_snaps returned the error as there are no snaps in the snapmapper. So >>>>>>>>>>> it's OK to skip the assert. >>>>>>>>>>> >>>>>>>>>>> My biggest problem is now that i'm not able to run a scrub on all PGs to >>>>>>>>>>> fix all those errors because in around 1-3% the osd seem to deadlock and >>>>>>>>>>> needs to get killed to proceed. >>>>>>>>>> >>>>>>>>>> I believe this will fix it: >>>>>>>>>> >>>>>>>>>> https://github.com/liewegas/ceph/commit/wip-snapmapper >>>>>>>>>> >>>>>>>>>> Cherry-pick that and let me know? >>>>>>>>>> >>>>>>>>>> Thanks! 
>>>>>>>>>> sage >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> The log output is always: >>>>>>>>>>> -3> 2018-01-22 14:02:41.725678 7f10025fe700 0 log_channel(cluster) >>>>>>>>>>> log [WRN] : slow request 141.951820 seconds old, received at >>>>>>>>>>> 2018-01-22 14:00:19.773706: osd_repop(client.142132381.0:1310126 3.a65 >>>>>>>>>>> e927921/927902 3:a65be829:::rbd_data.4fbc586b8b4567.00000000 >>>>>>>>>>> 00000157:head v 927921'526559757) currently commit_sent >>>>>>>>>>> -2> 2018-01-22 14:02:41.725679 7f10025fe700 0 log_channel(cluster) >>>>>>>>>>> log [WRN] : slow request 141.643853 seconds old, received at >>>>>>>>>>> 2018-01-22 14:00:20.081674: osd_repop(client.139148241.0:15454227 3.ff7 >>>>>>>>>>> e927921/927798 3:eff12952:::rbd_data.2494d06b8b4567.0000000 >>>>>>>>>>> 00000147d:head v 927921'29105383) currently commit_sent >>>>>>>>>>> -1> 2018-01-22 14:02:41.725684 7f10025fe700 0 log_channel(cluster) >>>>>>>>>>> log [WRN] : slow request 141.313220 seconds old, received at >>>>>>>>>>> 2018-01-22 14:00:20.412307: osd_repop(client.138984198.0:82792023 3.de6 >>>>>>>>>>> e927921/927725 3:67b9dc7d:::rbd_data.7750506b8b4567.0000000 >>>>>>>>>>> 000000aff:head v 927921'164552063) currently commit_sent >>>>>>>>>>> 0> 2018-01-22 14:02:42.426910 7f0fc23fe700 -1 *** Caught signal >>>>>>>>>>> (Aborted) ** >>>>>>>>>>> in thread 7f0fc23fe700 thread_name:tp_osd_tp >>>>>>>>>>> >>>>>>>>>>> ceph version 12.2.2-94-g92923ef >>>>>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable) >>>>>>>>>>> 1: (()+0xa4423c) [0x561d876ab23c] >>>>>>>>>>> 2: (()+0xf890) [0x7f100b974890] >>>>>>>>>>> 3: (pthread_cond_wait()+0xbf) [0x7f100b97104f] >>>>>>>>>>> 4: (ObjectStore::apply_transactions(ObjectStore::Sequencer*, >>>>>>>>>>> std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Tran >>>>>>>>>>> saction> >&, Context*)+0x254) [0x561d8746ac54] >>>>>>>>>>> 5: (ObjectStore::apply_transaction(ObjectStore::Sequencer*, >>>>>>>>>>> ObjectStore::Transaction&&, Context*)+0x5c) [0x561d87199b2c] >>>>>>>>>>> 6: (PG::_scan_snaps(ScrubMap&)+0x8f7) [0x561d871ebf77] >>>>>>>>>>> 7: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool, >>>>>>>>>>> unsigned int, ThreadPool::TPHandle&)+0x22f) [0x561d871ecc4f] >>>>>>>>>>> 8: (PG::replica_scrub(boost::intrusive_ptr<OpRequest>, >>>>>>>>>>> ThreadPool::TPHandle&)+0x624) [0x561d871ed574] >>>>>>>>>>> 9: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, >>>>>>>>>>> ThreadPool::TPHandle&)+0x8b6) [0x561d872ac756] >>>>>>>>>>> 10: (OSD::dequeue_op(boost::intrusive_ptr<PG>, >>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) >>>>>>>>>>> [0x561d87138bc7] >>>>>>>>>>> 11: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> >>>>>>>>>>> const&)+0x57) [0x561d873b0db7] >>>>>>>>>>> 12: (OSD::ShardedOpWQ::_process(unsigned int, >>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x561d87167d1c] >>>>>>>>>>> 13: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) >>>>>>>>>>> [0x561d876f42bd] >>>>>>>>>>> 14: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x561d876f6280] >>>>>>>>>>> 15: (()+0x8064) [0x7f100b96d064] >>>>>>>>>>> 16: (clone()+0x6d) [0x7f100aa6162d] >>>>>>>>>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is >>>>>>>>>>> needed to interpret this. 
>>>>>>>>>>> >>>>>>>>>>> --- logging levels --- >>>>>>>>>>> 0/ 5 none >>>>>>>>>>> 0/ 0 lockdep >>>>>>>>>>> 0/ 0 context >>>>>>>>>>> 0/ 0 crush >>>>>>>>>>> 0/ 0 mds >>>>>>>>>>> 0/ 0 mds_balancer >>>>>>>>>>> 0/ 0 mds_locker >>>>>>>>>>> 0/ 0 mds_log >>>>>>>>>>> 0/ 0 mds_log_expire >>>>>>>>>>> 0/ 0 mds_migrator >>>>>>>>>>> 0/ 0 buffer >>>>>>>>>>> 0/ 0 timer >>>>>>>>>>> 0/ 0 filer >>>>>>>>>>> 0/ 1 striper >>>>>>>>>>> 0/ 0 objecter >>>>>>>>>>> 0/ 0 rados >>>>>>>>>>> 0/ 0 rbd >>>>>>>>>>> 0/ 5 rbd_mirror >>>>>>>>>>> 0/ 5 rbd_replay >>>>>>>>>>> 0/ 0 journaler >>>>>>>>>>> 0/ 0 objectcacher >>>>>>>>>>> 0/ 0 client >>>>>>>>>>> 0/ 0 osd >>>>>>>>>>> 0/ 0 optracker >>>>>>>>>>> 0/ 0 objclass >>>>>>>>>>> 0/ 0 filestore >>>>>>>>>>> 0/ 0 journal >>>>>>>>>>> 0/ 0 ms >>>>>>>>>>> 0/ 0 mon >>>>>>>>>>> 0/ 0 monc >>>>>>>>>>> 0/ 0 paxos >>>>>>>>>>> 0/ 0 tp >>>>>>>>>>> 0/ 0 auth >>>>>>>>>>> 1/ 5 crypto >>>>>>>>>>> 0/ 0 finisher >>>>>>>>>>> 1/ 1 reserver >>>>>>>>>>> 0/ 0 heartbeatmap >>>>>>>>>>> 0/ 0 perfcounter >>>>>>>>>>> 0/ 0 rgw >>>>>>>>>>> 1/10 civetweb >>>>>>>>>>> 1/ 5 javaclient >>>>>>>>>>> 0/ 0 asok >>>>>>>>>>> 0/ 0 throttle >>>>>>>>>>> 0/ 0 refs >>>>>>>>>>> 1/ 5 xio >>>>>>>>>>> 1/ 5 compressor >>>>>>>>>>> 1/ 5 bluestore >>>>>>>>>>> 1/ 5 bluefs >>>>>>>>>>> 1/ 3 bdev >>>>>>>>>>> 1/ 5 kstore >>>>>>>>>>> 4/ 5 rocksdb >>>>>>>>>>> 4/ 5 leveldb >>>>>>>>>>> 4/ 5 memdb >>>>>>>>>>> 1/ 5 kinetic >>>>>>>>>>> 1/ 5 fuse >>>>>>>>>>> 1/ 5 mgr >>>>>>>>>>> 1/ 5 mgrc >>>>>>>>>>> 1/ 5 dpdk >>>>>>>>>>> 1/ 5 eventtrace >>>>>>>>>>> -2/-2 (syslog threshold) >>>>>>>>>>> -1/-1 (stderr threshold) >>>>>>>>>>> max_recent 10000 >>>>>>>>>>> max_new 1000 >>>>>>>>>>> log_file /var/log/ceph/ceph-osd.59.log >>>>>>>>>>> --- end dump of recent events --- >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Greets, >>>>>>>>>>> Stefan >>>>>>>>>>> >>>>>>>>>>> Am 21.01.2018 um 21:27 schrieb Sage Weil: >>>>>>>>>>>> On Fri, 19 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>>>>>>>>>> Hi Sage, >>>>>>>>>>>>> >>>>>>>>>>>>> thanks for your reply. I'll do this. What i still don't understand is >>>>>>>>>>>>> the following and i hope you might have an idea: >>>>>>>>>>>>> 1.) All the snaps mentioned here are fresh - i created them today >>>>>>>>>>>>> running luminous? How can they be missing in snapmapper again? >>>>>>>>>>>> >>>>>>>>>>>> I'm not sure. It might help if you can set debug_osd=1/10 (or 1/20) so >>>>>>>>>>>> that when you hit the crash we have some context as to what is going on. >>>>>>>>>>>> >>>>>>>>>>>> Did you track down which part of get_snaps() is returning the error? >>>>>>>>>>>> >>>>>>>>>>>>> 2.) How did this happen? The cluster was jewel and was updated to >>>>>>>>>>>>> luminous when all this happened. >>>>>>>>>>>> >>>>>>>>>>>> Did this start right after the upgrade? >>>>>>>>>>>> >>>>>>>>>>>> I started a PR that relaxes some of these assertions so that they clean up >>>>>>>>>>>> instead (unless the debug option is enabled, for our regression testing), >>>>>>>>>>>> but I'm still unclear about what the inconsistency is... any loggin you >>>>>>>>>>>> can provide would help! >>>>>>>>>>>> >>>>>>>>>>>> Thanks- >>>>>>>>>>>> sage >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> Greets, >>>>>>>>>>>>> Stefan >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> Am 19.01.2018 um 21:19 schrieb Sage Weil: >>>>>>>>>>>>>> I'm guessing it is coming from >>>>>>>>>>>>>> >>>>>>>>>>>>>> object_snaps out; >>>>>>>>>>>>>> int r = get_snaps(oid, &out); >>>>>>>>>>>>>> if (r < 0) >>>>>>>>>>>>>> return r; >>>>>>>>>>>>>> >>>>>>>>>>>>>> in SnapMapper::update_snaps(). 
I would add a debug statement to to >>>>>>>>>>>>>> confirm that, and possibly also add debug lines to the various return >>>>>>>>>>>>>> points in get_snaps() so you can see why get_snaps is returning an error. >>>>>>>>>>>>>> >>>>>>>>>>>>>> I'm guessing the way to proceed is to warn but ignore the error, but it's >>>>>>>>>>>>>> hard to say without seeing what the inconsistency there is and whether we >>>>>>>>>>>>>> can tolerate it... >>>>>>>>>>>>>> >>>>>>>>>>>>>> sage >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Fri, 19 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> HI Sage, >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> any ideas for this one sawing while doing a deep scrub: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> -1> 2018-01-19 21:14:20.474287 7fbcdeffe700 -1 osd.5 pg_epoch: >>>>>>>>>>>>>>> 925542 pg[3.306( v 925542'78349255 (925334'78347749,925542'78349255] >>>>>>>>>>>>>>> local-lis/les=925503/925504 n=1752 ec=15/15 lis/c 925503/925503 les/c/f >>>>>>>>>>>>>>> 925504/925517/0 925503/925503/925503) [75,31,5] r=2 lpr=925503 luod=0'0 >>>>>>>>>>>>>>> crt=925542'78349255 lcod 925542'78349254 active] _scan_snaps no head for >>>>>>>>>>>>>>> 3:60c0267d:::rbd_data.103fdc6b8b4567.0000000000004c8e:b0ce7 (have MIN) >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> 0> 2018-01-19 21:14:20.477650 7fbcd3bfe700 -1 >>>>>>>>>>>>>>> /build/ceph/src/osd/PG.cc: In function 'void PG::update_snap_map(const >>>>>>>>>>>>>>> std::vector<pg_log_entry_t>&, ObjectStore::Transaction&)' thread >>>>>>>>>>>>>>> 7fbcd3bfe700 time 2018-01-19 21:14:20.474752 >>>>>>>>>>>>>>> /build/ceph/src/osd/PG.cc: 3420: FAILED assert(r == 0) >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> ceph version 12.2.2-94-g92923ef >>>>>>>>>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable) >>>>>>>>>>>>>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char >>>>>>>>>>>>>>> const*)+0x102) [0x55c0100d0372] >>>>>>>>>>>>>>> 2: (PG::update_snap_map(std::vector<pg_log_entry_t, >>>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, >>>>>>>>>>>>>>> anObjectStore::Transaction&)+0x3a5) [0x55c00fbcfe35] >>>>>>>>>>>>>>> 3: (PG::append_log(std::vector<pg_log_entry_t, >>>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, eversion_t, eversion_t, >>>>>>>>>>>>>>> ObjectStore::Transaction&, bool)+0x540) [0x55c00fbf5fc0] >>>>>>>>>>>>>>> 4: (PrimaryLogPG::log_operation(std::vector<pg_log_entry_t, >>>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, >>>>>>>>>>>>>>> boost::optional<pg_hit_set_history_t> const&, eversion_t const&, >>>>>>>>>>>>>>> eversion_t const&, bool, ObjectStore::Transaction&)+0x64) [0x55c00fceaeb4] >>>>>>>>>>>>>>> 5: (ReplicatedBackend::do_repop(boost::intrusive_ptr<OpRequest>)+0xa92) >>>>>>>>>>>>>>> [0x55c00fdf6642] >>>>>>>>>>>>>>> 6: >>>>>>>>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2a4) >>>>>>>>>>>>>>> [0x55c00fdfc684] >>>>>>>>>>>>>>> 7: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50) >>>>>>>>>>>>>>> [0x55c00fd22030] >>>>>>>>>>>>>>> 8: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, >>>>>>>>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x55c00fc8e33b] >>>>>>>>>>>>>>> 9: (OSD::dequeue_op(boost::intrusive_ptr<PG>, >>>>>>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) >>>>>>>>>>>>>>> [0x55c00fb1abc7] >>>>>>>>>>>>>>> 10: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> >>>>>>>>>>>>>>> const&)+0x57) [0x55c00fd92ad7] >>>>>>>>>>>>>>> 11: (OSD::ShardedOpWQ::_process(unsigned int, >>>>>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) 
[0x55c00fb49d1c] >>>>>>>>>>>>>>> 12: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) >>>>>>>>>>>>>>> [0x55c0100d5ffd] >>>>>>>>>>>>>>> 13: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55c0100d7fc0] >>>>>>>>>>>>>>> 14: (()+0x8064) [0x7fbd284fb064] >>>>>>>>>>>>>>> 15: (clone()+0x6d) [0x7fbd275ef62d] >>>>>>>>>>>>>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is >>>>>>>>>>>>>>> needed to interpret this. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Stefan >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Am 18.01.2018 um 15:24 schrieb Sage Weil: >>>>>>>>>>>>>>>> On Thu, 18 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>>>>>>>>>>>>>> Hi Sage, >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Am 18.01.2018 um 14:16 schrieb Sage Weil: >>>>>>>>>>>>>>>>>> On Thu, 18 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>>>>>>>>>>>>>>>> Hi, >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> it also crashes in (marked with HERE): >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> int SnapMapper::get_snaps( >>>>>>>>>>>>>>>>>>> const hobject_t &oid, >>>>>>>>>>>>>>>>>>> object_snaps *out) >>>>>>>>>>>>>>>>>>> { >>>>>>>>>>>>>>>>>>> assert(check(oid)); >>>>>>>>>>>>>>>>>>> set<string> keys; >>>>>>>>>>>>>>>>>>> map<string, bufferlist> got; >>>>>>>>>>>>>>>>>>> keys.insert(to_object_key(oid)); >>>>>>>>>>>>>>>>>>> int r = backend.get_keys(keys, &got); >>>>>>>>>>>>>>>>>>> if (r < 0) >>>>>>>>>>>>>>>>>>> return r; >>>>>>>>>>>>>>>>>>> if (got.empty()) >>>>>>>>>>>>>>>>>>> return -ENOENT; >>>>>>>>>>>>>>>>>>> if (out) { >>>>>>>>>>>>>>>>>>> bufferlist::iterator bp = got.begin()->second.begin(); >>>>>>>>>>>>>>>>>>> ::decode(*out, bp); >>>>>>>>>>>>>>>>>>> dout(20) << __func__ << " " << oid << " " << out->snaps << dendl; >>>>>>>>>>>>>>>>>>> assert(!out->snaps.empty()); ########### HERE ########### >>>>>>>>>>>>>>>>>>> } else { >>>>>>>>>>>>>>>>>>> dout(20) << __func__ << " " << oid << " (out == NULL)" << dendl; >>>>>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>>>>> return 0; >>>>>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> is it save to comment that assert? >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> I think so. All of these asserts are just about ensuring the snapmapper >>>>>>>>>>>>>>>>>> state is consistent, and it's only purpose is to find snaps to trim. >>>>>>>>>>>>>>>>>> Since your snapmapper metadata is clearly not consistent, there isn't a >>>>>>>>>>>>>>>>>> whole lot of risk here. You might want to set nosnaptrim for the time >>>>>>>>>>>>>>>>>> being so you don't get a surprise if trimming kicks in. The next step is >>>>>>>>>>>>>>>>>> probably going to be to do a scrub and see what it finds, repairs, or >>>>>>>>>>>>>>>>>> can't repair. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> snap trimming works fine i already trimmed and removed some snaps. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> I was able to get the cluster into a state where all is backfilled. But >>>>>>>>>>>>>>>>> enabling / doing deep-scrubs results into this one: >>>>>>>>>>>>>>>>> 0> 2018-01-18 13:00:54.843076 7fbd253ff700 -1 >>>>>>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: In function 'int >>>>>>>>>>>>>>>>> SnapMapper::get_snaps(const h >>>>>>>>>>>>>>>>> object_t&, SnapMapper::object_snaps*)' thread 7fbd253ff700 time >>>>>>>>>>>>>>>>> 2018-01-18 13:00:54.840396 >>>>>>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: 154: FAILED assert(!out->snaps.empty()) >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I think if you switch that assert to a warning it will repair... 
>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> sage >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> ceph version 12.2.2-94-g92923ef >>>>>>>>>>>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable) >>>>>>>>>>>>>>>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char >>>>>>>>>>>>>>>>> const*)+0x102) [0x5561ce28d1f2] >>>>>>>>>>>>>>>>> 2: (SnapMapper::get_snaps(hobject_t const&, >>>>>>>>>>>>>>>>> SnapMapper::object_snaps*)+0x46b) [0x5561cdef755b] >>>>>>>>>>>>>>>>> 3: (SnapMapper::get_snaps(hobject_t const&, std::set<snapid_t, >>>>>>>>>>>>>>>>> std::less<snapid_t>, std::allocator<snapid_t> >*)+0xd7) [0x5561cdef7 >>>>>>>>>>>>>>>>> 697] >>>>>>>>>>>>>>>>> 4: (PG::_scan_snaps(ScrubMap&)+0x692) [0x5561cdd8ad22] >>>>>>>>>>>>>>>>> 5: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool, >>>>>>>>>>>>>>>>> unsigned int, ThreadPool::TPHandle&)+0x22f) [0x5561cdd8bc5f] >>>>>>>>>>>>>>>>> 6: (PG::replica_scrub(boost::intrusive_ptr<OpRequest>, >>>>>>>>>>>>>>>>> ThreadPool::TPHandle&)+0x624) [0x5561cdd8c584] >>>>>>>>>>>>>>>>> 7: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, >>>>>>>>>>>>>>>>> ThreadPool::TPHandle&)+0x8b6) [0x5561cde4b326] >>>>>>>>>>>>>>>>> 8: (OSD::dequeue_op(boost::intrusive_ptr<PG>, >>>>>>>>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) >>>>>>>>>>>>>>>>> [0x5561cdcd7bc7] >>>>>>>>>>>>>>>>> 9: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> >>>>>>>>>>>>>>>>> const&)+0x57) [0x5561cdf4f957] >>>>>>>>>>>>>>>>> 10: (OSD::ShardedOpWQ::_process(unsigned int, >>>>>>>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x5561cdd06d1c] >>>>>>>>>>>>>>>>> 11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) >>>>>>>>>>>>>>>>> [0x5561ce292e7d] >>>>>>>>>>>>>>>>> 12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x5561ce294e40] >>>>>>>>>>>>>>>>> 13: (()+0x8064) [0x7fbd70d86064] >>>>>>>>>>>>>>>>> 14: (clone()+0x6d) [0x7fbd6fe7a62d] >>>>>>>>>>>>>>>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is >>>>>>>>>>>>>>>>> needed to interpret this. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Greets, >>>>>>>>>>>>>>>>> Stefan >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> sage >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Stefan >>>>>>>>>>>>>>>>>>> Am 17.01.2018 um 19:56 schrieb Sage Weil: >>>>>>>>>>>>>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Am 17.01.2018 um 19:48 schrieb Sage Weil: >>>>>>>>>>>>>>>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>>>>>>>>>>>>>>>>>>>> Hi Sage, >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> this gives me another crash while that pg is recovering: >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> 0> 2018-01-17 19:25:09.328935 7f48f8fff700 -1 >>>>>>>>>>>>>>>>>>>>>>> /build/ceph/src/osd/PrimaryLogPG.cc: In function 'virtual void >>>>>>>>>>>>>>>>>>>>>>> PrimaryLogPG::on_l >>>>>>>>>>>>>>>>>>>>>>> ocal_recover(const hobject_t&, const ObjectRecoveryInfo&, >>>>>>>>>>>>>>>>>>>>>>> ObjectContextRef, bool, ObjectStore::Transaction*)' thread 7f48f8fff700 ti >>>>>>>>>>>>>>>>>>>>>>> me 2018-01-17 19:25:09.322287 >>>>>>>>>>>>>>>>>>>>>>> /build/ceph/src/osd/PrimaryLogPG.cc: 358: FAILED assert(p != >>>>>>>>>>>>>>>>>>>>>>> recovery_info.ss.clone_snaps.end()) >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Is this a cache tiering pool? 
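As a side note on the "switch that assert to a warning" suggestion above for SnapMapper::get_snaps(): a minimal standalone sketch of that idea, using stand-in types and the hypothetical name get_snaps_relaxed rather than the real backend code. An empty decoded snap set is logged and reported as -ENOENT, so the caller and scrub can treat the mapping as missing instead of the assert taking the OSD down:

#include <cerrno>
#include <iostream>
#include <map>
#include <set>
#include <string>

using snapid_t = unsigned long long;

// Stand-in for the key/value backend: object key -> stored snap set.
static std::map<std::string, std::set<snapid_t>> backend;

int get_snaps_relaxed(const std::string& oid, std::set<snapid_t>* out) {
  auto it = backend.find(oid);
  if (it == backend.end())
    return -ENOENT;                         // no snapmapper entry at all
  if (it->second.empty()) {
    // was: assert(!out->snaps.empty()); warn and treat as missing instead
    std::cerr << "get_snaps: empty snap set for " << oid << "\n";
    return -ENOENT;
  }
  if (out)
    *out = it->second;
  return 0;
}

int main() {
  backend["rbd_data.deadbeef:b0d7c"] = {};  // inconsistent entry like the ones scrub repairs
  std::set<snapid_t> snaps;
  std::cout << get_snaps_relaxed("rbd_data.deadbeef:b0d7c", &snaps) << "\n";  // prints -2
  return 0;
}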
>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> no normal 3 replica but the pg is degraded: >>>>>>>>>>>>>>>>>>>>> ceph pg dump | grep 3.80e >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> 3.80e 1709 0 1579 0 0 6183674368 >>>>>>>>>>>>>>>>>>>>> 10014 10014 active+undersized+degraded+remapped+backfill_wait >>>>>>>>>>>>>>>>>>>>> 2018-01-17 19:42:55.840884 918403'70041375 918403:77171331 [50,54,86] >>>>>>>>>>>>>>>>>>>>> 50 [39] 39 907737'69776430 2018-01-14 >>>>>>>>>>>>>>>>>>>>> 22:19:54.110864 907737'69776430 2018-01-14 22:19:54.110864 >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Hrm, no real clues on teh root cause then. Something like this will work >>>>>>>>>>>>>>>>>>>> around the current assert: >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> diff --git a/src/osd/PrimaryLogPG.cc b/src/osd/PrimaryLogPG.cc >>>>>>>>>>>>>>>>>>>> index d42f3a401b..0f76134f74 100644 >>>>>>>>>>>>>>>>>>>> --- a/src/osd/PrimaryLogPG.cc >>>>>>>>>>>>>>>>>>>> +++ b/src/osd/PrimaryLogPG.cc >>>>>>>>>>>>>>>>>>>> @@ -372,8 +372,11 @@ void PrimaryLogPG::on_local_recover( >>>>>>>>>>>>>>>>>>>> set<snapid_t> snaps; >>>>>>>>>>>>>>>>>>>> dout(20) << " snapset " << recovery_info.ss << dendl; >>>>>>>>>>>>>>>>>>>> auto p = recovery_info.ss.clone_snaps.find(hoid.snap); >>>>>>>>>>>>>>>>>>>> - assert(p != recovery_info.ss.clone_snaps.end()); // hmm, should we warn? >>>>>>>>>>>>>>>>>>>> - snaps.insert(p->second.begin(), p->second.end()); >>>>>>>>>>>>>>>>>>>> + if (p != recovery_info.ss.clone_snaps.end()) { >>>>>>>>>>>>>>>>>>>> + derr << __func__ << " no clone_snaps for " << hoid << dendl; >>>>>>>>>>>>>>>>>>>> + } else { >>>>>>>>>>>>>>>>>>>> + snaps.insert(p->second.begin(), p->second.end()); >>>>>>>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>>>>>>> dout(20) << " snaps " << snaps << dendl; >>>>>>>>>>>>>>>>>>>> snap_mapper.add_oid( >>>>>>>>>>>>>>>>>>>> recovery_info.soid, >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Stefan >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> s >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> ceph version 12.2.2-94-g92923ef >>>>>>>>>>>>>>>>>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable) >>>>>>>>>>>>>>>>>>>>>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char >>>>>>>>>>>>>>>>>>>>>>> const*)+0x102) [0x55addb5eb1f2] >>>>>>>>>>>>>>>>>>>>>>> 2: (PrimaryLogPG::on_local_recover(hobject_t const&, ObjectRecoveryInfo >>>>>>>>>>>>>>>>>>>>>>> const&, std::shared_ptr<ObjectContext>, bool, >>>>>>>>>>>>>>>>>>>>>>> ObjectStore::Transaction*)+0x11f0) [0x55addb1957a0] >>>>>>>>>>>>>>>>>>>>>>> 3: (ReplicatedBackend::handle_push(pg_shard_t, PushOp const&, >>>>>>>>>>>>>>>>>>>>>>> PushReplyOp*, ObjectStore::Transaction*)+0x31d) [0x55addb3071ed] >>>>>>>>>>>>>>>>>>>>>>> 4: (ReplicatedBackend::_do_push(boost::intrusive_ptr<OpRequest>)+0x18f) >>>>>>>>>>>>>>>>>>>>>>> [0x55addb30748f] >>>>>>>>>>>>>>>>>>>>>>> 5: >>>>>>>>>>>>>>>>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2d1) >>>>>>>>>>>>>>>>>>>>>>> [0x55addb317531] >>>>>>>>>>>>>>>>>>>>>>> 6: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50) >>>>>>>>>>>>>>>>>>>>>>> [0x55addb23cf10] >>>>>>>>>>>>>>>>>>>>>>> 7: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, >>>>>>>>>>>>>>>>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x55addb1a91eb] >>>>>>>>>>>>>>>>>>>>>>> 8: (OSD::dequeue_op(boost::intrusive_ptr<PG>, >>>>>>>>>>>>>>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) >>>>>>>>>>>>>>>>>>>>>>> [0x55addb035bc7] >>>>>>>>>>>>>>>>>>>>>>> 9: 
(PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> >>>>>>>>>>>>>>>>>>>>>>> const&)+0x57) [0x55addb2ad957] >>>>>>>>>>>>>>>>>>>>>>> 10: (OSD::ShardedOpWQ::_process(unsigned int, >>>>>>>>>>>>>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x55addb064d1c] >>>>>>>>>>>>>>>>>>>>>>> 11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) >>>>>>>>>>>>>>>>>>>>>>> [0x55addb5f0e7d] >>>>>>>>>>>>>>>>>>>>>>> 12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55addb5f2e40] >>>>>>>>>>>>>>>>>>>>>>> 13: (()+0x8064) [0x7f4955b68064] >>>>>>>>>>>>>>>>>>>>>>> 14: (clone()+0x6d) [0x7f4954c5c62d] >>>>>>>>>>>>>>>>>>>>>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is >>>>>>>>>>>>>>>>>>>>>>> needed to interpret this. >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> Greets, >>>>>>>>>>>>>>>>>>>>>>> Stefan >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> Am 17.01.2018 um 15:28 schrieb Sage Weil: >>>>>>>>>>>>>>>>>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>>>>>>>>>>>>>>>>>>>>>> Hi, >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> i there any chance to fix this instead of removing manually all the clones? >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> I believe you can avoid the immediate problem and get the PG up by >>>>>>>>>>>>>>>>>>>>>>>> commenting out the assert. set_snaps() will overwrite the object->snap >>>>>>>>>>>>>>>>>>>>>>>> list mapping. >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> The problem is you'll probably still a stray snapid -> object mapping, so >>>>>>>>>>>>>>>>>>>>>>>> when snaptrimming runs you might end up with a PG in the snaptrim_error >>>>>>>>>>>>>>>>>>>>>>>> state that won't trim (although from a quick look at the code it won't >>>>>>>>>>>>>>>>>>>>>>>> crash). I'd probably remove the assert and deal with that if/when it >>>>>>>>>>>>>>>>>>>>>>>> happens. >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> I'm adding a ticket to relax these asserts for production but keep them >>>>>>>>>>>>>>>>>>>>>>>> enabled for qa. This isn't something that needs to take down the OSD! >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> sage >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> Stefan >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> Am 16.01.2018 um 23:24 schrieb Gregory Farnum: >>>>>>>>>>>>>>>>>>>>>>>>>> On Mon, Jan 15, 2018 at 5:23 PM, Stefan Priebe - Profihost AG >>>>>>>>>>>>>>>>>>>>>>>>>> <s.priebe@xxxxxxxxxxxx> wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>> Hello, >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> currently one of my clusters is missing a whole pg due to all 3 osds >>>>>>>>>>>>>>>>>>>>>>>>>>> being down. 
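Going back to the on_local_recover() workaround diff quoted further up: a standalone sketch of that guard with stand-in types (the name snaps_for_clone is hypothetical). Note the branches only make sense this way round, inserting the snaps when the clone_snaps entry is found and warning when it is missing:

#include <iostream>
#include <map>
#include <set>

using snapid_t = unsigned long long;

// Stand-in for the relevant part of a SnapSet: clone id -> snaps for that clone.
struct SnapSetStandIn {
  std::map<snapid_t, std::set<snapid_t>> clone_snaps;
};

std::set<snapid_t> snaps_for_clone(const SnapSetStandIn& ss, snapid_t clone) {
  std::set<snapid_t> snaps;
  auto p = ss.clone_snaps.find(clone);
  if (p != ss.clone_snaps.end()) {
    snaps.insert(p->second.begin(), p->second.end());
  } else {
    // was: assert(p != ss.clone_snaps.end()); warn instead of aborting the OSD
    std::cerr << "on_local_recover: no clone_snaps for clone " << clone << "\n";
  }
  return snaps;  // possibly empty; a later add_oid() would then see an empty set
}

int main() {
  SnapSetStandIn ss;
  ss.clone_snaps[0xb0d77] = {0xb0d77};
  std::cout << snaps_for_clone(ss, 0xb0d77).size() << "\n";  // 1
  std::cout << snaps_for_clone(ss, 0xb0dff).size() << "\n";  // warns, 0
  return 0;
}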
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> All of them fail with: >>>>>>>>>>>>>>>>>>>>>>>>>>> 0> 2018-01-16 02:05:33.353293 7f944dbfe700 -1 >>>>>>>>>>>>>>>>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: In function 'void >>>>>>>>>>>>>>>>>>>>>>>>>>> SnapMapper::add_oid(const hobject_t&, const std::set<snapid_t>&, >>>>>>>>>>>>>>>>>>>>>>>>>>> MapCacher::Transaction<std::basic_string<char>, ceph::buffer::list>*)' >>>>>>>>>>>>>>>>>>>>>>>>>>> thread 7f944dbfe700 time 2018-01-16 02:05:33.349946 >>>>>>>>>>>>>>>>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: 246: FAILED assert(r == -2) >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> ceph version 12.2.2-93-gd6da8d7 >>>>>>>>>>>>>>>>>>>>>>>>>>> (d6da8d77a4b2220e6bdd61e4bdd911a9cd91946c) luminous (stable) >>>>>>>>>>>>>>>>>>>>>>>>>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char >>>>>>>>>>>>>>>>>>>>>>>>>>> const*)+0x102) [0x561f9ff0b1e2] >>>>>>>>>>>>>>>>>>>>>>>>>>> 2: (SnapMapper::add_oid(hobject_t const&, std::set<snapid_t, >>>>>>>>>>>>>>>>>>>>>>>>>>> std::less<snapid_t>, std::allocator<snapid_t> > const&, >>>>>>>>>>>>>>>>>>>>>>>>>>> MapCacher::Transaction<std::string, ceph::buffer::list>*)+0x64b) >>>>>>>>>>>>>>>>>>>>>>>>>>> [0x561f9fb76f3b] >>>>>>>>>>>>>>>>>>>>>>>>>>> 3: (PG::update_snap_map(std::vector<pg_log_entry_t, >>>>>>>>>>>>>>>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, >>>>>>>>>>>>>>>>>>>>>>>>>>> ObjectStore::Transaction&)+0x38f) [0x561f9fa0ae3f] >>>>>>>>>>>>>>>>>>>>>>>>>>> 4: (PG::append_log(std::vector<pg_log_entry_t, >>>>>>>>>>>>>>>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, eversion_t, eversion_t, >>>>>>>>>>>>>>>>>>>>>>>>>>> ObjectStore::Transaction&, bool)+0x538) [0x561f9fa31018] >>>>>>>>>>>>>>>>>>>>>>>>>>> 5: (PrimaryLogPG::log_operation(std::vector<pg_log_entry_t, >>>>>>>>>>>>>>>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, >>>>>>>>>>>>>>>>>>>>>>>>>>> boost::optional<pg_hit_set_history_t> const&, eversion_t const&, >>>>>>>>>>>>>>>>>>>>>>>>>>> eversion_t const&, bool, ObjectStore::Transaction&)+0x64) [0x561f9fb25d64] >>>>>>>>>>>>>>>>>>>>>>>>>>> 6: (ReplicatedBackend::do_repop(boost::intrusive_ptr<OpRequest>)+0xa92) >>>>>>>>>>>>>>>>>>>>>>>>>>> [0x561f9fc314b2] >>>>>>>>>>>>>>>>>>>>>>>>>>> 7: >>>>>>>>>>>>>>>>>>>>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2a4) >>>>>>>>>>>>>>>>>>>>>>>>>>> [0x561f9fc374f4] >>>>>>>>>>>>>>>>>>>>>>>>>>> 8: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50) >>>>>>>>>>>>>>>>>>>>>>>>>>> [0x561f9fb5cf10] >>>>>>>>>>>>>>>>>>>>>>>>>>> 9: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, >>>>>>>>>>>>>>>>>>>>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x561f9fac91eb] >>>>>>>>>>>>>>>>>>>>>>>>>>> 10: (OSD::dequeue_op(boost::intrusive_ptr<PG>, >>>>>>>>>>>>>>>>>>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) >>>>>>>>>>>>>>>>>>>>>>>>>>> [0x561f9f955bc7] >>>>>>>>>>>>>>>>>>>>>>>>>>> 11: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> >>>>>>>>>>>>>>>>>>>>>>>>>>> const&)+0x57) [0x561f9fbcd947] >>>>>>>>>>>>>>>>>>>>>>>>>>> 12: (OSD::ShardedOpWQ::_process(unsigned int, >>>>>>>>>>>>>>>>>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x561f9f984d1c] >>>>>>>>>>>>>>>>>>>>>>>>>>> 13: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) >>>>>>>>>>>>>>>>>>>>>>>>>>> [0x561f9ff10e6d] >>>>>>>>>>>>>>>>>>>>>>>>>>> 14: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x561f9ff12e30] >>>>>>>>>>>>>>>>>>>>>>>>>>> 15: (()+0x8064) [0x7f949afcb064] 
>>>>>>>>>>>>>>>>>>>>>>>>>>> 16: (clone()+0x6d) [0x7f949a0bf62d] >>>>>>>>>>>>>>>>>>>>>>>>>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is >>>>>>>>>>>>>>>>>>>>>>>>>>> needed to interpret this. >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> By the time it gets there, something else has gone wrong. The OSD is >>>>>>>>>>>>>>>>>>>>>>>>>> adding a snapid/object pair to its "SnapMapper", and discovering that >>>>>>>>>>>>>>>>>>>>>>>>>> there are already entries (which it thinks there shouldn't be). >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> You'll need to post more of a log, along with background, if anybody's >>>>>>>>>>>>>>>>>>>>>>>>>> going to diagnose it: is there cache tiering on the cluster? What is >>>>>>>>>>>>>>>>>>>>>>>>>> this pool used for? Were there other errors on this PG in the past? >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> I also notice a separate email about deleting the data; I don't have >>>>>>>>>>>>>>>>>>>>>>>>>> any experience with this but you'd probably have to export the PG >>>>>>>>>>>>>>>>>>>>>>>>>> using ceph-objectstore-tool and then find a way to delete the object >>>>>>>>>>>>>>>>>>>>>>>>>> out of it. I see options to remove both an object and >>>>>>>>>>>>>>>>>>>>>>>>>> "remove-clone-metadata" on a particular ID, but I've not used any of >>>>>>>>>>>>>>>>>>>>>>>>>> them myself. >>>>>>>>>>>>>>>>>>>>>>>>>> -Greg >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>>>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in >>>>>>>>>>>>>>>>>>>>>>>>> the body of a message to majordomo@xxxxxxxxxxxxxxx >>>>>>>>>>>>>>>>>>>>>>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in >>>>>>>>>>>>>>>>>>>>>>> the body of a message to majordomo@xxxxxxxxxxxxxxx >>>>>>>>>>>>>>>>>>>>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in >>>>>>>>>>>>>>>>>>>>> the body of a message to majordomo@xxxxxxxxxxxxxxx >>>>>>>>>>>>>>>>>>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in >>>>>>>>>>>>>>>>>>> the body of a message to majordomo@xxxxxxxxxxxxxxx >>>>>>>>>>>>>>>>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in >>>>>>>>>>>>>>>>> the body of a message to majordomo@xxxxxxxxxxxxxxx >>>>>>>>>>>>>>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in >>>>>>>>>>>>>>> the body of a message to majordomo@xxxxxxxxxxxxxxx >>>>>>>>>>>>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>> -- >>>>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in >>>>>>>>>>>>> the body of a message to majordomo@xxxxxxxxxxxxxxx >>>>>>>>>>>>> More majordomo info at 
http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html