Hi, see below. Am 12.02.2018 um 20:31 schrieb Sage Weil: > On Mon, 12 Feb 2018, Stefan Priebe - Profihost AG wrote: >> Hi Sage, >> >> did you have any time to look at the logs? > > Sorry, I thought I replied earlier. See below > >> Am 05.02.2018 um 14:39 schrieb Stefan Priebe - Profihost AG: >>> Hi Sage, >>> >>> Am 02.02.2018 um 22:05 schrieb Sage Weil: >>>> On Fri, 2 Feb 2018, Stefan Priebe - Profihost AG wrote: >>>>> Am 02.02.2018 um 20:28 schrieb Sage Weil: >>>>>> On Fri, 2 Feb 2018, Stefan Priebe - Profihost AG wrote: >>>>>>> Hi Sage, >>>>>>> >>>>>>> some days ago i reverted to stock luminous with >>>>>>> https://github.com/liewegas/ceph/commit/e6a04446d21689ae131b0b683b4d76c7c30d8e6a >>>>>>> + >>>>>>> https://github.com/liewegas/ceph/commit/e17493bdffd7b9a349f377e7877fac741221edfa >>>>>>> on top. I scrubbed and deep-scrubbed all pgs before and no errors were >>>>>>> found. >>>>>>> >>>>>>> this worked fine until the ceph balancer ran and started remapping >>>>>>> some pgs. >>>>>>> >>>>>>> This again triggered this assert: >>>>>>> >>>>>>> /build/ceph/src/osd/SnapMapper.cc: In function 'int >>>>>>> SnapMapper::get_snaps(const hobject_t&, SnapMapper::object_snaps*)' >>>>>>> thread 7f4289bff700 time 2018-02-02 17:46:33.249579 >>>>>>> /build/ceph/src/osd/SnapMapper.cc: 154: FAILED assert(!out->snaps.empty()) >>>>>>> >>>>>>> Any idea how to fix it? >>>>>> >>>>>> Cheap workaround is to comment out that assert. I'm still trying to sort >>>>> >>>>> Yes i did but i feel uncomfortable with it as the snapmapper value is >>>>> still wrong. >>>>> >>>>>> out how this happened, though. Not sure if you shared this already, but: >>>>>> - bluestore or filestore? >>>>> >>>>> As i first reported it, only filestore. Now also some bluestore OSDs. >>>>> >>>>>> - no cache tiering, right? nowhere on the cluster? >>>>> Yes and Yes. Never used cache tiering at all. >>>>> >>>>>> It looks like the SnapMapper entry with an empty snaps vector came in via >>>>>> add_oid() (update_oid() asserts if the new_snaps is empty; add_oid() >>>>>> doesn't have that check). There are a few places it could come from, >>>>>> including the on_local_recover() call and the update_snap_map() call. >>>>>> That last one is interesting because it pulls the snaps for a new clone >>>>>> out of the pg log entry.. maybe those aren't getting populated properly. >>>>> >>>>> Is it also possible that they got corrupted in the past and now ceph is >>>>> unable to detect it and repair it with a scrub or deep scrub? >>>>> >>>>>> Can you confirm that all of the OSDs on the cluster are running the >>>>>> latest? (ceph versions). >>>>> >>>>> Yes they do - i always check this after each update and i rechecked it >>>>> again. >>>> >>>> Ok thanks! >>>> >>>> I pushed a branch wip-snapmapper-debug to github.com/liewegas/ceph that >>>> has those commits above plus another one that adds debug messages in the >>>> various places that the empty snap vectors might be coming from. >>>> >>>> This won't help it get past the rebalance issue you hit above, but >>>> hopefully it will trigger when you create a new clone so we can see how >>>> this is happening. >>>> >>>> If we need to, I think we can ignore the assert (by commenting it out), >>>> but one downside there is that we'll see my warnings pop up when >>>> recovering those objects so we won't be able to tell where they really >>>> came from. Since you were seeing this on new snaps/clones, though, I'm >>>> hoping we won't end up in that situation!
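For reference, a minimal standalone sketch of the add_oid() guard Sage describes above (update_oid() refuses an empty snap set, while add_oid() currently does not). The types and the name add_oid_checked below are stand-ins for illustration only, not the real SnapMapper code:

#include <iostream>
#include <map>
#include <set>
#include <string>

using snapid_t = unsigned long long;

// Stand-in for the snapmapper backend: object key -> set of snaps.
static std::map<std::string, std::set<snapid_t>> mapper;

// Hypothetical guard: refuse to record a clone with an empty snap set
// instead of letting the bad entry in and asserting later in get_snaps().
int add_oid_checked(const std::string& oid, const std::set<snapid_t>& snaps) {
  if (snaps.empty()) {
    std::cerr << "add_oid: refusing empty snap set for " << oid << "\n";
    return -22;  // -EINVAL
  }
  if (mapper.count(oid))
    return -17;  // -EEXIST: add_oid expects no existing entry for the object
  mapper[oid] = snaps;
  return 0;
}

int main() {
  std::cout << add_oid_checked("rbd_data.deadbeef:b0d77", {0xb0d77}) << "\n";  // 0
  std::cout << add_oid_checked("rbd_data.deadbeef:b0d78", {}) << "\n";         // -22, caught at the source
  return 0;
}

The wip-snapmapper-debug branch mentioned above presumably just logs at the same call sites rather than rejecting the insert; the point is only that the empty snap set becomes visible at insertion time instead of at the later assert.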
>>> >>> the biggest part of the logs is now this stuff: >>> 2018-02-05 07:56:32.712366 7fb6893ff700 0 log_channel(cluster) log >>> [DBG] : 3.740 scrub ok >>> 2018-02-05 08:00:16.033843 7fb68a7fe700 -1 osd.0 pg_epoch: 945191 >>> pg[3.bbf( v 945191'63020848 (945191'63019283,945191'63020848] >>> local-lis/les=944620/944621 n=1612 ec=15/15 lis/c 944620/944620 les/c/f >>> 944621/944621/937357 944620/944620/944507) [51,54,0] r=2 lpr=944620 >>> luod=0'0 crt=945191'63020848 lcod 945191'63020847 active] _scan_snaps no >>> head for 3:fdd87083:::rbd_data.67b0ee6b8b4567.00000000000103a1:b0d77 >>> (have MIN) >>> 2018-02-05 08:00:42.123804 7fb6833ff700 -1 osd.0 pg_epoch: 945191 >>> pg[3.bbf( v 945191'63020901 (945191'63019383,945191'63020901] >>> local-lis/les=944620/944621 n=1612 ec=15/15 lis/c 944620/944620 les/c/f >>> 944621/944621/937357 944620/944620/944507) [51,54,0] r=2 lpr=944620 >>> luod=0'0 crt=945191'63020901 lcod 945191'63020900 active] _scan_snaps no >>> head for 3:fddb41cc:::rbd_data.9772fb6b8b4567.0000000000000380:b0d9f >>> (have MIN) >>> 2018-02-05 08:03:49.583574 7fb687bfe700 -1 osd.0 pg_epoch: 945191 >>> pg[3.caa( v 945191'52090546 (945191'52088948,945191'52090546] >>> local-lis/les=944694/944695 n=1709 ec=15/15 lis/c 944694/944694 les/c/f >>> 944695/944697/937357 944694/944694/944653) [70,0,93] r=1 lpr=944694 >>> luod=0'0 crt=945191'52090546 lcod 945191'52090545 active] _scan_snaps no >>> head for 3:5530360d:::rbd_data.52a6596b8b4567.0000000000000929:b0d55 >>> (have MIN) >>> 2018-02-05 08:03:50.432394 7fb687bfe700 -1 osd.0 pg_epoch: 945191 >>> pg[3.caa( v 945191'52090549 (945191'52089048,945191'52090549] >>> local-lis/les=944694/944695 n=1709 ec=15/15 lis/c 944694/944694 les/c/f >>> 944695/944697/937357 944694/944694/944653) [70,0,93] r=1 lpr=944694 >>> luod=0'0 crt=945191'52090549 lcod 945191'52090548 active] _scan_snaps no >>> head for 3:553243fa:::rbd_data.67b0ee6b8b4567.0000000000009956:b0d77 >>> (have MIN) >>> 2018-02-05 08:03:52.523643 7fb687bfe700 -1 osd.0 pg_epoch: 945191 >>> pg[3.caa( v 945191'52090549 (945191'52089048,945191'52090549] >>> local-lis/les=944694/944695 n=1709 ec=15/15 lis/c 944694/944694 les/c/f >>> 944695/944697/937357 944694/944694/944653) [70,0,93] r=1 lpr=944694 >>> luod=0'0 crt=945191'52090549 lcod 945191'52090548 active] _scan_snaps no >>> head for 3:5536f170:::rbd_data.c1a3d66b8b4567.000000000000b729:b0dc5 >>> (have MIN) >>> 2018-02-05 08:03:55.471369 7fb687bfe700 -1 osd.0 pg_epoch: 945191 >>> pg[3.caa( v 945191'52090549 (945191'52089048,945191'52090549] >>> local-lis/les=944694/944695 n=1709 ec=15/15 lis/c 944694/944694 les/c/f >>> 944695/944697/937357 944694/944694/944653) [70,0,93] r=1 lpr=944694 >>> luod=0'0 crt=945191'52090549 lcod 945191'52090548 active] _scan_snaps no >>> head for 3:553d2114:::rbd_data.154a9e6b8b4567.0000000000001ea1:b0e4e >>> (have MIN) >>> 2018-02-05 08:08:55.440715 7fb687bfe700 -1 osd.0 pg_epoch: 945191 >>> pg[3.7aa( v 945191'144763900 (945191'144762311,945191'144763900] >>> local-lis/les=944522/944523 n=1761 ec=15/15 lis/c 944522/944522 les/c/f >>> 944523/944551/937357 944522/944522/944522) [39,37,0] r=2 lpr=944522 >>> luod=0'0 crt=945191'144763900 lcod 945191'144763899 active] _scan_snaps >>> no head for 3:55e14d0f:::rbd_data.67b0ee6b8b4567.00000000000025f2:b0d77 >>> (have MIN) >>> 2018-02-05 08:09:09.295844 7fb68efff700 -1 osd.0 pg_epoch: 945191 >>> pg[3.7aa( v 945191'144763970 (945191'144762411,945191'144763970] >>> local-lis/les=944522/944523 n=1761 ec=15/15 lis/c 944522/944522 les/c/f >>> 944523/944551/937357 
944522/944522/944522) [39,37,0] r=2 lpr=944522 >>> luod=0'0 crt=945191'144763970 lcod 945191'144763969 active] _scan_snaps >>> no head for 3:55e2d7a9:::rbd_data.e40a416b8b4567.000000000000576f:b0df3 >>> (have MIN) >>> 2018-02-05 08:09:26.285493 7fb68efff700 -1 osd.0 pg_epoch: 945191 >>> pg[3.7aa( v 945191'144764043 (945191'144762511,945191'144764043] >>> local-lis/les=944522/944523 n=1762 ec=15/15 lis/c 944522/944522 les/c/f >>> 944523/944551/937357 944522/944522/944522) [39,37,0] r=2 lpr=944522 >>> luod=0'0 crt=945191'144764043 lcod 945191'144764042 active] _scan_snaps >>> no head for 3:55e4cbad:::rbd_data.163e9c96b8b4567.000000000000202d:b0d42 >>> (have MIN) >>> 2018-02-05 08:10:36.490452 7fb687bfe700 -1 osd.0 pg_epoch: 945191 >>> pg[3.7aa( v 945191'144764243 (945191'144762711,945191'144764243] >>> local-lis/les=944522/944523 n=1762 ec=15/15 lis/c 944522/944522 les/c/f >>> 944523/944551/937357 944522/944522/944522) [39,37,0] r=2 lpr=944522 >>> luod=0'0 crt=945191'144764243 lcod 945191'144764242 active] _scan_snaps >>> no head for 3:55ece3c3:::rbd_data.e60f906b8b4567.00000000000000de:b0dd7 >>> (have MIN) >>> >>> Found nothing else in the looks until now. Should i search for more OSDs >>> or is this already something we should consider? > > Yeah, those messages aren't super informative. What would be useful is to > check for any of those (or, more importantly the other new messages that > branch adds) on *new* object clones for snapshots that have just been > created. (Earlier, you mentioned that the errors you were seeing referred > to snapshots that had just been created, right? The key question is how > those new clones are getting registered in snapmapper with an empty list > of snaps.) Yes i checked this twice - but i cannot find anything in the logs now. 
I checked the logs with: grep 'snap' /var/log/ceph/ceph-osd.[0-9]*.log | grep -v ' _scan_snaps no head for ' and the output is just: /var/log/ceph/ceph-osd.2.log:2018-02-12 07:44:29.169600 7f9f6abff700 -1 osd.2 pg_epoch: 949525 pg[3.7ae( v 949525'24972392 (949524'24970891,949525'24972392] local-lis/les=946216/946218 n=1608 ec=15/15 lis/c 946216/946216 les/c/f 946218/946223/937357 946216/946216/945695) [2,26,43] r=0 lpr=946216 luod=949524'24972391 lua=949524'24972391 crt=949525'24972392 lcod 949524'24972390 mlcod 949524'24972390 active+clean+snaptrim snaptrimq=[b3f1a~1,b4081~1]] removing snap head /var/log/ceph/ceph-osd.39.log:2018-02-12 07:44:28.829474 7fbaa5bff700 -1 osd.39 pg_epoch: 949525 pg[3.864( v 949525'60058835 (949524'60057284,949525'60058835] local-lis/les=944678/944679 n=1621 ec=15/15 lis/c 944678/944678 les/c/f 944679/944684/937357 944678/944678/944522) [39,80,31] r=0 lpr=944678 crt=949525'60058835 lcod 949524'60058834 mlcod 949524'60058834 active+clean+snaptrim snaptrimq=[b3f1a~1,b4081~1]] removing snap head /var/log/ceph/ceph-osd.19.log:2018-02-12 07:44:28.661329 7f5f9bfec700 -1 osd.19 pg_epoch: 949525 pg[3.d78( v 949525'32475960 (949495'32474451,949525'32475960] local-lis/les=944690/944691 n=1497 ec=15/15 lis/c 944690/944690 les/c/f 944691/944697/937357 944690/944690/944542) [19,75,78] r=0 lpr=944690 luod=949525'32475958 lua=949525'32475958 crt=949525'32475960 lcod 949524'32475957 mlcod 949524'32475957 active+clean+snaptrim snaptrimq=[b3f1a~1,b4081~1]] removing snap head /var/log/ceph/ceph-osd.33.log:2018-02-12 07:44:28.899062 7f0282fff700 -1 osd.33 pg_epoch: 949525 pg[3.298( v 949525'82343947 (949524'82342349,949525'82343947] local-lis/les=944603/944604 n=1654 ec=15/15 lis/c 944603/944603 les/c/f 944604/944612/937357 944603/944603/944603) [33,13,18] r=0 lpr=944603 crt=949525'82343947 lcod 949524'82343946 mlcod 949524'82343946 active+clean+snaptrim snaptrimq=[b3f1a~1,b4081~1]] removing snap head /var/log/ceph/ceph-osd.33.log:2018-02-12 07:44:29.313090 7f0281fff700 -1 osd.33 pg_epoch: 949525 pg[3.2c1( v 949525'58513621 (949524'58512073,949525'58513621] local-lis/les=944678/944679 n=1660 ec=15/15 lis/c 944678/944678 les/c/f 944679/944687/937357 944678/944678/944603) [33,80,56] r=0 lpr=944678 luod=949525'58513619 lua=949525'58513619 crt=949525'58513621 lcod 949525'58513618 mlcod 949525'58513618 active+clean+snaptrim snaptrimq=[b3f1a~1,b4081~1]] removing snap head /var/log/ceph/ceph-osd.45.log:2018-02-12 07:44:28.916617 7fea14ffd700 -1 osd.45 pg_epoch: 949525 pg[3.bf9( v 949525'71145615 (949524'71144094,949525'71145615] local-lis/les=944620/944621 n=1639 ec=15/15 lis/c 944620/944620 les/c/f 944621/944624/937357 944620/944620/944608) [45,28,58] r=0 lpr=944620 crt=949525'71145615 lcod 949525'71145614 mlcod 949525'71145614 active+clean+snaptrim snaptrimq=[b3f1a~1,b4081~1]] removing snap head /var/log/ceph/ceph-osd.62.log:2018-02-12 07:44:28.696833 7f13073fe700 -1 osd.62 pg_epoch: 949525 pg[3.3e8( v 949525'30577716 (949524'30576140,949525'30577716] local-lis/les=946273/946274 n=1609 ec=15/15 lis/c 946273/946273 les/c/f 946274/946277/937357 946273/946273/944634) [62,80,37] r=0 lpr=946273 luod=949524'30577712 lua=949524'30577712 crt=949525'30577716 lcod 949524'30577711 mlcod 949524'30577711 active+clean+snaptrim snaptrimq=[b3f1a~1,b4081~1]] removing snap head /var/log/ceph/ceph-osd.62.log:2018-02-12 07:44:28.801300 7f13073fe700 -1 osd.62 pg_epoch: 949525 pg[3.5d0( v 949525'90315972 (949524'90314417,949525'90315972] local-lis/les=944677/944678 n=1631 ec=15/15 lis/c 944677/944677 
les/c/f 944678/944684/937357 944677/944677/944634) [62,58,88] r=0 lpr=944677 crt=949525'90315972 lcod 949524'90315971 mlcod 949524'90315971 active+clean+snaptrim snaptrimq=[b3f1a~1,b4081~1]] removing snap head /var/log/ceph/ceph-osd.81.log:2018-02-12 07:44:29.465068 7f9f9c7ff700 -1 osd.81 pg_epoch: 949525 pg[3.3cc( v 949525'46127568 (949524'46125993,949525'46127568] local-lis/les=944676/944677 n=1547 ec=15/15 lis/c 944676/944676 les/c/f 944677/944684/937357 944676/944676/944676) [81,59,66] r=0 lpr=944676 crt=949525'46127568 lcod 949525'46127567 mlcod 949525'46127567 active+clean+snaptrim snaptrimq=[b3f1a~1,b4081~1]] removing snap head /var/log/ceph/ceph-osd.91.log:2018-02-12 07:44:29.258414 7f6eebffd700 -1 osd.91 pg_epoch: 949525 pg[3.efd( v 949525'24155822 (949524'24154262,949525'24155822] local-lis/les=945695/945696 n=1645 ec=15/15 lis/c 945695/945695 les/c/f 945696/945697/937357 945695/945695/944694) [91,2,41] r=0 lpr=945695 crt=949525'24155822 lcod 949525'24155821 mlcod 949525'24155821 active+clean+snaptrim snaptrimq=[b3f1a~1,b4081~1]] removing snap head Thanks! Stefan > > Thanks! > sage > > > >>> Stefan >>> >>>> Thanks! >>>> sage >>>> >>>> >>>> >>>> > >>>>> Stefan >>>>> >>>>> >>>>>>> >>>>>>> Greets, >>>>>>> Stefan >>>>>>> >>>>>>> Am 22.01.2018 um 20:01 schrieb Sage Weil: >>>>>>>> On Mon, 22 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>>>>>> Hi Sage, >>>>>>>>> >>>>>>>>> thanks! That one fixed it! Now i'm just waiting if the log ever stops >>>>>>>>> telling me: >>>>>>>>> 2018-01-22 19:48:44.204004 osd.45 [ERR] osd.45 found snap mapper error >>>>>>>>> on pg 3.33e oid >>>>>>>>> 3:7cc2e79f:::rbd_data.52ab946b8b4567.00000000000043fb:b0d7c snaps in >>>>>>>>> mapper: , oi: b0d7c...repaired >>>>>>>>> >>>>>>>>> Currently it seems it does an endless repair and never ends. >>>>>>>> >>>>>>>> On many objects or the same object? >>>>>>>> >>>>>>>> sage >>>>>>>> >>>>>>>>> >>>>>>>>> Greets, >>>>>>>>> Stefan >>>>>>>>> Am 22.01.2018 um 15:30 schrieb Sage Weil: >>>>>>>>>> On Mon, 22 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>>>>>>>> Hi Sage, >>>>>>>>>>> >>>>>>>>>>> get_snaps returned the error as there are no snaps in the snapmapper. So >>>>>>>>>>> it's OK to skip the assert. >>>>>>>>>>> >>>>>>>>>>> My biggest problem is now that i'm not able to run a scrub on all PGs to >>>>>>>>>>> fix all those errors because in around 1-3% the osd seem to deadlock and >>>>>>>>>>> needs to get killed to proceed. >>>>>>>>>> >>>>>>>>>> I believe this will fix it: >>>>>>>>>> >>>>>>>>>> https://github.com/liewegas/ceph/commit/wip-snapmapper >>>>>>>>>> >>>>>>>>>> Cherry-pick that and let me know? >>>>>>>>>> >>>>>>>>>> Thanks! 
>>>>>>>>>> sage >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> The log output is always: >>>>>>>>>>> -3> 2018-01-22 14:02:41.725678 7f10025fe700 0 log_channel(cluster) >>>>>>>>>>> log [WRN] : slow request 141.951820 seconds old, received at >>>>>>>>>>> 2018-01-22 14:00:19.773706: osd_repop(client.142132381.0:1310126 3.a65 >>>>>>>>>>> e927921/927902 3:a65be829:::rbd_data.4fbc586b8b4567.00000000 >>>>>>>>>>> 00000157:head v 927921'526559757) currently commit_sent >>>>>>>>>>> -2> 2018-01-22 14:02:41.725679 7f10025fe700 0 log_channel(cluster) >>>>>>>>>>> log [WRN] : slow request 141.643853 seconds old, received at >>>>>>>>>>> 2018-01-22 14:00:20.081674: osd_repop(client.139148241.0:15454227 3.ff7 >>>>>>>>>>> e927921/927798 3:eff12952:::rbd_data.2494d06b8b4567.0000000 >>>>>>>>>>> 00000147d:head v 927921'29105383) currently commit_sent >>>>>>>>>>> -1> 2018-01-22 14:02:41.725684 7f10025fe700 0 log_channel(cluster) >>>>>>>>>>> log [WRN] : slow request 141.313220 seconds old, received at >>>>>>>>>>> 2018-01-22 14:00:20.412307: osd_repop(client.138984198.0:82792023 3.de6 >>>>>>>>>>> e927921/927725 3:67b9dc7d:::rbd_data.7750506b8b4567.0000000 >>>>>>>>>>> 000000aff:head v 927921'164552063) currently commit_sent >>>>>>>>>>> 0> 2018-01-22 14:02:42.426910 7f0fc23fe700 -1 *** Caught signal >>>>>>>>>>> (Aborted) ** >>>>>>>>>>> in thread 7f0fc23fe700 thread_name:tp_osd_tp >>>>>>>>>>> >>>>>>>>>>> ceph version 12.2.2-94-g92923ef >>>>>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable) >>>>>>>>>>> 1: (()+0xa4423c) [0x561d876ab23c] >>>>>>>>>>> 2: (()+0xf890) [0x7f100b974890] >>>>>>>>>>> 3: (pthread_cond_wait()+0xbf) [0x7f100b97104f] >>>>>>>>>>> 4: (ObjectStore::apply_transactions(ObjectStore::Sequencer*, >>>>>>>>>>> std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Tran >>>>>>>>>>> saction> >&, Context*)+0x254) [0x561d8746ac54] >>>>>>>>>>> 5: (ObjectStore::apply_transaction(ObjectStore::Sequencer*, >>>>>>>>>>> ObjectStore::Transaction&&, Context*)+0x5c) [0x561d87199b2c] >>>>>>>>>>> 6: (PG::_scan_snaps(ScrubMap&)+0x8f7) [0x561d871ebf77] >>>>>>>>>>> 7: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool, >>>>>>>>>>> unsigned int, ThreadPool::TPHandle&)+0x22f) [0x561d871ecc4f] >>>>>>>>>>> 8: (PG::replica_scrub(boost::intrusive_ptr<OpRequest>, >>>>>>>>>>> ThreadPool::TPHandle&)+0x624) [0x561d871ed574] >>>>>>>>>>> 9: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, >>>>>>>>>>> ThreadPool::TPHandle&)+0x8b6) [0x561d872ac756] >>>>>>>>>>> 10: (OSD::dequeue_op(boost::intrusive_ptr<PG>, >>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) >>>>>>>>>>> [0x561d87138bc7] >>>>>>>>>>> 11: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> >>>>>>>>>>> const&)+0x57) [0x561d873b0db7] >>>>>>>>>>> 12: (OSD::ShardedOpWQ::_process(unsigned int, >>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x561d87167d1c] >>>>>>>>>>> 13: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) >>>>>>>>>>> [0x561d876f42bd] >>>>>>>>>>> 14: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x561d876f6280] >>>>>>>>>>> 15: (()+0x8064) [0x7f100b96d064] >>>>>>>>>>> 16: (clone()+0x6d) [0x7f100aa6162d] >>>>>>>>>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is >>>>>>>>>>> needed to interpret this. 
>>>>>>>>>>> >>>>>>>>>>> --- logging levels --- >>>>>>>>>>> 0/ 5 none >>>>>>>>>>> 0/ 0 lockdep >>>>>>>>>>> 0/ 0 context >>>>>>>>>>> 0/ 0 crush >>>>>>>>>>> 0/ 0 mds >>>>>>>>>>> 0/ 0 mds_balancer >>>>>>>>>>> 0/ 0 mds_locker >>>>>>>>>>> 0/ 0 mds_log >>>>>>>>>>> 0/ 0 mds_log_expire >>>>>>>>>>> 0/ 0 mds_migrator >>>>>>>>>>> 0/ 0 buffer >>>>>>>>>>> 0/ 0 timer >>>>>>>>>>> 0/ 0 filer >>>>>>>>>>> 0/ 1 striper >>>>>>>>>>> 0/ 0 objecter >>>>>>>>>>> 0/ 0 rados >>>>>>>>>>> 0/ 0 rbd >>>>>>>>>>> 0/ 5 rbd_mirror >>>>>>>>>>> 0/ 5 rbd_replay >>>>>>>>>>> 0/ 0 journaler >>>>>>>>>>> 0/ 0 objectcacher >>>>>>>>>>> 0/ 0 client >>>>>>>>>>> 0/ 0 osd >>>>>>>>>>> 0/ 0 optracker >>>>>>>>>>> 0/ 0 objclass >>>>>>>>>>> 0/ 0 filestore >>>>>>>>>>> 0/ 0 journal >>>>>>>>>>> 0/ 0 ms >>>>>>>>>>> 0/ 0 mon >>>>>>>>>>> 0/ 0 monc >>>>>>>>>>> 0/ 0 paxos >>>>>>>>>>> 0/ 0 tp >>>>>>>>>>> 0/ 0 auth >>>>>>>>>>> 1/ 5 crypto >>>>>>>>>>> 0/ 0 finisher >>>>>>>>>>> 1/ 1 reserver >>>>>>>>>>> 0/ 0 heartbeatmap >>>>>>>>>>> 0/ 0 perfcounter >>>>>>>>>>> 0/ 0 rgw >>>>>>>>>>> 1/10 civetweb >>>>>>>>>>> 1/ 5 javaclient >>>>>>>>>>> 0/ 0 asok >>>>>>>>>>> 0/ 0 throttle >>>>>>>>>>> 0/ 0 refs >>>>>>>>>>> 1/ 5 xio >>>>>>>>>>> 1/ 5 compressor >>>>>>>>>>> 1/ 5 bluestore >>>>>>>>>>> 1/ 5 bluefs >>>>>>>>>>> 1/ 3 bdev >>>>>>>>>>> 1/ 5 kstore >>>>>>>>>>> 4/ 5 rocksdb >>>>>>>>>>> 4/ 5 leveldb >>>>>>>>>>> 4/ 5 memdb >>>>>>>>>>> 1/ 5 kinetic >>>>>>>>>>> 1/ 5 fuse >>>>>>>>>>> 1/ 5 mgr >>>>>>>>>>> 1/ 5 mgrc >>>>>>>>>>> 1/ 5 dpdk >>>>>>>>>>> 1/ 5 eventtrace >>>>>>>>>>> -2/-2 (syslog threshold) >>>>>>>>>>> -1/-1 (stderr threshold) >>>>>>>>>>> max_recent 10000 >>>>>>>>>>> max_new 1000 >>>>>>>>>>> log_file /var/log/ceph/ceph-osd.59.log >>>>>>>>>>> --- end dump of recent events --- >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Greets, >>>>>>>>>>> Stefan >>>>>>>>>>> >>>>>>>>>>> Am 21.01.2018 um 21:27 schrieb Sage Weil: >>>>>>>>>>>> On Fri, 19 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>>>>>>>>>> Hi Sage, >>>>>>>>>>>>> >>>>>>>>>>>>> thanks for your reply. I'll do this. What i still don't understand is >>>>>>>>>>>>> the following and i hope you might have an idea: >>>>>>>>>>>>> 1.) All the snaps mentioned here are fresh - i created them today >>>>>>>>>>>>> running luminous? How can they be missing in snapmapper again? >>>>>>>>>>>> >>>>>>>>>>>> I'm not sure. It might help if you can set debug_osd=1/10 (or 1/20) so >>>>>>>>>>>> that when you hit the crash we have some context as to what is going on. >>>>>>>>>>>> >>>>>>>>>>>> Did you track down which part of get_snaps() is returning the error? >>>>>>>>>>>> >>>>>>>>>>>>> 2.) How did this happen? The cluster was jewel and was updated to >>>>>>>>>>>>> luminous when all this happened. >>>>>>>>>>>> >>>>>>>>>>>> Did this start right after the upgrade? >>>>>>>>>>>> >>>>>>>>>>>> I started a PR that relaxes some of these assertions so that they clean up >>>>>>>>>>>> instead (unless the debug option is enabled, for our regression testing), >>>>>>>>>>>> but I'm still unclear about what the inconsistency is... any loggin you >>>>>>>>>>>> can provide would help! >>>>>>>>>>>> >>>>>>>>>>>> Thanks- >>>>>>>>>>>> sage >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> Greets, >>>>>>>>>>>>> Stefan >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> Am 19.01.2018 um 21:19 schrieb Sage Weil: >>>>>>>>>>>>>> I'm guessing it is coming from >>>>>>>>>>>>>> >>>>>>>>>>>>>> object_snaps out; >>>>>>>>>>>>>> int r = get_snaps(oid, &out); >>>>>>>>>>>>>> if (r < 0) >>>>>>>>>>>>>> return r; >>>>>>>>>>>>>> >>>>>>>>>>>>>> in SnapMapper::update_snaps(). 
I would add a debug statement to to >>>>>>>>>>>>>> confirm that, and possibly also add debug lines to the various return >>>>>>>>>>>>>> points in get_snaps() so you can see why get_snaps is returning an error. >>>>>>>>>>>>>> >>>>>>>>>>>>>> I'm guessing the way to proceed is to warn but ignore the error, but it's >>>>>>>>>>>>>> hard to say without seeing what the inconsistency there is and whether we >>>>>>>>>>>>>> can tolerate it... >>>>>>>>>>>>>> >>>>>>>>>>>>>> sage >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Fri, 19 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> HI Sage, >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> any ideas for this one sawing while doing a deep scrub: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> -1> 2018-01-19 21:14:20.474287 7fbcdeffe700 -1 osd.5 pg_epoch: >>>>>>>>>>>>>>> 925542 pg[3.306( v 925542'78349255 (925334'78347749,925542'78349255] >>>>>>>>>>>>>>> local-lis/les=925503/925504 n=1752 ec=15/15 lis/c 925503/925503 les/c/f >>>>>>>>>>>>>>> 925504/925517/0 925503/925503/925503) [75,31,5] r=2 lpr=925503 luod=0'0 >>>>>>>>>>>>>>> crt=925542'78349255 lcod 925542'78349254 active] _scan_snaps no head for >>>>>>>>>>>>>>> 3:60c0267d:::rbd_data.103fdc6b8b4567.0000000000004c8e:b0ce7 (have MIN) >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> 0> 2018-01-19 21:14:20.477650 7fbcd3bfe700 -1 >>>>>>>>>>>>>>> /build/ceph/src/osd/PG.cc: In function 'void PG::update_snap_map(const >>>>>>>>>>>>>>> std::vector<pg_log_entry_t>&, ObjectStore::Transaction&)' thread >>>>>>>>>>>>>>> 7fbcd3bfe700 time 2018-01-19 21:14:20.474752 >>>>>>>>>>>>>>> /build/ceph/src/osd/PG.cc: 3420: FAILED assert(r == 0) >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> ceph version 12.2.2-94-g92923ef >>>>>>>>>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable) >>>>>>>>>>>>>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char >>>>>>>>>>>>>>> const*)+0x102) [0x55c0100d0372] >>>>>>>>>>>>>>> 2: (PG::update_snap_map(std::vector<pg_log_entry_t, >>>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, >>>>>>>>>>>>>>> anObjectStore::Transaction&)+0x3a5) [0x55c00fbcfe35] >>>>>>>>>>>>>>> 3: (PG::append_log(std::vector<pg_log_entry_t, >>>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, eversion_t, eversion_t, >>>>>>>>>>>>>>> ObjectStore::Transaction&, bool)+0x540) [0x55c00fbf5fc0] >>>>>>>>>>>>>>> 4: (PrimaryLogPG::log_operation(std::vector<pg_log_entry_t, >>>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, >>>>>>>>>>>>>>> boost::optional<pg_hit_set_history_t> const&, eversion_t const&, >>>>>>>>>>>>>>> eversion_t const&, bool, ObjectStore::Transaction&)+0x64) [0x55c00fceaeb4] >>>>>>>>>>>>>>> 5: (ReplicatedBackend::do_repop(boost::intrusive_ptr<OpRequest>)+0xa92) >>>>>>>>>>>>>>> [0x55c00fdf6642] >>>>>>>>>>>>>>> 6: >>>>>>>>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2a4) >>>>>>>>>>>>>>> [0x55c00fdfc684] >>>>>>>>>>>>>>> 7: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50) >>>>>>>>>>>>>>> [0x55c00fd22030] >>>>>>>>>>>>>>> 8: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, >>>>>>>>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x55c00fc8e33b] >>>>>>>>>>>>>>> 9: (OSD::dequeue_op(boost::intrusive_ptr<PG>, >>>>>>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) >>>>>>>>>>>>>>> [0x55c00fb1abc7] >>>>>>>>>>>>>>> 10: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> >>>>>>>>>>>>>>> const&)+0x57) [0x55c00fd92ad7] >>>>>>>>>>>>>>> 11: (OSD::ShardedOpWQ::_process(unsigned int, >>>>>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) 
[0x55c00fb49d1c] >>>>>>>>>>>>>>> 12: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) >>>>>>>>>>>>>>> [0x55c0100d5ffd] >>>>>>>>>>>>>>> 13: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55c0100d7fc0] >>>>>>>>>>>>>>> 14: (()+0x8064) [0x7fbd284fb064] >>>>>>>>>>>>>>> 15: (clone()+0x6d) [0x7fbd275ef62d] >>>>>>>>>>>>>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is >>>>>>>>>>>>>>> needed to interpret this. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Stefan >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Am 18.01.2018 um 15:24 schrieb Sage Weil: >>>>>>>>>>>>>>>> On Thu, 18 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>>>>>>>>>>>>>> Hi Sage, >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Am 18.01.2018 um 14:16 schrieb Sage Weil: >>>>>>>>>>>>>>>>>> On Thu, 18 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>>>>>>>>>>>>>>>> Hi, >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> it also crashes in (marked with HERE): >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> int SnapMapper::get_snaps( >>>>>>>>>>>>>>>>>>> const hobject_t &oid, >>>>>>>>>>>>>>>>>>> object_snaps *out) >>>>>>>>>>>>>>>>>>> { >>>>>>>>>>>>>>>>>>> assert(check(oid)); >>>>>>>>>>>>>>>>>>> set<string> keys; >>>>>>>>>>>>>>>>>>> map<string, bufferlist> got; >>>>>>>>>>>>>>>>>>> keys.insert(to_object_key(oid)); >>>>>>>>>>>>>>>>>>> int r = backend.get_keys(keys, &got); >>>>>>>>>>>>>>>>>>> if (r < 0) >>>>>>>>>>>>>>>>>>> return r; >>>>>>>>>>>>>>>>>>> if (got.empty()) >>>>>>>>>>>>>>>>>>> return -ENOENT; >>>>>>>>>>>>>>>>>>> if (out) { >>>>>>>>>>>>>>>>>>> bufferlist::iterator bp = got.begin()->second.begin(); >>>>>>>>>>>>>>>>>>> ::decode(*out, bp); >>>>>>>>>>>>>>>>>>> dout(20) << __func__ << " " << oid << " " << out->snaps << dendl; >>>>>>>>>>>>>>>>>>> assert(!out->snaps.empty()); ########### HERE ########### >>>>>>>>>>>>>>>>>>> } else { >>>>>>>>>>>>>>>>>>> dout(20) << __func__ << " " << oid << " (out == NULL)" << dendl; >>>>>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>>>>> return 0; >>>>>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> is it save to comment that assert? >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> I think so. All of these asserts are just about ensuring the snapmapper >>>>>>>>>>>>>>>>>> state is consistent, and it's only purpose is to find snaps to trim. >>>>>>>>>>>>>>>>>> Since your snapmapper metadata is clearly not consistent, there isn't a >>>>>>>>>>>>>>>>>> whole lot of risk here. You might want to set nosnaptrim for the time >>>>>>>>>>>>>>>>>> being so you don't get a surprise if trimming kicks in. The next step is >>>>>>>>>>>>>>>>>> probably going to be to do a scrub and see what it finds, repairs, or >>>>>>>>>>>>>>>>>> can't repair. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> snap trimming works fine i already trimmed and removed some snaps. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> I was able to get the cluster into a state where all is backfilled. But >>>>>>>>>>>>>>>>> enabling / doing deep-scrubs results into this one: >>>>>>>>>>>>>>>>> 0> 2018-01-18 13:00:54.843076 7fbd253ff700 -1 >>>>>>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: In function 'int >>>>>>>>>>>>>>>>> SnapMapper::get_snaps(const h >>>>>>>>>>>>>>>>> object_t&, SnapMapper::object_snaps*)' thread 7fbd253ff700 time >>>>>>>>>>>>>>>>> 2018-01-18 13:00:54.840396 >>>>>>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: 154: FAILED assert(!out->snaps.empty()) >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I think if you switch that assert to a warning it will repair... 
>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> sage >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> ceph version 12.2.2-94-g92923ef >>>>>>>>>>>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable) >>>>>>>>>>>>>>>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char >>>>>>>>>>>>>>>>> const*)+0x102) [0x5561ce28d1f2] >>>>>>>>>>>>>>>>> 2: (SnapMapper::get_snaps(hobject_t const&, >>>>>>>>>>>>>>>>> SnapMapper::object_snaps*)+0x46b) [0x5561cdef755b] >>>>>>>>>>>>>>>>> 3: (SnapMapper::get_snaps(hobject_t const&, std::set<snapid_t, >>>>>>>>>>>>>>>>> std::less<snapid_t>, std::allocator<snapid_t> >*)+0xd7) [0x5561cdef7 >>>>>>>>>>>>>>>>> 697] >>>>>>>>>>>>>>>>> 4: (PG::_scan_snaps(ScrubMap&)+0x692) [0x5561cdd8ad22] >>>>>>>>>>>>>>>>> 5: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool, >>>>>>>>>>>>>>>>> unsigned int, ThreadPool::TPHandle&)+0x22f) [0x5561cdd8bc5f] >>>>>>>>>>>>>>>>> 6: (PG::replica_scrub(boost::intrusive_ptr<OpRequest>, >>>>>>>>>>>>>>>>> ThreadPool::TPHandle&)+0x624) [0x5561cdd8c584] >>>>>>>>>>>>>>>>> 7: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, >>>>>>>>>>>>>>>>> ThreadPool::TPHandle&)+0x8b6) [0x5561cde4b326] >>>>>>>>>>>>>>>>> 8: (OSD::dequeue_op(boost::intrusive_ptr<PG>, >>>>>>>>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) >>>>>>>>>>>>>>>>> [0x5561cdcd7bc7] >>>>>>>>>>>>>>>>> 9: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> >>>>>>>>>>>>>>>>> const&)+0x57) [0x5561cdf4f957] >>>>>>>>>>>>>>>>> 10: (OSD::ShardedOpWQ::_process(unsigned int, >>>>>>>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x5561cdd06d1c] >>>>>>>>>>>>>>>>> 11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) >>>>>>>>>>>>>>>>> [0x5561ce292e7d] >>>>>>>>>>>>>>>>> 12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x5561ce294e40] >>>>>>>>>>>>>>>>> 13: (()+0x8064) [0x7fbd70d86064] >>>>>>>>>>>>>>>>> 14: (clone()+0x6d) [0x7fbd6fe7a62d] >>>>>>>>>>>>>>>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is >>>>>>>>>>>>>>>>> needed to interpret this. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Greets, >>>>>>>>>>>>>>>>> Stefan >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> sage >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Stefan >>>>>>>>>>>>>>>>>>> Am 17.01.2018 um 19:56 schrieb Sage Weil: >>>>>>>>>>>>>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Am 17.01.2018 um 19:48 schrieb Sage Weil: >>>>>>>>>>>>>>>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>>>>>>>>>>>>>>>>>>>> Hi Sage, >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> this gives me another crash while that pg is recovering: >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> 0> 2018-01-17 19:25:09.328935 7f48f8fff700 -1 >>>>>>>>>>>>>>>>>>>>>>> /build/ceph/src/osd/PrimaryLogPG.cc: In function 'virtual void >>>>>>>>>>>>>>>>>>>>>>> PrimaryLogPG::on_l >>>>>>>>>>>>>>>>>>>>>>> ocal_recover(const hobject_t&, const ObjectRecoveryInfo&, >>>>>>>>>>>>>>>>>>>>>>> ObjectContextRef, bool, ObjectStore::Transaction*)' thread 7f48f8fff700 ti >>>>>>>>>>>>>>>>>>>>>>> me 2018-01-17 19:25:09.322287 >>>>>>>>>>>>>>>>>>>>>>> /build/ceph/src/osd/PrimaryLogPG.cc: 358: FAILED assert(p != >>>>>>>>>>>>>>>>>>>>>>> recovery_info.ss.clone_snaps.end()) >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Is this a cache tiering pool? 
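As a side note on the "switch that assert to a warning" suggestion above for SnapMapper::get_snaps(): a minimal standalone sketch of that idea, using stand-in types and the hypothetical name get_snaps_relaxed rather than the real backend code. An empty decoded snap set is logged and reported as -ENOENT, so the caller and scrub can treat the mapping as missing instead of the assert taking the OSD down:

#include <cerrno>
#include <iostream>
#include <map>
#include <set>
#include <string>

using snapid_t = unsigned long long;

// Stand-in for the key/value backend: object key -> stored snap set.
static std::map<std::string, std::set<snapid_t>> backend;

int get_snaps_relaxed(const std::string& oid, std::set<snapid_t>* out) {
  auto it = backend.find(oid);
  if (it == backend.end())
    return -ENOENT;                         // no snapmapper entry at all
  if (it->second.empty()) {
    // was: assert(!out->snaps.empty()); warn and treat as missing instead
    std::cerr << "get_snaps: empty snap set for " << oid << "\n";
    return -ENOENT;
  }
  if (out)
    *out = it->second;
  return 0;
}

int main() {
  backend["rbd_data.deadbeef:b0d7c"] = {};  // inconsistent entry like the ones scrub repairs
  std::set<snapid_t> snaps;
  std::cout << get_snaps_relaxed("rbd_data.deadbeef:b0d7c", &snaps) << "\n";  // prints -2
  return 0;
}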
>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> no normal 3 replica but the pg is degraded: >>>>>>>>>>>>>>>>>>>>> ceph pg dump | grep 3.80e >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> 3.80e 1709 0 1579 0 0 6183674368 >>>>>>>>>>>>>>>>>>>>> 10014 10014 active+undersized+degraded+remapped+backfill_wait >>>>>>>>>>>>>>>>>>>>> 2018-01-17 19:42:55.840884 918403'70041375 918403:77171331 [50,54,86] >>>>>>>>>>>>>>>>>>>>> 50 [39] 39 907737'69776430 2018-01-14 >>>>>>>>>>>>>>>>>>>>> 22:19:54.110864 907737'69776430 2018-01-14 22:19:54.110864 >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Hrm, no real clues on teh root cause then. Something like this will work >>>>>>>>>>>>>>>>>>>> around the current assert: >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> diff --git a/src/osd/PrimaryLogPG.cc b/src/osd/PrimaryLogPG.cc >>>>>>>>>>>>>>>>>>>> index d42f3a401b..0f76134f74 100644 >>>>>>>>>>>>>>>>>>>> --- a/src/osd/PrimaryLogPG.cc >>>>>>>>>>>>>>>>>>>> +++ b/src/osd/PrimaryLogPG.cc >>>>>>>>>>>>>>>>>>>> @@ -372,8 +372,11 @@ void PrimaryLogPG::on_local_recover( >>>>>>>>>>>>>>>>>>>> set<snapid_t> snaps; >>>>>>>>>>>>>>>>>>>> dout(20) << " snapset " << recovery_info.ss << dendl; >>>>>>>>>>>>>>>>>>>> auto p = recovery_info.ss.clone_snaps.find(hoid.snap); >>>>>>>>>>>>>>>>>>>> - assert(p != recovery_info.ss.clone_snaps.end()); // hmm, should we warn? >>>>>>>>>>>>>>>>>>>> - snaps.insert(p->second.begin(), p->second.end()); >>>>>>>>>>>>>>>>>>>> + if (p != recovery_info.ss.clone_snaps.end()) { >>>>>>>>>>>>>>>>>>>> + derr << __func__ << " no clone_snaps for " << hoid << dendl; >>>>>>>>>>>>>>>>>>>> + } else { >>>>>>>>>>>>>>>>>>>> + snaps.insert(p->second.begin(), p->second.end()); >>>>>>>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>>>>>>> dout(20) << " snaps " << snaps << dendl; >>>>>>>>>>>>>>>>>>>> snap_mapper.add_oid( >>>>>>>>>>>>>>>>>>>> recovery_info.soid, >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Stefan >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> s >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> ceph version 12.2.2-94-g92923ef >>>>>>>>>>>>>>>>>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable) >>>>>>>>>>>>>>>>>>>>>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char >>>>>>>>>>>>>>>>>>>>>>> const*)+0x102) [0x55addb5eb1f2] >>>>>>>>>>>>>>>>>>>>>>> 2: (PrimaryLogPG::on_local_recover(hobject_t const&, ObjectRecoveryInfo >>>>>>>>>>>>>>>>>>>>>>> const&, std::shared_ptr<ObjectContext>, bool, >>>>>>>>>>>>>>>>>>>>>>> ObjectStore::Transaction*)+0x11f0) [0x55addb1957a0] >>>>>>>>>>>>>>>>>>>>>>> 3: (ReplicatedBackend::handle_push(pg_shard_t, PushOp const&, >>>>>>>>>>>>>>>>>>>>>>> PushReplyOp*, ObjectStore::Transaction*)+0x31d) [0x55addb3071ed] >>>>>>>>>>>>>>>>>>>>>>> 4: (ReplicatedBackend::_do_push(boost::intrusive_ptr<OpRequest>)+0x18f) >>>>>>>>>>>>>>>>>>>>>>> [0x55addb30748f] >>>>>>>>>>>>>>>>>>>>>>> 5: >>>>>>>>>>>>>>>>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2d1) >>>>>>>>>>>>>>>>>>>>>>> [0x55addb317531] >>>>>>>>>>>>>>>>>>>>>>> 6: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50) >>>>>>>>>>>>>>>>>>>>>>> [0x55addb23cf10] >>>>>>>>>>>>>>>>>>>>>>> 7: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, >>>>>>>>>>>>>>>>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x55addb1a91eb] >>>>>>>>>>>>>>>>>>>>>>> 8: (OSD::dequeue_op(boost::intrusive_ptr<PG>, >>>>>>>>>>>>>>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) >>>>>>>>>>>>>>>>>>>>>>> [0x55addb035bc7] >>>>>>>>>>>>>>>>>>>>>>> 9: 
(PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> >>>>>>>>>>>>>>>>>>>>>>> const&)+0x57) [0x55addb2ad957] >>>>>>>>>>>>>>>>>>>>>>> 10: (OSD::ShardedOpWQ::_process(unsigned int, >>>>>>>>>>>>>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x55addb064d1c] >>>>>>>>>>>>>>>>>>>>>>> 11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) >>>>>>>>>>>>>>>>>>>>>>> [0x55addb5f0e7d] >>>>>>>>>>>>>>>>>>>>>>> 12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55addb5f2e40] >>>>>>>>>>>>>>>>>>>>>>> 13: (()+0x8064) [0x7f4955b68064] >>>>>>>>>>>>>>>>>>>>>>> 14: (clone()+0x6d) [0x7f4954c5c62d] >>>>>>>>>>>>>>>>>>>>>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is >>>>>>>>>>>>>>>>>>>>>>> needed to interpret this. >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> Greets, >>>>>>>>>>>>>>>>>>>>>>> Stefan >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> Am 17.01.2018 um 15:28 schrieb Sage Weil: >>>>>>>>>>>>>>>>>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>>>>>>>>>>>>>>>>>>>>>> Hi, >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> i there any chance to fix this instead of removing manually all the clones? >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> I believe you can avoid the immediate problem and get the PG up by >>>>>>>>>>>>>>>>>>>>>>>> commenting out the assert. set_snaps() will overwrite the object->snap >>>>>>>>>>>>>>>>>>>>>>>> list mapping. >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> The problem is you'll probably still a stray snapid -> object mapping, so >>>>>>>>>>>>>>>>>>>>>>>> when snaptrimming runs you might end up with a PG in the snaptrim_error >>>>>>>>>>>>>>>>>>>>>>>> state that won't trim (although from a quick look at the code it won't >>>>>>>>>>>>>>>>>>>>>>>> crash). I'd probably remove the assert and deal with that if/when it >>>>>>>>>>>>>>>>>>>>>>>> happens. >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> I'm adding a ticket to relax these asserts for production but keep them >>>>>>>>>>>>>>>>>>>>>>>> enabled for qa. This isn't something that needs to take down the OSD! >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> sage >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> Stefan >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> Am 16.01.2018 um 23:24 schrieb Gregory Farnum: >>>>>>>>>>>>>>>>>>>>>>>>>> On Mon, Jan 15, 2018 at 5:23 PM, Stefan Priebe - Profihost AG >>>>>>>>>>>>>>>>>>>>>>>>>> <s.priebe@xxxxxxxxxxxx> wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>> Hello, >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> currently one of my clusters is missing a whole pg due to all 3 osds >>>>>>>>>>>>>>>>>>>>>>>>>>> being down. 
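Going back to the on_local_recover() workaround diff quoted further up: a standalone sketch of that guard with stand-in types (the name snaps_for_clone is hypothetical). Note the branches only make sense this way round, inserting the snaps when the clone_snaps entry is found and warning when it is missing:

#include <iostream>
#include <map>
#include <set>

using snapid_t = unsigned long long;

// Stand-in for the relevant part of a SnapSet: clone id -> snaps for that clone.
struct SnapSetStandIn {
  std::map<snapid_t, std::set<snapid_t>> clone_snaps;
};

std::set<snapid_t> snaps_for_clone(const SnapSetStandIn& ss, snapid_t clone) {
  std::set<snapid_t> snaps;
  auto p = ss.clone_snaps.find(clone);
  if (p != ss.clone_snaps.end()) {
    snaps.insert(p->second.begin(), p->second.end());
  } else {
    // was: assert(p != ss.clone_snaps.end()); warn instead of aborting the OSD
    std::cerr << "on_local_recover: no clone_snaps for clone " << clone << "\n";
  }
  return snaps;  // possibly empty; a later add_oid() would then see an empty set
}

int main() {
  SnapSetStandIn ss;
  ss.clone_snaps[0xb0d77] = {0xb0d77};
  std::cout << snaps_for_clone(ss, 0xb0d77).size() << "\n";  // 1
  std::cout << snaps_for_clone(ss, 0xb0dff).size() << "\n";  // warns, 0
  return 0;
}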
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> All of them fail with: >>>>>>>>>>>>>>>>>>>>>>>>>>> 0> 2018-01-16 02:05:33.353293 7f944dbfe700 -1 >>>>>>>>>>>>>>>>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: In function 'void >>>>>>>>>>>>>>>>>>>>>>>>>>> SnapMapper::add_oid(const hobject_t&, const std::set<snapid_t>&, >>>>>>>>>>>>>>>>>>>>>>>>>>> MapCacher::Transaction<std::basic_string<char>, ceph::buffer::list>*)' >>>>>>>>>>>>>>>>>>>>>>>>>>> thread 7f944dbfe700 time 2018-01-16 02:05:33.349946 >>>>>>>>>>>>>>>>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: 246: FAILED assert(r == -2) >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> ceph version 12.2.2-93-gd6da8d7 >>>>>>>>>>>>>>>>>>>>>>>>>>> (d6da8d77a4b2220e6bdd61e4bdd911a9cd91946c) luminous (stable) >>>>>>>>>>>>>>>>>>>>>>>>>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char >>>>>>>>>>>>>>>>>>>>>>>>>>> const*)+0x102) [0x561f9ff0b1e2] >>>>>>>>>>>>>>>>>>>>>>>>>>> 2: (SnapMapper::add_oid(hobject_t const&, std::set<snapid_t, >>>>>>>>>>>>>>>>>>>>>>>>>>> std::less<snapid_t>, std::allocator<snapid_t> > const&, >>>>>>>>>>>>>>>>>>>>>>>>>>> MapCacher::Transaction<std::string, ceph::buffer::list>*)+0x64b) >>>>>>>>>>>>>>>>>>>>>>>>>>> [0x561f9fb76f3b] >>>>>>>>>>>>>>>>>>>>>>>>>>> 3: (PG::update_snap_map(std::vector<pg_log_entry_t, >>>>>>>>>>>>>>>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, >>>>>>>>>>>>>>>>>>>>>>>>>>> ObjectStore::Transaction&)+0x38f) [0x561f9fa0ae3f] >>>>>>>>>>>>>>>>>>>>>>>>>>> 4: (PG::append_log(std::vector<pg_log_entry_t, >>>>>>>>>>>>>>>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, eversion_t, eversion_t, >>>>>>>>>>>>>>>>>>>>>>>>>>> ObjectStore::Transaction&, bool)+0x538) [0x561f9fa31018] >>>>>>>>>>>>>>>>>>>>>>>>>>> 5: (PrimaryLogPG::log_operation(std::vector<pg_log_entry_t, >>>>>>>>>>>>>>>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, >>>>>>>>>>>>>>>>>>>>>>>>>>> boost::optional<pg_hit_set_history_t> const&, eversion_t const&, >>>>>>>>>>>>>>>>>>>>>>>>>>> eversion_t const&, bool, ObjectStore::Transaction&)+0x64) [0x561f9fb25d64] >>>>>>>>>>>>>>>>>>>>>>>>>>> 6: (ReplicatedBackend::do_repop(boost::intrusive_ptr<OpRequest>)+0xa92) >>>>>>>>>>>>>>>>>>>>>>>>>>> [0x561f9fc314b2] >>>>>>>>>>>>>>>>>>>>>>>>>>> 7: >>>>>>>>>>>>>>>>>>>>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2a4) >>>>>>>>>>>>>>>>>>>>>>>>>>> [0x561f9fc374f4] >>>>>>>>>>>>>>>>>>>>>>>>>>> 8: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50) >>>>>>>>>>>>>>>>>>>>>>>>>>> [0x561f9fb5cf10] >>>>>>>>>>>>>>>>>>>>>>>>>>> 9: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, >>>>>>>>>>>>>>>>>>>>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x561f9fac91eb] >>>>>>>>>>>>>>>>>>>>>>>>>>> 10: (OSD::dequeue_op(boost::intrusive_ptr<PG>, >>>>>>>>>>>>>>>>>>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) >>>>>>>>>>>>>>>>>>>>>>>>>>> [0x561f9f955bc7] >>>>>>>>>>>>>>>>>>>>>>>>>>> 11: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> >>>>>>>>>>>>>>>>>>>>>>>>>>> const&)+0x57) [0x561f9fbcd947] >>>>>>>>>>>>>>>>>>>>>>>>>>> 12: (OSD::ShardedOpWQ::_process(unsigned int, >>>>>>>>>>>>>>>>>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x561f9f984d1c] >>>>>>>>>>>>>>>>>>>>>>>>>>> 13: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) >>>>>>>>>>>>>>>>>>>>>>>>>>> [0x561f9ff10e6d] >>>>>>>>>>>>>>>>>>>>>>>>>>> 14: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x561f9ff12e30] >>>>>>>>>>>>>>>>>>>>>>>>>>> 15: (()+0x8064) [0x7f949afcb064] 
>>>>>>>>>>>>>>>>>>>>>>>>>>> 16: (clone()+0x6d) [0x7f949a0bf62d] >>>>>>>>>>>>>>>>>>>>>>>>>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is >>>>>>>>>>>>>>>>>>>>>>>>>>> needed to interpret this. >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> By the time it gets there, something else has gone wrong. The OSD is >>>>>>>>>>>>>>>>>>>>>>>>>> adding a snapid/object pair to its "SnapMapper", and discovering that >>>>>>>>>>>>>>>>>>>>>>>>>> there are already entries (which it thinks there shouldn't be). >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> You'll need to post more of a log, along with background, if anybody's >>>>>>>>>>>>>>>>>>>>>>>>>> going to diagnose it: is there cache tiering on the cluster? What is >>>>>>>>>>>>>>>>>>>>>>>>>> this pool used for? Were there other errors on this PG in the past? >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> I also notice a separate email about deleting the data; I don't have >>>>>>>>>>>>>>>>>>>>>>>>>> any experience with this but you'd probably have to export the PG >>>>>>>>>>>>>>>>>>>>>>>>>> using ceph-objectstore-tool and then find a way to delete the object >>>>>>>>>>>>>>>>>>>>>>>>>> out of it. I see options to remove both an object and >>>>>>>>>>>>>>>>>>>>>>>>>> "remove-clone-metadata" on a particular ID, but I've not used any of >>>>>>>>>>>>>>>>>>>>>>>>>> them myself. >>>>>>>>>>>>>>>>>>>>>>>>>> -Greg >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>>>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in >>>>>>>>>>>>>>>>>>>>>>>>> the body of a message to majordomo@xxxxxxxxxxxxxxx >>>>>>>>>>>>>>>>>>>>>>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in >>>>>>>>>>>>>>>>>>>>>>> the body of a message to majordomo@xxxxxxxxxxxxxxx >>>>>>>>>>>>>>>>>>>>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in >>>>>>>>>>>>>>>>>>>>> the body of a message to majordomo@xxxxxxxxxxxxxxx >>>>>>>>>>>>>>>>>>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in >>>>>>>>>>>>>>>>>>> the body of a message to majordomo@xxxxxxxxxxxxxxx >>>>>>>>>>>>>>>>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in >>>>>>>>>>>>>>>>> the body of a message to majordomo@xxxxxxxxxxxxxxx >>>>>>>>>>>>>>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in >>>>>>>>>>>>>>> the body of a message to majordomo@xxxxxxxxxxxxxxx >>>>>>>>>>>>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>> -- >>>>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in >>>>>>>>>>>>> the body of a message to majordomo@xxxxxxxxxxxxxxx >>>>>>>>>>>>> More majordomo info at 
http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html