Hi Sage,

On 24.01.2018 at 01:07, Sage Weil wrote:
> On Tue, 23 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>
>> On 22.01.2018 at 20:01, Sage Weil wrote:
>>> On Mon, 22 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>> Hi Sage,
>>>>
>>>> thanks! That one fixed it! Now I'm just waiting to see whether the log
>>>> ever stops telling me:
>>>> 2018-01-22 19:48:44.204004 osd.45 [ERR] osd.45 found snap mapper error
>>>> on pg 3.33e oid
>>>> 3:7cc2e79f:::rbd_data.52ab946b8b4567.00000000000043fb:b0d7c snaps in
>>>> mapper: , oi: b0d7c...repaired
>>>>
>>>> Currently it seems to run an endless repair that never finishes.
>>>
>>> On many objects or the same object?
>>
>> OK, no: it still repairs endlessly, but that is because it also repairs
>> "newly" created snapshots.
>>
>> Example:
>> 2018-01-23 21:46:47.609712 osd.32 [ERR] osd.32 found snap mapper error
>> on pg 3.c70 oid
>> 3:0e3ff478:::rbd_data.52ab946b8b4567.00000000000025b2:b0d7c snaps
>> missing in mapper, should be: b0d7c...repaired
>>
>> rbd image 'vm-251-disk-1':
>> size 170 GB in 43520 objects
>> order 22 (4096 kB objects)
>> block_name_prefix: rbd_data.52ab946b8b4567
>> format: 2
>> features: layering
>> flags:
>> SNAPID NAME SIZE TIMESTAMP
>> 726900 PVESNAP_cloud2-1394_vm_diff_20180123 170 GB Tue Jan 23 02:14:35 2018
>>
>> So this is a fresh snapshot which needs a repair?
>
> Aha! That's a helpful clue. Is there any chance you can crank up the
> osd debug log on one OSD and catch one of these cases? (We want to
> capture the clone being created, and also the scrub on the pg that follows
> and sees the mismatch).

You mean debug_osd? How high does it need to be? And do we really need all
the log messages, or would it be enough to change some of them to level 0
in the source code so that they always log?

Greets,
Stefan

> Thanks!
> sage
>
>
>> Greets,
>> Stefan
>>
>>>
>>> sage
>>>
>>>>
>>>> Greets,
>>>> Stefan
>>>> On 22.01.2018 at 15:30, Sage Weil wrote:
>>>>> On Mon, 22 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>>>> Hi Sage,
>>>>>>
>>>>>> get_snaps returned the error as there are no snaps in the snapmapper, so
>>>>>> it's OK to skip the assert.
>>>>>>
>>>>>> My biggest problem now is that I'm not able to run a scrub on all PGs to
>>>>>> fix all those errors, because in around 1-3% of cases the OSD seems to
>>>>>> deadlock and needs to be killed to proceed.
>>>>>
>>>>> I believe this will fix it:
>>>>>
>>>>> https://github.com/liewegas/ceph/commit/wip-snapmapper
>>>>>
>>>>> Cherry-pick that and let me know?
>>>>>
>>>>> Thanks!
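Regarding the debug_osd question above: one way to raise logging on just the
one OSD at runtime, assuming osd.32 is the one hosting an affected PG (level
20 is very verbose, so I'd revert it once a case has been caught):

    ceph tell osd.32 injectargs '--debug_osd 20 --debug_ms 1'
    # ...create a snapshot on that PG, let the scrub run and hit the mismatch...
    ceph tell osd.32 injectargs '--debug_osd 0/0 --debug_ms 0/0'

The same can be done on the OSD host via the admin socket, e.g.
`ceph daemon osd.32 config set debug_osd 20/20`.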
>>>>> sage >>>>> >>>>> >>>>>> >>>>>> The log output is always: >>>>>> -3> 2018-01-22 14:02:41.725678 7f10025fe700 0 log_channel(cluster) >>>>>> log [WRN] : slow request 141.951820 seconds old, received at >>>>>> 2018-01-22 14:00:19.773706: osd_repop(client.142132381.0:1310126 3.a65 >>>>>> e927921/927902 3:a65be829:::rbd_data.4fbc586b8b4567.00000000 >>>>>> 00000157:head v 927921'526559757) currently commit_sent >>>>>> -2> 2018-01-22 14:02:41.725679 7f10025fe700 0 log_channel(cluster) >>>>>> log [WRN] : slow request 141.643853 seconds old, received at >>>>>> 2018-01-22 14:00:20.081674: osd_repop(client.139148241.0:15454227 3.ff7 >>>>>> e927921/927798 3:eff12952:::rbd_data.2494d06b8b4567.0000000 >>>>>> 00000147d:head v 927921'29105383) currently commit_sent >>>>>> -1> 2018-01-22 14:02:41.725684 7f10025fe700 0 log_channel(cluster) >>>>>> log [WRN] : slow request 141.313220 seconds old, received at >>>>>> 2018-01-22 14:00:20.412307: osd_repop(client.138984198.0:82792023 3.de6 >>>>>> e927921/927725 3:67b9dc7d:::rbd_data.7750506b8b4567.0000000 >>>>>> 000000aff:head v 927921'164552063) currently commit_sent >>>>>> 0> 2018-01-22 14:02:42.426910 7f0fc23fe700 -1 *** Caught signal >>>>>> (Aborted) ** >>>>>> in thread 7f0fc23fe700 thread_name:tp_osd_tp >>>>>> >>>>>> ceph version 12.2.2-94-g92923ef >>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable) >>>>>> 1: (()+0xa4423c) [0x561d876ab23c] >>>>>> 2: (()+0xf890) [0x7f100b974890] >>>>>> 3: (pthread_cond_wait()+0xbf) [0x7f100b97104f] >>>>>> 4: (ObjectStore::apply_transactions(ObjectStore::Sequencer*, >>>>>> std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Tran >>>>>> saction> >&, Context*)+0x254) [0x561d8746ac54] >>>>>> 5: (ObjectStore::apply_transaction(ObjectStore::Sequencer*, >>>>>> ObjectStore::Transaction&&, Context*)+0x5c) [0x561d87199b2c] >>>>>> 6: (PG::_scan_snaps(ScrubMap&)+0x8f7) [0x561d871ebf77] >>>>>> 7: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool, >>>>>> unsigned int, ThreadPool::TPHandle&)+0x22f) [0x561d871ecc4f] >>>>>> 8: (PG::replica_scrub(boost::intrusive_ptr<OpRequest>, >>>>>> ThreadPool::TPHandle&)+0x624) [0x561d871ed574] >>>>>> 9: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, >>>>>> ThreadPool::TPHandle&)+0x8b6) [0x561d872ac756] >>>>>> 10: (OSD::dequeue_op(boost::intrusive_ptr<PG>, >>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) >>>>>> [0x561d87138bc7] >>>>>> 11: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> >>>>>> const&)+0x57) [0x561d873b0db7] >>>>>> 12: (OSD::ShardedOpWQ::_process(unsigned int, >>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x561d87167d1c] >>>>>> 13: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) >>>>>> [0x561d876f42bd] >>>>>> 14: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x561d876f6280] >>>>>> 15: (()+0x8064) [0x7f100b96d064] >>>>>> 16: (clone()+0x6d) [0x7f100aa6162d] >>>>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is >>>>>> needed to interpret this. 
>>>>>> >>>>>> --- logging levels --- >>>>>> 0/ 5 none >>>>>> 0/ 0 lockdep >>>>>> 0/ 0 context >>>>>> 0/ 0 crush >>>>>> 0/ 0 mds >>>>>> 0/ 0 mds_balancer >>>>>> 0/ 0 mds_locker >>>>>> 0/ 0 mds_log >>>>>> 0/ 0 mds_log_expire >>>>>> 0/ 0 mds_migrator >>>>>> 0/ 0 buffer >>>>>> 0/ 0 timer >>>>>> 0/ 0 filer >>>>>> 0/ 1 striper >>>>>> 0/ 0 objecter >>>>>> 0/ 0 rados >>>>>> 0/ 0 rbd >>>>>> 0/ 5 rbd_mirror >>>>>> 0/ 5 rbd_replay >>>>>> 0/ 0 journaler >>>>>> 0/ 0 objectcacher >>>>>> 0/ 0 client >>>>>> 0/ 0 osd >>>>>> 0/ 0 optracker >>>>>> 0/ 0 objclass >>>>>> 0/ 0 filestore >>>>>> 0/ 0 journal >>>>>> 0/ 0 ms >>>>>> 0/ 0 mon >>>>>> 0/ 0 monc >>>>>> 0/ 0 paxos >>>>>> 0/ 0 tp >>>>>> 0/ 0 auth >>>>>> 1/ 5 crypto >>>>>> 0/ 0 finisher >>>>>> 1/ 1 reserver >>>>>> 0/ 0 heartbeatmap >>>>>> 0/ 0 perfcounter >>>>>> 0/ 0 rgw >>>>>> 1/10 civetweb >>>>>> 1/ 5 javaclient >>>>>> 0/ 0 asok >>>>>> 0/ 0 throttle >>>>>> 0/ 0 refs >>>>>> 1/ 5 xio >>>>>> 1/ 5 compressor >>>>>> 1/ 5 bluestore >>>>>> 1/ 5 bluefs >>>>>> 1/ 3 bdev >>>>>> 1/ 5 kstore >>>>>> 4/ 5 rocksdb >>>>>> 4/ 5 leveldb >>>>>> 4/ 5 memdb >>>>>> 1/ 5 kinetic >>>>>> 1/ 5 fuse >>>>>> 1/ 5 mgr >>>>>> 1/ 5 mgrc >>>>>> 1/ 5 dpdk >>>>>> 1/ 5 eventtrace >>>>>> -2/-2 (syslog threshold) >>>>>> -1/-1 (stderr threshold) >>>>>> max_recent 10000 >>>>>> max_new 1000 >>>>>> log_file /var/log/ceph/ceph-osd.59.log >>>>>> --- end dump of recent events --- >>>>>> >>>>>> >>>>>> Greets, >>>>>> Stefan >>>>>> >>>>>> Am 21.01.2018 um 21:27 schrieb Sage Weil: >>>>>>> On Fri, 19 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>>>>> Hi Sage, >>>>>>>> >>>>>>>> thanks for your reply. I'll do this. What i still don't understand is >>>>>>>> the following and i hope you might have an idea: >>>>>>>> 1.) All the snaps mentioned here are fresh - i created them today >>>>>>>> running luminous? How can they be missing in snapmapper again? >>>>>>> >>>>>>> I'm not sure. It might help if you can set debug_osd=1/10 (or 1/20) so >>>>>>> that when you hit the crash we have some context as to what is going on. >>>>>>> >>>>>>> Did you track down which part of get_snaps() is returning the error? >>>>>>> >>>>>>>> 2.) How did this happen? The cluster was jewel and was updated to >>>>>>>> luminous when all this happened. >>>>>>> >>>>>>> Did this start right after the upgrade? >>>>>>> >>>>>>> I started a PR that relaxes some of these assertions so that they clean up >>>>>>> instead (unless the debug option is enabled, for our regression testing), >>>>>>> but I'm still unclear about what the inconsistency is... any loggin you >>>>>>> can provide would help! >>>>>>> >>>>>>> Thanks- >>>>>>> sage >>>>>>> >>>>>>> >>>>>>> >>>>>>>> >>>>>>>> Greets, >>>>>>>> Stefan >>>>>>>> >>>>>>>> >>>>>>>> Am 19.01.2018 um 21:19 schrieb Sage Weil: >>>>>>>>> I'm guessing it is coming from >>>>>>>>> >>>>>>>>> object_snaps out; >>>>>>>>> int r = get_snaps(oid, &out); >>>>>>>>> if (r < 0) >>>>>>>>> return r; >>>>>>>>> >>>>>>>>> in SnapMapper::update_snaps(). I would add a debug statement to to >>>>>>>>> confirm that, and possibly also add debug lines to the various return >>>>>>>>> points in get_snaps() so you can see why get_snaps is returning an error. >>>>>>>>> >>>>>>>>> I'm guessing the way to proceed is to warn but ignore the error, but it's >>>>>>>>> hard to say without seeing what the inconsistency there is and whether we >>>>>>>>> can tolerate it... 
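The return-point logging suggested there could look roughly like this; it is
only a sketch against the get_snaps() source quoted further down in the
thread, and the derr wording is mine, not an actual patch:

    int r = backend.get_keys(keys, &got);
    if (r < 0) {
      derr << __func__ << " " << oid << " get_keys returned " << r << dendl;
      return r;
    }
    if (got.empty()) {
      derr << __func__ << " " << oid << " has no key in the snapmapper (-ENOENT)"
           << dendl;
      return -ENOENT;
    }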
>>>>>>>>> >>>>>>>>> sage >>>>>>>>> >>>>>>>>> >>>>>>>>> On Fri, 19 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>>>>>> >>>>>>>>>> HI Sage, >>>>>>>>>> >>>>>>>>>> any ideas for this one sawing while doing a deep scrub: >>>>>>>>>> >>>>>>>>>> -1> 2018-01-19 21:14:20.474287 7fbcdeffe700 -1 osd.5 pg_epoch: >>>>>>>>>> 925542 pg[3.306( v 925542'78349255 (925334'78347749,925542'78349255] >>>>>>>>>> local-lis/les=925503/925504 n=1752 ec=15/15 lis/c 925503/925503 les/c/f >>>>>>>>>> 925504/925517/0 925503/925503/925503) [75,31,5] r=2 lpr=925503 luod=0'0 >>>>>>>>>> crt=925542'78349255 lcod 925542'78349254 active] _scan_snaps no head for >>>>>>>>>> 3:60c0267d:::rbd_data.103fdc6b8b4567.0000000000004c8e:b0ce7 (have MIN) >>>>>>>>>> >>>>>>>>>> 0> 2018-01-19 21:14:20.477650 7fbcd3bfe700 -1 >>>>>>>>>> /build/ceph/src/osd/PG.cc: In function 'void PG::update_snap_map(const >>>>>>>>>> std::vector<pg_log_entry_t>&, ObjectStore::Transaction&)' thread >>>>>>>>>> 7fbcd3bfe700 time 2018-01-19 21:14:20.474752 >>>>>>>>>> /build/ceph/src/osd/PG.cc: 3420: FAILED assert(r == 0) >>>>>>>>>> >>>>>>>>>> ceph version 12.2.2-94-g92923ef >>>>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable) >>>>>>>>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char >>>>>>>>>> const*)+0x102) [0x55c0100d0372] >>>>>>>>>> 2: (PG::update_snap_map(std::vector<pg_log_entry_t, >>>>>>>>>> std::allocator<pg_log_entry_t> > const&, >>>>>>>>>> anObjectStore::Transaction&)+0x3a5) [0x55c00fbcfe35] >>>>>>>>>> 3: (PG::append_log(std::vector<pg_log_entry_t, >>>>>>>>>> std::allocator<pg_log_entry_t> > const&, eversion_t, eversion_t, >>>>>>>>>> ObjectStore::Transaction&, bool)+0x540) [0x55c00fbf5fc0] >>>>>>>>>> 4: (PrimaryLogPG::log_operation(std::vector<pg_log_entry_t, >>>>>>>>>> std::allocator<pg_log_entry_t> > const&, >>>>>>>>>> boost::optional<pg_hit_set_history_t> const&, eversion_t const&, >>>>>>>>>> eversion_t const&, bool, ObjectStore::Transaction&)+0x64) [0x55c00fceaeb4] >>>>>>>>>> 5: (ReplicatedBackend::do_repop(boost::intrusive_ptr<OpRequest>)+0xa92) >>>>>>>>>> [0x55c00fdf6642] >>>>>>>>>> 6: >>>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2a4) >>>>>>>>>> [0x55c00fdfc684] >>>>>>>>>> 7: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50) >>>>>>>>>> [0x55c00fd22030] >>>>>>>>>> 8: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, >>>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x55c00fc8e33b] >>>>>>>>>> 9: (OSD::dequeue_op(boost::intrusive_ptr<PG>, >>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) >>>>>>>>>> [0x55c00fb1abc7] >>>>>>>>>> 10: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> >>>>>>>>>> const&)+0x57) [0x55c00fd92ad7] >>>>>>>>>> 11: (OSD::ShardedOpWQ::_process(unsigned int, >>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x55c00fb49d1c] >>>>>>>>>> 12: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) >>>>>>>>>> [0x55c0100d5ffd] >>>>>>>>>> 13: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55c0100d7fc0] >>>>>>>>>> 14: (()+0x8064) [0x7fbd284fb064] >>>>>>>>>> 15: (clone()+0x6d) [0x7fbd275ef62d] >>>>>>>>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is >>>>>>>>>> needed to interpret this. 
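On the debug_osd=1/10 (or 1/20) suggestion quoted above: the second number is
the in-memory gather level, and that in-memory log is what gets dumped with a
backtrace like the one above (the "0/ 0 osd" line in the logging-levels dump
is why the crash came with no context). A sketch of the corresponding
ceph.conf entry on the affected OSD; the exact levels are a judgment call:

    [osd]
        debug osd = 1/20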
>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Stefan >>>>>>>>>> >>>>>>>>>> Am 18.01.2018 um 15:24 schrieb Sage Weil: >>>>>>>>>>> On Thu, 18 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>>>>>>>>> Hi Sage, >>>>>>>>>>>> >>>>>>>>>>>> Am 18.01.2018 um 14:16 schrieb Sage Weil: >>>>>>>>>>>>> On Thu, 18 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>>>>>>>>>>> Hi, >>>>>>>>>>>>>> >>>>>>>>>>>>>> it also crashes in (marked with HERE): >>>>>>>>>>>>>> >>>>>>>>>>>>>> int SnapMapper::get_snaps( >>>>>>>>>>>>>> const hobject_t &oid, >>>>>>>>>>>>>> object_snaps *out) >>>>>>>>>>>>>> { >>>>>>>>>>>>>> assert(check(oid)); >>>>>>>>>>>>>> set<string> keys; >>>>>>>>>>>>>> map<string, bufferlist> got; >>>>>>>>>>>>>> keys.insert(to_object_key(oid)); >>>>>>>>>>>>>> int r = backend.get_keys(keys, &got); >>>>>>>>>>>>>> if (r < 0) >>>>>>>>>>>>>> return r; >>>>>>>>>>>>>> if (got.empty()) >>>>>>>>>>>>>> return -ENOENT; >>>>>>>>>>>>>> if (out) { >>>>>>>>>>>>>> bufferlist::iterator bp = got.begin()->second.begin(); >>>>>>>>>>>>>> ::decode(*out, bp); >>>>>>>>>>>>>> dout(20) << __func__ << " " << oid << " " << out->snaps << dendl; >>>>>>>>>>>>>> assert(!out->snaps.empty()); ########### HERE ########### >>>>>>>>>>>>>> } else { >>>>>>>>>>>>>> dout(20) << __func__ << " " << oid << " (out == NULL)" << dendl; >>>>>>>>>>>>>> } >>>>>>>>>>>>>> return 0; >>>>>>>>>>>>>> } >>>>>>>>>>>>>> >>>>>>>>>>>>>> is it save to comment that assert? >>>>>>>>>>>>> >>>>>>>>>>>>> I think so. All of these asserts are just about ensuring the snapmapper >>>>>>>>>>>>> state is consistent, and it's only purpose is to find snaps to trim. >>>>>>>>>>>>> Since your snapmapper metadata is clearly not consistent, there isn't a >>>>>>>>>>>>> whole lot of risk here. You might want to set nosnaptrim for the time >>>>>>>>>>>>> being so you don't get a surprise if trimming kicks in. The next step is >>>>>>>>>>>>> probably going to be to do a scrub and see what it finds, repairs, or >>>>>>>>>>>>> can't repair. >>>>>>>>>>>> >>>>>>>>>>>> snap trimming works fine i already trimmed and removed some snaps. >>>>>>>>>>>> >>>>>>>>>>>> I was able to get the cluster into a state where all is backfilled. But >>>>>>>>>>>> enabling / doing deep-scrubs results into this one: >>>>>>>>>>>> 0> 2018-01-18 13:00:54.843076 7fbd253ff700 -1 >>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: In function 'int >>>>>>>>>>>> SnapMapper::get_snaps(const h >>>>>>>>>>>> object_t&, SnapMapper::object_snaps*)' thread 7fbd253ff700 time >>>>>>>>>>>> 2018-01-18 13:00:54.840396 >>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: 154: FAILED assert(!out->snaps.empty()) >>>>>>>>>>> >>>>>>>>>>> I think if you switch that assert to a warning it will repair... 
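Switching that assert to a warning might look like the following sketch (the
message text is mine), so that scrub can continue into its repair path instead
of aborting the OSD:

      if (out->snaps.empty()) {
        derr << __func__ << " " << oid << " has an empty snap set in the snapmapper"
             << dendl;
        // was: assert(!out->snaps.empty());
      }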
>>>>>>>>>>> >>>>>>>>>>> sage >>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> ceph version 12.2.2-94-g92923ef >>>>>>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable) >>>>>>>>>>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char >>>>>>>>>>>> const*)+0x102) [0x5561ce28d1f2] >>>>>>>>>>>> 2: (SnapMapper::get_snaps(hobject_t const&, >>>>>>>>>>>> SnapMapper::object_snaps*)+0x46b) [0x5561cdef755b] >>>>>>>>>>>> 3: (SnapMapper::get_snaps(hobject_t const&, std::set<snapid_t, >>>>>>>>>>>> std::less<snapid_t>, std::allocator<snapid_t> >*)+0xd7) [0x5561cdef7 >>>>>>>>>>>> 697] >>>>>>>>>>>> 4: (PG::_scan_snaps(ScrubMap&)+0x692) [0x5561cdd8ad22] >>>>>>>>>>>> 5: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool, >>>>>>>>>>>> unsigned int, ThreadPool::TPHandle&)+0x22f) [0x5561cdd8bc5f] >>>>>>>>>>>> 6: (PG::replica_scrub(boost::intrusive_ptr<OpRequest>, >>>>>>>>>>>> ThreadPool::TPHandle&)+0x624) [0x5561cdd8c584] >>>>>>>>>>>> 7: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, >>>>>>>>>>>> ThreadPool::TPHandle&)+0x8b6) [0x5561cde4b326] >>>>>>>>>>>> 8: (OSD::dequeue_op(boost::intrusive_ptr<PG>, >>>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) >>>>>>>>>>>> [0x5561cdcd7bc7] >>>>>>>>>>>> 9: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> >>>>>>>>>>>> const&)+0x57) [0x5561cdf4f957] >>>>>>>>>>>> 10: (OSD::ShardedOpWQ::_process(unsigned int, >>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x5561cdd06d1c] >>>>>>>>>>>> 11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) >>>>>>>>>>>> [0x5561ce292e7d] >>>>>>>>>>>> 12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x5561ce294e40] >>>>>>>>>>>> 13: (()+0x8064) [0x7fbd70d86064] >>>>>>>>>>>> 14: (clone()+0x6d) [0x7fbd6fe7a62d] >>>>>>>>>>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is >>>>>>>>>>>> needed to interpret this. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Greets, >>>>>>>>>>>> Stefan >>>>>>>>>>>> >>>>>>>>>>>>> sage >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>>> Stefan >>>>>>>>>>>>>> Am 17.01.2018 um 19:56 schrieb Sage Weil: >>>>>>>>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Am 17.01.2018 um 19:48 schrieb Sage Weil: >>>>>>>>>>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>>>>>>>>>>>>>>> Hi Sage, >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> this gives me another crash while that pg is recovering: >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> 0> 2018-01-17 19:25:09.328935 7f48f8fff700 -1 >>>>>>>>>>>>>>>>>> /build/ceph/src/osd/PrimaryLogPG.cc: In function 'virtual void >>>>>>>>>>>>>>>>>> PrimaryLogPG::on_l >>>>>>>>>>>>>>>>>> ocal_recover(const hobject_t&, const ObjectRecoveryInfo&, >>>>>>>>>>>>>>>>>> ObjectContextRef, bool, ObjectStore::Transaction*)' thread 7f48f8fff700 ti >>>>>>>>>>>>>>>>>> me 2018-01-17 19:25:09.322287 >>>>>>>>>>>>>>>>>> /build/ceph/src/osd/PrimaryLogPG.cc: 358: FAILED assert(p != >>>>>>>>>>>>>>>>>> recovery_info.ss.clone_snaps.end()) >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Is this a cache tiering pool? 
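(Stefan answers this just below; for reference, whether any cache tiering is
configured can be checked quickly with something like:

    ceph osd dump | grep -E 'tier_of|cache_mode' || echo "no cache tiering"

No output from the grep means no tiers are set up.)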
>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> no normal 3 replica but the pg is degraded: >>>>>>>>>>>>>>>> ceph pg dump | grep 3.80e >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> 3.80e 1709 0 1579 0 0 6183674368 >>>>>>>>>>>>>>>> 10014 10014 active+undersized+degraded+remapped+backfill_wait >>>>>>>>>>>>>>>> 2018-01-17 19:42:55.840884 918403'70041375 918403:77171331 [50,54,86] >>>>>>>>>>>>>>>> 50 [39] 39 907737'69776430 2018-01-14 >>>>>>>>>>>>>>>> 22:19:54.110864 907737'69776430 2018-01-14 22:19:54.110864 >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Hrm, no real clues on teh root cause then. Something like this will work >>>>>>>>>>>>>>> around the current assert: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> diff --git a/src/osd/PrimaryLogPG.cc b/src/osd/PrimaryLogPG.cc >>>>>>>>>>>>>>> index d42f3a401b..0f76134f74 100644 >>>>>>>>>>>>>>> --- a/src/osd/PrimaryLogPG.cc >>>>>>>>>>>>>>> +++ b/src/osd/PrimaryLogPG.cc >>>>>>>>>>>>>>> @@ -372,8 +372,11 @@ void PrimaryLogPG::on_local_recover( >>>>>>>>>>>>>>> set<snapid_t> snaps; >>>>>>>>>>>>>>> dout(20) << " snapset " << recovery_info.ss << dendl; >>>>>>>>>>>>>>> auto p = recovery_info.ss.clone_snaps.find(hoid.snap); >>>>>>>>>>>>>>> - assert(p != recovery_info.ss.clone_snaps.end()); // hmm, should we warn? >>>>>>>>>>>>>>> - snaps.insert(p->second.begin(), p->second.end()); >>>>>>>>>>>>>>> + if (p != recovery_info.ss.clone_snaps.end()) { >>>>>>>>>>>>>>> + derr << __func__ << " no clone_snaps for " << hoid << dendl; >>>>>>>>>>>>>>> + } else { >>>>>>>>>>>>>>> + snaps.insert(p->second.begin(), p->second.end()); >>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>> dout(20) << " snaps " << snaps << dendl; >>>>>>>>>>>>>>> snap_mapper.add_oid( >>>>>>>>>>>>>>> recovery_info.soid, >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Stefan >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> s >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> ceph version 12.2.2-94-g92923ef >>>>>>>>>>>>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable) >>>>>>>>>>>>>>>>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char >>>>>>>>>>>>>>>>>> const*)+0x102) [0x55addb5eb1f2] >>>>>>>>>>>>>>>>>> 2: (PrimaryLogPG::on_local_recover(hobject_t const&, ObjectRecoveryInfo >>>>>>>>>>>>>>>>>> const&, std::shared_ptr<ObjectContext>, bool, >>>>>>>>>>>>>>>>>> ObjectStore::Transaction*)+0x11f0) [0x55addb1957a0] >>>>>>>>>>>>>>>>>> 3: (ReplicatedBackend::handle_push(pg_shard_t, PushOp const&, >>>>>>>>>>>>>>>>>> PushReplyOp*, ObjectStore::Transaction*)+0x31d) [0x55addb3071ed] >>>>>>>>>>>>>>>>>> 4: (ReplicatedBackend::_do_push(boost::intrusive_ptr<OpRequest>)+0x18f) >>>>>>>>>>>>>>>>>> [0x55addb30748f] >>>>>>>>>>>>>>>>>> 5: >>>>>>>>>>>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2d1) >>>>>>>>>>>>>>>>>> [0x55addb317531] >>>>>>>>>>>>>>>>>> 6: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50) >>>>>>>>>>>>>>>>>> [0x55addb23cf10] >>>>>>>>>>>>>>>>>> 7: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, >>>>>>>>>>>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x55addb1a91eb] >>>>>>>>>>>>>>>>>> 8: (OSD::dequeue_op(boost::intrusive_ptr<PG>, >>>>>>>>>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) >>>>>>>>>>>>>>>>>> [0x55addb035bc7] >>>>>>>>>>>>>>>>>> 9: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> >>>>>>>>>>>>>>>>>> const&)+0x57) [0x55addb2ad957] >>>>>>>>>>>>>>>>>> 10: (OSD::ShardedOpWQ::_process(unsigned int, >>>>>>>>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x55addb064d1c] >>>>>>>>>>>>>>>>>> 11: 
(ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) >>>>>>>>>>>>>>>>>> [0x55addb5f0e7d] >>>>>>>>>>>>>>>>>> 12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55addb5f2e40] >>>>>>>>>>>>>>>>>> 13: (()+0x8064) [0x7f4955b68064] >>>>>>>>>>>>>>>>>> 14: (clone()+0x6d) [0x7f4954c5c62d] >>>>>>>>>>>>>>>>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is >>>>>>>>>>>>>>>>>> needed to interpret this. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Greets, >>>>>>>>>>>>>>>>>> Stefan >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Am 17.01.2018 um 15:28 schrieb Sage Weil: >>>>>>>>>>>>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote: >>>>>>>>>>>>>>>>>>>> Hi, >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> i there any chance to fix this instead of removing manually all the clones? >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> I believe you can avoid the immediate problem and get the PG up by >>>>>>>>>>>>>>>>>>> commenting out the assert. set_snaps() will overwrite the object->snap >>>>>>>>>>>>>>>>>>> list mapping. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> The problem is you'll probably still a stray snapid -> object mapping, so >>>>>>>>>>>>>>>>>>> when snaptrimming runs you might end up with a PG in the snaptrim_error >>>>>>>>>>>>>>>>>>> state that won't trim (although from a quick look at the code it won't >>>>>>>>>>>>>>>>>>> crash). I'd probably remove the assert and deal with that if/when it >>>>>>>>>>>>>>>>>>> happens. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> I'm adding a ticket to relax these asserts for production but keep them >>>>>>>>>>>>>>>>>>> enabled for qa. This isn't something that needs to take down the OSD! >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> sage >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Stefan >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Am 16.01.2018 um 23:24 schrieb Gregory Farnum: >>>>>>>>>>>>>>>>>>>>> On Mon, Jan 15, 2018 at 5:23 PM, Stefan Priebe - Profihost AG >>>>>>>>>>>>>>>>>>>>> <s.priebe@xxxxxxxxxxxx> wrote: >>>>>>>>>>>>>>>>>>>>>> Hello, >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> currently one of my clusters is missing a whole pg due to all 3 osds >>>>>>>>>>>>>>>>>>>>>> being down. 
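A note on the on_local_recover() workaround diff quoted further up: as pasted,
the test looks inverted (the "no clone_snaps" message would fire when the
clone is found, and the else branch would dereference an end() iterator when
it is not). Presumably the intent was:

      auto p = recovery_info.ss.clone_snaps.find(hoid.snap);
      if (p == recovery_info.ss.clone_snaps.end()) {
        derr << __func__ << " no clone_snaps for " << hoid << dendl;
      } else {
        snaps.insert(p->second.begin(), p->second.end());
      }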
>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> All of them fail with: >>>>>>>>>>>>>>>>>>>>>> 0> 2018-01-16 02:05:33.353293 7f944dbfe700 -1 >>>>>>>>>>>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: In function 'void >>>>>>>>>>>>>>>>>>>>>> SnapMapper::add_oid(const hobject_t&, const std::set<snapid_t>&, >>>>>>>>>>>>>>>>>>>>>> MapCacher::Transaction<std::basic_string<char>, ceph::buffer::list>*)' >>>>>>>>>>>>>>>>>>>>>> thread 7f944dbfe700 time 2018-01-16 02:05:33.349946 >>>>>>>>>>>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: 246: FAILED assert(r == -2) >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> ceph version 12.2.2-93-gd6da8d7 >>>>>>>>>>>>>>>>>>>>>> (d6da8d77a4b2220e6bdd61e4bdd911a9cd91946c) luminous (stable) >>>>>>>>>>>>>>>>>>>>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char >>>>>>>>>>>>>>>>>>>>>> const*)+0x102) [0x561f9ff0b1e2] >>>>>>>>>>>>>>>>>>>>>> 2: (SnapMapper::add_oid(hobject_t const&, std::set<snapid_t, >>>>>>>>>>>>>>>>>>>>>> std::less<snapid_t>, std::allocator<snapid_t> > const&, >>>>>>>>>>>>>>>>>>>>>> MapCacher::Transaction<std::string, ceph::buffer::list>*)+0x64b) >>>>>>>>>>>>>>>>>>>>>> [0x561f9fb76f3b] >>>>>>>>>>>>>>>>>>>>>> 3: (PG::update_snap_map(std::vector<pg_log_entry_t, >>>>>>>>>>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, >>>>>>>>>>>>>>>>>>>>>> ObjectStore::Transaction&)+0x38f) [0x561f9fa0ae3f] >>>>>>>>>>>>>>>>>>>>>> 4: (PG::append_log(std::vector<pg_log_entry_t, >>>>>>>>>>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, eversion_t, eversion_t, >>>>>>>>>>>>>>>>>>>>>> ObjectStore::Transaction&, bool)+0x538) [0x561f9fa31018] >>>>>>>>>>>>>>>>>>>>>> 5: (PrimaryLogPG::log_operation(std::vector<pg_log_entry_t, >>>>>>>>>>>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, >>>>>>>>>>>>>>>>>>>>>> boost::optional<pg_hit_set_history_t> const&, eversion_t const&, >>>>>>>>>>>>>>>>>>>>>> eversion_t const&, bool, ObjectStore::Transaction&)+0x64) [0x561f9fb25d64] >>>>>>>>>>>>>>>>>>>>>> 6: (ReplicatedBackend::do_repop(boost::intrusive_ptr<OpRequest>)+0xa92) >>>>>>>>>>>>>>>>>>>>>> [0x561f9fc314b2] >>>>>>>>>>>>>>>>>>>>>> 7: >>>>>>>>>>>>>>>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2a4) >>>>>>>>>>>>>>>>>>>>>> [0x561f9fc374f4] >>>>>>>>>>>>>>>>>>>>>> 8: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50) >>>>>>>>>>>>>>>>>>>>>> [0x561f9fb5cf10] >>>>>>>>>>>>>>>>>>>>>> 9: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, >>>>>>>>>>>>>>>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x561f9fac91eb] >>>>>>>>>>>>>>>>>>>>>> 10: (OSD::dequeue_op(boost::intrusive_ptr<PG>, >>>>>>>>>>>>>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) >>>>>>>>>>>>>>>>>>>>>> [0x561f9f955bc7] >>>>>>>>>>>>>>>>>>>>>> 11: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> >>>>>>>>>>>>>>>>>>>>>> const&)+0x57) [0x561f9fbcd947] >>>>>>>>>>>>>>>>>>>>>> 12: (OSD::ShardedOpWQ::_process(unsigned int, >>>>>>>>>>>>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x561f9f984d1c] >>>>>>>>>>>>>>>>>>>>>> 13: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) >>>>>>>>>>>>>>>>>>>>>> [0x561f9ff10e6d] >>>>>>>>>>>>>>>>>>>>>> 14: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x561f9ff12e30] >>>>>>>>>>>>>>>>>>>>>> 15: (()+0x8064) [0x7f949afcb064] >>>>>>>>>>>>>>>>>>>>>> 16: (clone()+0x6d) [0x7f949a0bf62d] >>>>>>>>>>>>>>>>>>>>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is >>>>>>>>>>>>>>>>>>>>>> needed to interpret this. 
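For that original SnapMapper::add_oid() abort, the "comment out the assert"
approach discussed above could be a warning instead. A sketch only; as noted
in the thread, a stray snapid->object mapping may still be left behind for
snap trimming to stumble over later:

      {
        object_snaps out;
        int r = get_snaps(oid, &out);
        if (r != -ENOENT) {
          derr << __func__ << " " << oid << " already has a snapmapper entry"
               << " (r=" << r << "), overwriting" << dendl;
          // was: assert(r == -2), i.e. assert(r == -ENOENT)
        }
      }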
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> By the time it gets there, something else has gone wrong. The OSD is
>>>>>>>>>>>>>>>>>>>>> adding a snapid/object pair to its "SnapMapper", and discovering that
>>>>>>>>>>>>>>>>>>>>> there are already entries (which it thinks there shouldn't be).
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> You'll need to post more of a log, along with background, if anybody's
>>>>>>>>>>>>>>>>>>>>> going to diagnose it: is there cache tiering on the cluster? What is
>>>>>>>>>>>>>>>>>>>>> this pool used for? Were there other errors on this PG in the past?
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> I also notice a separate email about deleting the data; I don't have
>>>>>>>>>>>>>>>>>>>>> any experience with this but you'd probably have to export the PG
>>>>>>>>>>>>>>>>>>>>> using ceph-objectstore-tool and then find a way to delete the object
>>>>>>>>>>>>>>>>>>>>> out of it. I see options to remove both an object and
>>>>>>>>>>>>>>>>>>>>> "remove-clone-metadata" on a particular ID, but I've not used any of
>>>>>>>>>>>>>>>>>>>>> them myself.
>>>>>>>>>>>>>>>>>>>>> -Greg
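For the export / clone-metadata route Greg mentions, a rough sketch of the
commands involved (the OSD id, data path, PG id, and object spec below are
placeholders taken from earlier in the thread; the OSD has to be stopped
first, and I'd keep the export as a backup before removing anything):

    systemctl stop ceph-osd@39
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-39 \
        --pgid 3.80e --op export --file /root/pg-3.80e.export
    # find the exact JSON identifier of the damaged object/clone:
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-39 \
        --pgid 3.80e --op list > /root/pg-3.80e.objects
    # then, per Greg's note, remove the clone metadata for one clone id:
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-39 \
        '<object-json-from-list>' remove-clone-metadata <cloneid>

Filestore OSDs may also need --journal-path, and flags can differ slightly
between versions, so check `ceph-objectstore-tool --help` on the node first.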