Re: Ceph Luminous - pg is down due to src/osd/SnapMapper.cc: 246: FAILED assert(r == -2)

Hi,

OK, the cluster has now fully recovered. The only problem is that I
can't enable scrubs, as they fail with:
     0> 2018-01-18 13:00:54.843076 7fbd253ff700 -1
/build/ceph/src/osd/SnapMapper.cc: In function 'int
SnapMapper::get_snaps(const hobject_t&, SnapMapper::object_snaps*)'
thread 7fbd253ff700 time 2018-01-18 13:00:54.840396
/build/ceph/src/osd/SnapMapper.cc: 154: FAILED assert(!out->snaps.empty())

 ceph version 12.2.2-94-g92923ef
(92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x102) [0x5561ce28d1f2]
 2: (SnapMapper::get_snaps(hobject_t const&,
SnapMapper::object_snaps*)+0x46b) [0x5561cdef755b]
 3: (SnapMapper::get_snaps(hobject_t const&, std::set<snapid_t,
std::less<snapid_t>, std::allocator<snapid_t> >*)+0xd7) [0x5561cdef7697]
 4: (PG::_scan_snaps(ScrubMap&)+0x692) [0x5561cdd8ad22]
 5: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool,
unsigned int, ThreadPool::TPHandle&)+0x22f) [0x5561cdd8bc5f]
 6: (PG::replica_scrub(boost::intrusive_ptr<OpRequest>,
ThreadPool::TPHandle&)+0x624) [0x5561cdd8c584]
 7: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
ThreadPool::TPHandle&)+0x8b6) [0x5561cde4b326]
 8: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
[0x5561cdcd7bc7]
 9: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
const&)+0x57) [0x5561cdf4f957]
 10: (OSD::ShardedOpWQ::_process(unsigned int,
ceph::heartbeat_handle_d*)+0x108c) [0x5561cdd06d1c]
 11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
[0x5561ce292e7d]
 12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x5561ce294e40]
 13: (()+0x8064) [0x7fbd70d86064]
 14: (clone()+0x6d) [0x7fbd6fe7a62d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is
needed to interpret this.
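[Editorial note: the failed assert at SnapMapper.cc:154 fires when get_snaps finds a mapping for the object but the decoded snap set is empty. A minimal sketch of the shape of that check, using an in-memory map as an illustrative stand-in for the OMAP-backed store; the names are hypothetical, not Ceph's actual code:]

```cpp
#include <cassert>
#include <cerrno>
#include <map>
#include <set>
#include <string>

// Illustrative stand-in for SnapMapper's object -> snaps lookup.
// The real SnapMapper reads this mapping from the OSD's OMAP via
// MapCacher; here a plain std::map plays that role.
struct ToySnapMapper {
  std::map<std::string, std::set<int>> store;  // object key -> snap ids

  // Returns 0 on success, -ENOENT if the object has no mapping.
  // Mirrors the invariant asserted at SnapMapper.cc:154: a stored
  // mapping must never carry an empty snap set.
  int get_snaps(const std::string& oid, std::set<int>* out) const {
    auto it = store.find(oid);
    if (it == store.end())
      return -ENOENT;
    *out = it->second;
    assert(!out->empty());  // corresponds to FAILED assert(!out->snaps.empty())
    return 0;
  }
};
```

[So the scrub crash means an object key exists in the mapper whose recorded snap list is empty, which the scrub path then trips over.]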


Greets,
Stefan

On 17.01.2018 at 19:56, Sage Weil wrote:
> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>
>> On 17.01.2018 at 19:48, Sage Weil wrote:
>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>> Hi Sage,
>>>>
>>>> this gives me another crash while that pg is recovering:
>>>>
>>>>      0> 2018-01-17 19:25:09.328935 7f48f8fff700 -1
>>>> /build/ceph/src/osd/PrimaryLogPG.cc: In function 'virtual void
>>>> PrimaryLogPG::on_local_recover(const hobject_t&, const ObjectRecoveryInfo&,
>>>> ObjectContextRef, bool, ObjectStore::Transaction*)' thread 7f48f8fff700
>>>> time 2018-01-17 19:25:09.322287
>>>> /build/ceph/src/osd/PrimaryLogPG.cc: 358: FAILED assert(p !=
>>>> recovery_info.ss.clone_snaps.end())
>>>
>>> Is this a cache tiering pool?
>>
>> no, it's a normal 3-replica pool, but the pg is degraded:
>> ceph pg dump | grep 3.80e
>>
>> 3.80e      1709                  0     1579         0       0 6183674368
>> 10014    10014 active+undersized+degraded+remapped+backfill_wait
>> 2018-01-17 19:42:55.840884  918403'70041375  918403:77171331 [50,54,86]
>>        50       [39]             39  907737'69776430 2018-01-14
>> 22:19:54.110864  907737'69776430 2018-01-14 22:19:54.110864
> 
> Hrm, no real clues on the root cause then.  Something like this will work 
> around the current assert:
> 
> diff --git a/src/osd/PrimaryLogPG.cc b/src/osd/PrimaryLogPG.cc
> index d42f3a401b..0f76134f74 100644
> --- a/src/osd/PrimaryLogPG.cc
> +++ b/src/osd/PrimaryLogPG.cc
> @@ -372,8 +372,11 @@ void PrimaryLogPG::on_local_recover(
>      set<snapid_t> snaps;
>      dout(20) << " snapset " << recovery_info.ss << dendl;
>      auto p = recovery_info.ss.clone_snaps.find(hoid.snap);
> -    assert(p != recovery_info.ss.clone_snaps.end());  // hmm, should we warn?
> -    snaps.insert(p->second.begin(), p->second.end());
> +    if (p == recovery_info.ss.clone_snaps.end()) {
> +      derr << __func__ << " no clone_snaps for " << hoid << dendl;
> +    } else {
> +      snaps.insert(p->second.begin(), p->second.end());
> +    }
>      dout(20) << " snaps " << snaps << dendl;
>      snap_mapper.add_oid(
>        recovery_info.soid,
> 
> 
>>
>> Stefan
>>
>>>
>>> s
>>>>
>>>>  ceph version 12.2.2-94-g92923ef
>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable)
>>>>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>>>> const*)+0x102) [0x55addb5eb1f2]
>>>>  2: (PrimaryLogPG::on_local_recover(hobject_t const&, ObjectRecoveryInfo
>>>> const&, std::shared_ptr<ObjectContext>, bool,
>>>> ObjectStore::Transaction*)+0x11f0) [0x55addb1957a0]
>>>>  3: (ReplicatedBackend::handle_push(pg_shard_t, PushOp const&,
>>>> PushReplyOp*, ObjectStore::Transaction*)+0x31d) [0x55addb3071ed]
>>>>  4: (ReplicatedBackend::_do_push(boost::intrusive_ptr<OpRequest>)+0x18f)
>>>> [0x55addb30748f]
>>>>  5:
>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2d1)
>>>> [0x55addb317531]
>>>>  6: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50)
>>>> [0x55addb23cf10]
>>>>  7: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
>>>> ThreadPool::TPHandle&)+0x77b) [0x55addb1a91eb]
>>>>  8: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
>>>> [0x55addb035bc7]
>>>>  9: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
>>>> const&)+0x57) [0x55addb2ad957]
>>>>  10: (OSD::ShardedOpWQ::_process(unsigned int,
>>>> ceph::heartbeat_handle_d*)+0x108c) [0x55addb064d1c]
>>>>  11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
>>>> [0x55addb5f0e7d]
>>>>  12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55addb5f2e40]
>>>>  13: (()+0x8064) [0x7f4955b68064]
>>>>  14: (clone()+0x6d) [0x7f4954c5c62d]
>>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>>>> needed to interpret this.
>>>>
>>>> Greets,
>>>> Stefan
>>>>
>>>> On 17.01.2018 at 15:28, Sage Weil wrote:
>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>>>> Hi,
>>>>>>
>>>>>> is there any chance to fix this instead of manually removing all the clones?
>>>>>
>>>>> I believe you can avoid the immediate problem and get the PG up by 
>>>>> commenting out the assert.  set_snaps() will overwrite the object->snap 
>>>>> list mapping.
>>>>>
>>>>> The problem is you'll probably still have a stray snapid -> object mapping, so 
>>>>> when snaptrimming runs you might end up with a PG in the snaptrim_error 
>>>>> state that won't trim (although from a quick look at the code it won't 
>>>>> crash).  I'd probably remove the assert and deal with that if/when it 
>>>>> happens.
>>>>>
>>>>> I'm adding a ticket to relax these asserts for production but keep them 
>>>>> enabled for qa.  This isn't something that needs to take down the OSD!
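[Editorial note: the idea of keeping asserts fatal for QA but tolerant in production can be sketched with a runtime-configurable check. This is an illustrative macro, not Ceph's actual ceph_assert machinery; all names here are hypothetical:]

```cpp
#include <cassert>
#include <cstdio>

// Sketch of a "soft" assert: fatal when strict mode (e.g. QA builds)
// is enabled, a logged warning otherwise. Not Ceph's real machinery.
static bool g_strict_asserts = true;  // QA: true; production: false

#define SOFT_ASSERT(cond)                                          \
  do {                                                             \
    if (!(cond)) {                                                 \
      if (g_strict_asserts) {                                      \
        assert(cond);                                              \
      } else {                                                     \
        std::fprintf(stderr, "soft assert failed: %s\n", #cond);   \
      }                                                            \
    }                                                              \
  } while (0)

// Example: tolerate a missing clone mapping instead of killing the OSD.
int check_mapping(bool found) {
  SOFT_ASSERT(found);        // aborts in QA mode, only logs in tolerant mode
  return found ? 0 : -2;     // report -ENOENT to the caller instead
}
```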
>>>>>
>>>>> sage
>>>>>
>>>>>
>>>>>
>>>>>> Stefan
>>>>>>
>>>>>> On 16.01.2018 at 23:24, Gregory Farnum wrote:
>>>>>>> On Mon, Jan 15, 2018 at 5:23 PM, Stefan Priebe - Profihost AG
>>>>>>> <s.priebe@xxxxxxxxxxxx> wrote:
>>>>>>>> Hello,
>>>>>>>>
>>>>>>>> currently one of my clusters is missing a whole pg due to all 3 osds
>>>>>>>> being down.
>>>>>>>>
>>>>>>>> All of them fail with:
>>>>>>>>     0> 2018-01-16 02:05:33.353293 7f944dbfe700 -1
>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: In function 'void
>>>>>>>> SnapMapper::add_oid(const hobject_t&, const std::set<snapid_t>&,
>>>>>>>> MapCacher::Transaction<std::basic_string<char>, ceph::buffer::list>*)'
>>>>>>>> thread 7f944dbfe700 time 2018-01-16 02:05:33.349946
>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: 246: FAILED assert(r == -2)
>>>>>>>>
>>>>>>>>  ceph version 12.2.2-93-gd6da8d7
>>>>>>>> (d6da8d77a4b2220e6bdd61e4bdd911a9cd91946c) luminous (stable)
>>>>>>>>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>>>>>>>> const*)+0x102) [0x561f9ff0b1e2]
>>>>>>>>  2: (SnapMapper::add_oid(hobject_t const&, std::set<snapid_t,
>>>>>>>> std::less<snapid_t>, std::allocator<snapid_t> > const&,
>>>>>>>> MapCacher::Transaction<std::string, ceph::buffer::list>*)+0x64b)
>>>>>>>> [0x561f9fb76f3b]
>>>>>>>>  3: (PG::update_snap_map(std::vector<pg_log_entry_t,
>>>>>>>> std::allocator<pg_log_entry_t> > const&,
>>>>>>>> ObjectStore::Transaction&)+0x38f) [0x561f9fa0ae3f]
>>>>>>>>  4: (PG::append_log(std::vector<pg_log_entry_t,
>>>>>>>> std::allocator<pg_log_entry_t> > const&, eversion_t, eversion_t,
>>>>>>>> ObjectStore::Transaction&, bool)+0x538) [0x561f9fa31018]
>>>>>>>>  5: (PrimaryLogPG::log_operation(std::vector<pg_log_entry_t,
>>>>>>>> std::allocator<pg_log_entry_t> > const&,
>>>>>>>> boost::optional<pg_hit_set_history_t> const&, eversion_t const&,
>>>>>>>> eversion_t const&, bool, ObjectStore::Transaction&)+0x64) [0x561f9fb25d64]
>>>>>>>>  6: (ReplicatedBackend::do_repop(boost::intrusive_ptr<OpRequest>)+0xa92)
>>>>>>>> [0x561f9fc314b2]
>>>>>>>>  7:
>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2a4)
>>>>>>>> [0x561f9fc374f4]
>>>>>>>>  8: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50)
>>>>>>>> [0x561f9fb5cf10]
>>>>>>>>  9: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x561f9fac91eb]
>>>>>>>>  10: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
>>>>>>>> [0x561f9f955bc7]
>>>>>>>>  11: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
>>>>>>>> const&)+0x57) [0x561f9fbcd947]
>>>>>>>>  12: (OSD::ShardedOpWQ::_process(unsigned int,
>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x561f9f984d1c]
>>>>>>>>  13: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
>>>>>>>> [0x561f9ff10e6d]
>>>>>>>>  14: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x561f9ff12e30]
>>>>>>>>  15: (()+0x8064) [0x7f949afcb064]
>>>>>>>>  16: (clone()+0x6d) [0x7f949a0bf62d]
>>>>>>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>>>>>>>> needed to interpret this.
>>>>>>>
>>>>>>> By the time it gets there, something else has gone wrong. The OSD is
>>>>>>> adding a snapid/object pair to its "SnapMapper", and discovering that
>>>>>>> there are already entries (which it thinks there shouldn't be).
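[Editorial note: for context, the assert at SnapMapper.cc:246 sits in add_oid: before inserting a new object -> snaps mapping, the mapper looks the object up and expects that lookup to fail with -ENOENT (-2), i.e. no pre-existing entry. A toy sketch of that check; illustrative names only, not Ceph's actual code:]

```cpp
#include <cassert>
#include <cerrno>
#include <map>
#include <set>
#include <string>

// Simplified illustration of the check behind SnapMapper.cc:246.
// add_oid must only be called for objects that are not yet mapped;
// the "assert(r == -2)" asserts r == -ENOENT from the prior lookup.
struct ToyMapper {
  std::map<std::string, std::set<int>> store;

  int lookup(const std::string& oid, std::set<int>* out) const {
    auto it = store.find(oid);
    if (it == store.end())
      return -ENOENT;               // -2: the expected "not found" result
    *out = it->second;
    return 0;
  }

  void add_oid(const std::string& oid, const std::set<int>& snaps) {
    std::set<int> existing;
    int r = lookup(oid, &existing);
    assert(r == -ENOENT);           // FAILED assert(r == -2) fires here
    store[oid] = snaps;             // when a stale mapping already exists
  }
};
```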
>>>>>>>
>>>>>>> You'll need to post more of a log, along with background, if anybody's
>>>>>>> going to diagnose it: is there cache tiering on the cluster? What is
>>>>>>> this pool used for? Were there other errors on this PG in the past?
>>>>>>>
>>>>>>> I also notice a separate email about deleting the data; I don't have
>>>>>>> any experience with this but you'd probably have to export the PG
>>>>>>> using ceph-objectstore-tool and then find a way to delete the object
>>>>>>> out of it. I see options to remove both an object and
>>>>>>> "remove-clone-metadata" on a particular ID, but I've not used any of
>>>>>>> them myself.
>>>>>>> -Greg
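[Editorial note: the export/remove workflow Greg mentions can be sketched roughly as below. All paths, the OSD id, the pgid, the object specifier, and the clone id are placeholders; the OSD must be stopped first, and the export gives you a way to re-import the PG if the surgery goes wrong:]

```shell
# Sketch only -- OSD id, pgid, object JSON and clone id are placeholders.
systemctl stop ceph-osd@39

# Back up the whole PG before modifying anything:
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-39 \
    --pgid 3.80e --op export --file /root/pg-3.80e.export

# List objects in the PG to locate the offending clone:
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-39 \
    --pgid 3.80e --op list

# Remove the stale clone metadata for one object (placeholder specifier):
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-39 \
    '<object-json-from-list>' remove-clone-metadata <cloneid>
```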
>>>>>>>
>>>>>> --
>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>>>> the body of a message to majordomo@xxxxxxxxxxxxxxx
>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>
>>>>>>