Hi Greg,
Thanks for your findings! We've updated the issue with the log
files of osd.93 and osd.69, which correspond to the period of the
log we posted. We've also created a new set of logs for that
pair of OSDs. As we explain in the issue, right now the OSDs fail
on that other assert you mentioned, and, sadly enough, we can't
reproduce the initial crash.
Greetings,
Ana
On 19/10/17 01:14, Gregory Farnum wrote:
I updated the ticket with some findings. It appears
that osd.93 has that snapshot object in its missing set that
gets sent to osd.78, and then osd.69 claims to have the object.
Can you upload debug logs of those OSDs that go along with this
log? (Or just generate a new set of them together.)
-Greg
Dear all,
We are still struggling with this issue. By now,
one OSD crashes all the time (a different one than
yesterday), but now on a different assert,
namely this one:
#0 0x00007ffff5464428 in __GI_raise
(sig=sig@entry=6) at
../sysdeps/unix/sysv/linux/raise.c:54
#1 0x00007ffff546602a in __GI_abort () at
abort.c:89
#2 0x0000555555ff157e in ceph::__ceph_assert_fail
(assertion=assertion@entry=0x555556564e97
"head_obc", file=file@entry=0x555556566bd8
"/build/ceph-12.2.1/src/osd/PrimaryLogPG.cc",
line=line@entry=10369,
func=func@entry=0x55555656d940
<PrimaryLogPG::recover_missing(hobject_t
const&, eversion_t, int,
PGBackend::RecoveryHandle*)::__PRETTY_FUNCTION__>
"int PrimaryLogPG::recover_missing(const
hobject_t&, eversion_t, int,
PGBackend::RecoveryHandle*)") at
/build/ceph-12.2.1/src/common/assert.cc:66
#3 0x0000555555b833e9 in
PrimaryLogPG::recover_missing
(this=this@entry=0x555562aca000, soid=..., v=...,
priority=<optimized out>,
h=h@entry=0x555567a3a080) at
/build/ceph-12.2.1/src/osd/PrimaryLogPG.cc:10369
#4 0x0000555555bc3fd0 in
PrimaryLogPG::recover_primary
(this=this@entry=0x555562aca000, max=max@entry=1,
handle=...) at
/build/ceph-12.2.1/src/osd/PrimaryLogPG.cc:11588
#5 0x0000555555bcc81e in
PrimaryLogPG::start_recovery_ops
(this=0x555562aca000, max=1, handle=...,
ops_started=0x7fffd8b1ac68) at
/build/ceph-12.2.1/src/osd/PrimaryLogPG.cc:11339
#6 0x0000555555a20a59 in OSD::do_recovery
(this=0x55555f95a000, pg=0x555562aca000,
queued=384619, reserved_pushes=1, handle=...) at
/build/ceph-12.2.1/src/osd/OSD.cc:9381
#7 0x0000555555c94be9 in
PGQueueable::RunVis::operator()
(this=this@entry=0x7fffd8b1af00, op=...) at
/build/ceph-12.2.1/src/osd/PGQueueable.cc:34
#8 0x0000555555a226c4 in
boost::detail::variant::invoke_visitor<PGQueueable::RunVis>::internal_visit<PGRecovery>
(operand=..., this=<synthetic pointer>)
at
/build/ceph-12.2.1/obj-x86_64-linux-gnu/boost/include/boost/variant/variant.hpp:1046
#9
boost::detail::variant::visitation_impl_invoke_impl<boost::detail::variant::invoke_visitor<PGQueueable::RunVis>,
void*, PGRecovery> (storage=0x7fffd8b1af50,
visitor=<synthetic pointer>)
at
/build/ceph-12.2.1/obj-x86_64-linux-gnu/boost/include/boost/variant/detail/visitation_impl.hpp:114
#10
boost::detail::variant::visitation_impl_invoke<boost::detail::variant::invoke_visitor<PGQueueable::RunVis>,
void*, PGRecovery,
boost::variant<boost::intrusive_ptr<OpRequest>,
PGSnapTrim, PGScrub,
PGRecovery>::has_fallback_type_> (
t=0x0, storage=0x7fffd8b1af50,
visitor=<synthetic pointer>,
internal_which=<optimized out>) at
/build/ceph-12.2.1/obj-x86_64-linux-gnu/boost/include/boost/variant/detail/visitation_impl.hpp:157
#11
boost::detail::variant::visitation_impl<mpl_::int_<0>,
boost::detail::variant::visitation_impl_step<boost::mpl::l_iter<boost::mpl::l_item<mpl_::long_<4l>,
boost::intrusive_ptr<OpRequest>,
boost::mpl::l_item<mpl_::long_<3l>,
PGSnapTrim,
boost::mpl::l_item<mpl_::long_<2l>,
PGScrub,
boost::mpl::l_item<mpl_::long_<1l>,
PGRecovery, boost::mpl::l_end> > > >
>, boost::mpl::l_iter<boost::mpl::l_end>
>,
boost::detail::variant::invoke_visitor<PGQueueable::RunVis>,
void*,
boost::variant<boost::intrusive_ptr<OpRequest>,
PGSnapTrim, PGScrub,
PGRecovery>::has_fallback_type_>
(no_backup_flag=..., storage=0x7fffd8b1af50,
visitor=<synthetic pointer>,
logical_which=<optimized out>,
internal_which=<optimized out>)
at
/build/ceph-12.2.1/obj-x86_64-linux-gnu/boost/include/boost/variant/detail/visitation_impl.hpp:238
#12
boost::variant<boost::intrusive_ptr<OpRequest>,
PGSnapTrim, PGScrub,
PGRecovery>::internal_apply_visitor_impl<boost::detail::variant::invoke_visitor<PGQueueable::RunVis>,
void*> (storage=0x7fffd8b1af50,
visitor=<synthetic pointer>,
logical_which=<optimized out>,
internal_which=<optimized out>) at
/build/ceph-12.2.1/obj-x86_64-linux-gnu/boost/include/boost/variant/variant.hpp:2389
#13
boost::variant<boost::intrusive_ptr<OpRequest>,
PGSnapTrim, PGScrub,
PGRecovery>::internal_apply_visitor<boost::detail::variant::invoke_visitor<PGQueueable::RunVis>
> (visitor=<synthetic pointer>,
this=0x7fffd8b1af48)
at
/build/ceph-12.2.1/obj-x86_64-linux-gnu/boost/include/boost/variant/variant.hpp:2400
#14
boost::variant<boost::intrusive_ptr<OpRequest>,
PGSnapTrim, PGScrub,
PGRecovery>::apply_visitor<PGQueueable::RunVis>
(visitor=..., this=0x7fffd8b1af48)
at
/build/ceph-12.2.1/obj-x86_64-linux-gnu/boost/include/boost/variant/variant.hpp:2423
#15 boost::apply_visitor<PGQueueable::RunVis,
boost::variant<boost::intrusive_ptr<OpRequest>,
PGSnapTrim, PGScrub, PGRecovery> >
(visitable=..., visitor=...)
at
/build/ceph-12.2.1/obj-x86_64-linux-gnu/boost/include/boost/variant/detail/apply_visitor_unary.hpp:70
#16 PGQueueable::run (handle=..., pg=...,
osd=<optimized out>, this=0x7fffd8b1af48) at
/build/ceph-12.2.1/src/osd/PGQueueable.h:140
#17 OSD::ShardedOpWQ::_process (this=0x55555f95b5d8,
thread_index=<optimized out>,
hb=0x555567a1e1e0) at
/build/ceph-12.2.1/src/osd/OSD.cc:10327
#18 0x0000555555ff61e4 in
ShardedThreadPool::shardedthreadpool_worker
(this=0x55555f95ab88, thread_index=4) at
/build/ceph-12.2.1/src/common/WorkQueue.cc:339
#19 0x0000555555ff9220 in
ShardedThreadPool::WorkThreadSharded::entry
(this=<optimized out>) at
/build/ceph-12.2.1/src/common/WorkQueue.h:689
#20 0x00007ffff64bf6ba in start_thread
(arg=0x7fffd8b1d700) at pthread_create.c:333
#21 0x00007ffff55363dd in clone () at
../sysdeps/unix/sysv/linux/x86_64/clone.S:109
We are not gdb experts; however, when we request
the arguments of frame 3, it tells us:
#3 0x0000555555b833e9 in
PrimaryLogPG::recover_missing
(this=this@entry=0x555562aca000, soid=..., v=...,
priority=<optimized out>,
h=h@entry=0x555567a3a080) at
/build/ceph-12.2.1/src/osd/PrimaryLogPG.cc:10369
10369 /build/ceph-12.2.1/src/osd/PrimaryLogPG.cc:
No such file or directory.
(gdb) info args
this = 0x555562aca000
soid = @0x7fffd8b1a6c0: {oid = {name =
"rbd_data.30732d3238f3535.", '0' <repeats 12
times>, "f7e3"}, snap = {val = 244802}, hash =
3610026748, max = false, nibblewise_key_cache =
3475686013, hash_reverse_bits = 1061500139,
static POOL_META = -1, static POOL_TEMP_START =
-2, pool = 2, nspace = "", key = ""}
v = <optimized out>
priority = <optimized out>
h = 0x555567a3a080
(gdb) Quit
So it points to
rbd_data.30732d3238f3535.000000000000f7e3
(with the '0' <repeats 12 times> written out).
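(For reference, the frame arguments above can be pulled out of a
core dump roughly like this; the binary path is the usual Ubuntu
location and the core path is an assumption:)

$ gdb /usr/bin/ceph-osd /path/to/core   # core location depends on your core_pattern
(gdb) bt            # full backtrace, as pasted above
(gdb) frame 3       # select the PrimaryLogPG::recover_missing frame
(gdb) info args     # shows soid, v, priority, h
(gdb) print soid    # the hobject_t that triggered the assert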
This is weird, as we cannot find any reference to
"rbd_data.30732d3238f3535.000000000000f7e3"; this
rbd prefix does not exist on our system (when
running rbd ls and requesting info of the rbds,
30732d3238f3535 is not found), nor can we find
any object on the filesystem that looks to belong
to this rbd image. It could of course be an old rbd
image that has since been deleted, but we do not
understand why the OSD is still trying to access it or
repair it. It seems this repair is stuck in some
queue.
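(For what it's worth, the sort of checks described above can be
scripted roughly as follows; the pool name, OSD id, and filestore
path are assumptions:)

$ POOL=rbd    # pool id in the soid is 2; the actual pool name is an assumption
$ for img in $(rbd -p "$POOL" ls); do
>   rbd -p "$POOL" info "$img" | grep -q 30732d3238f3535 && echo "$img"
> done        # prints any image whose block_name_prefix matches
$ rados -p "$POOL" ls | grep 30732d3238f3535    # any head objects left with that prefix
$ ceph osd map "$POOL" rbd_data.30732d3238f3535.000000000000f7e3   # which OSDs it maps to
$ find /var/lib/ceph/osd/ceph-93/current/2.2fc_head -name '*30732d3238f3535*'   # on the filestore OSD (OSD id is an assumption)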
This is happening in the same PG as the problem we
found yesterday.
All advice is welcome, as we still have one PG
down.
best,
mart
On 10/18/2017 12:39 PM, Ana Aviles wrote:
Hello,
We created bug #21827. We also updated the log file
of the OSD with debug 20; the reference is
6e4dba6f-2c15-4920-b591-fe380bbca200
Thanks,
Ana
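(For reference, raising the debug level and uploading such a log is
roughly the following; the OSD id and log path below are assumptions:)

$ ceph tell osd.78 injectargs '--debug_osd 20 --debug_ms 1'   # if the daemon is down, set "debug osd = 20" in ceph.conf and restart instead
# ... reproduce the problem ...
$ ceph-post-file /var/log/ceph/ceph-osd.78.log                # prints a reference UUID to quote in the tracker ticket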
On 18/10/17 00:46, Mart van Santen wrote:
Hi Greg,
(I'm a colleague of Ana.) Thank you for your
reply.
On 10/17/2017 11:57 PM, Gregory Farnum wrote:
Hello all,
We had an inconsistent PG on our cluster.
While performing a PG repair
operation the OSD crashed. The OSD was not
able to start again, and there was no hardware
failure on the disk itself. This is the log output:
2017-10-17 17:48:55.771384 7f234930d700 -1
log_channel(cluster) log
[ERR] : 2.2fc repair 1 missing, 0
inconsistent objects
2017-10-17 17:48:55.771417 7f234930d700 -1
log_channel(cluster) log
[ERR] : 2.2fc repair 3 errors, 1 fixed
2017-10-17 17:48:56.047896 7f234930d700 -1
/build/ceph-12.2.1/src/osd/PrimaryLogPG.cc: In function 'virtual void
PrimaryLogPG::on_local_recover(const
hobject_t&, const
ObjectRecoveryInfo&, ObjectContextRef,
bool, ObjectStore::Transaction*)'
thread 7f234930d700 time 2017-10-17
17:48:55.924115
/build/ceph-12.2.1/src/osd/PrimaryLogPG.cc: 358: FAILED assert(p !=
recovery_info.ss.clone_snaps.end())
Hmm. The OSD got a push op containing a
snapshot it doesn't think should exist. I
also see that there's a comment "// hmm,
should we warn?" on that assert.
We also caught these log entries, which indeed
point to a clone/snapshot problem:
-9877> 2017-10-17 17:46:16.044077 7f234db16700
10 log_client will send 2017-10-17
17:46:13.367842 osd.78 osd.78
[XXXX:XXXX:XXXX:XXXX::203]:6880/9116 483 : cluster
[ERR] 2.2fc shard 78 missing
2:3f72b543:::rbd_data.332d5a836bcc485.000000000000fcf6:466a7
-9876> 2017-10-17 17:46:16.044105 7f234db16700
10 log_client will send 2017-10-17
17:46:13.368026 osd.78 osd.78
[XXXX:XXXX:XXXX:XXXX::203]:6880/9116 484 : cluster
[ERR] repair 2.2fc
2:3f72b543:::rbd_data.332d5a836bcc485.000000000000fcf6:466a7
is an unexpected clone
-9868> 2017-10-17 17:46:16.324112 7f2354b24700
10 log_client logged 2017-10-17 17:46:13.367842
osd.78 osd.78 [XXXX:XXXX:XXXX:XXXX::203]:6880/9116
483 : cluster [ERR] 2.2fc shard 78 missing
2:3f72b543:::rbd_data.332d5a836bcc485.000000000000fcf6:466a7
-9867> 2017-10-17 17:46:16.324128 7f2354b24700
10 log_client logged 2017-10-17 17:46:13.368026
osd.78 osd.78 [XXXX:XXXX:XXXX:XXXX::203]:6880/9116
484 : cluster [ERR] repair 2.2fc
2:3f72b543:::rbd_data.332d5a836bcc485.000000000000fcf6:466a7
is an unexpected clone
-36> 2017-10-17 17:48:55.771384 7f234930d700
-1 log_channel(cluster) log [ERR] : 2.2fc repair 1
missing, 0 inconsistent objects
-35> 2017-10-17 17:48:55.771417 7f234930d700
-1 log_channel(cluster) log [ERR] : 2.2fc repair 3
errors, 1 fixed
-4> 2017-10-17 17:48:56.046071 7f234db16700
10 log_client will send 2017-10-17
17:48:55.771390 osd.78 osd.78
[XXXX:XXXX:XXXX:XXXX::203]:6880/9116 485 : cluster
[ERR] 2.2fc repair 1 missing, 0 inconsistent
objects
-3> 2017-10-17 17:48:56.046088 7f234db16700
10 log_client will send 2017-10-17
17:48:55.771419 osd.78 osd.78
[XXXX:XXXX:XXXX:XXXX::203]:6880/9116 486 : cluster
[ERR] 2.2fc repair 3 errors, 1 fixed
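(As an aside, the scrub errors behind these log lines can also be
dumped in structured form; a sketch, assuming the scrub results for
2.2fc are still recorded and the pool name is a placeholder:)

$ rados list-inconsistent-pg rbd                               # PGs with recorded inconsistencies (pool name is an assumption)
$ rados list-inconsistent-obj 2.2fc --format=json-pretty       # per-object shard errors, e.g. the missing shard on osd.78
$ rados list-inconsistent-snapset 2.2fc --format=json-pretty   # clone/snapset errors such as the "unexpected clone"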
Can you take a full log with "debug osd
= 20" set, post it with ceph-post-file,
and create a ticket on tracker.ceph.com?
We will submit the ticket tomorrow (we are in
CEST); we want to have more pairs of eyes on it
when we start the OSD again.
After this crash we marked the OSD as out.
The cluster rebalanced itself; unfortunately, the
same issue appeared on another OSD (same PG). After
several crashes of that OSD, it came back up,
but now with one PG down. I assume the cluster
decided it had 'finished' the ceph pg repair command
and removed the 'repair' state, but now with a
broken PG. If you have any hints on how we can get
the PG online again, we would be very grateful, so
we can work on that tomorrow.
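(Purely as diagnostics, the state of the down PG can be inspected
with the commands below before trying anything invasive; a sketch:)

$ ceph pg 2.2fc query              # "recovery_state" and "blocked_by" explain why the PG is down
$ ceph pg dump_stuck inactive      # overview of stuck/inactive PGs
$ ceph pg 2.2fc list_unfound       # objects the primary knows about but cannot locate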
Thanks,
Mart
Some general info about this cluster:
- all OSDs run the same version; the monitors are
  also all 12.2.1 (Ubuntu Xenial)
- the cluster is a backup cluster and has
  min_size 1 and size 2, so only 2 copies
- the cluster was recently upgraded from jewel
  to luminous (3 weeks ago)
- the cluster was recently upgraded from straw
  to straw2 (1 week ago)
- it was in HEALTH_OK until this happened
- we use filestore only
- the cluster was originally installed with hammer,
  then upgraded to infernalis, jewel, and now luminous
Health (noup/noout set on purpose while we try to
recover):
$ ceph -s
  cluster:
    id:     xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxxx
    health: HEALTH_WARN
            noup,noout flag(s) set
            Reduced data availability: 1 pg inactive, 1 pg down
            Degraded data redundancy: 2892/31621143 objects degraded (0.009%), 2 pgs unclean, 1 pg degraded, 1 pg undersized

  services:
    mon: 3 daemons, quorum ds2-mon1,ds2-mon2,ds2-mon3
    mgr: ds2-mon1(active)
    osd: 93 osds: 92 up, 92 in; 1 remapped pgs
         flags noup,noout
    rgw: 1 daemon active

  data:
    pools:   13 pools, 1488 pgs
    objects: 15255k objects, 43485 GB
    usage:   119 TB used, 126 TB / 245 TB avail
    pgs:     0.067% pgs not active
             2892/31621143 objects degraded (0.009%)
             1483 active+clean
             2    active+clean+scrubbing+deep
             1    active+undersized+degraded+remapped+backfilling
             1    active+clean+scrubbing
             1    down

  io:
    client:   340 B/s rd, 14995 B/s wr, 1 op/s rd, 2 op/s wr
    recovery: 9567 kB/s, 2 objects/s
$ ceph health detail
HEALTH_WARN noup,noout flag(s) set; Reduced data availability: 1 pg inactive, 1 pg down; Degraded data redundancy: 2774/31621143 objects degraded (0.009%), 2 pgs unclean, 1 pg degraded, 1 pg undersized
OSDMAP_FLAGS noup,noout flag(s) set
PG_AVAILABILITY Reduced data availability: 1 pg inactive, 1 pg down
    pg 2.2fc is down, acting [69,93]
PG_DEGRADED Degraded data redundancy: 2774/31621143 objects degraded (0.009%), 2 pgs unclean, 1 pg degraded, 1 pg undersized
    pg 2.1e9 is stuck undersized for 23741.295159, current state active+undersized+degraded+remapped+backfilling, last acting [41]
    pg 2.2fc is stuck unclean since forever, current state down, last acting [69,93]
Are all your OSDs running that same
version?
-Greg
ceph version 12.2.1 (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x102) [0x56236c8ff3f2]
2: (PrimaryLogPG::on_local_recover(hobject_t const&, ObjectRecoveryInfo const&, std::shared_ptr<ObjectContext>, bool, ObjectStore::Transaction*)+0xd63) [0x56236c476213]
3: (ReplicatedBackend::handle_pull_response(pg_shard_t, PushOp const&, PullOp*, std::__cxx11::list<ReplicatedBackend::pull_complete_info, std::allocator<ReplicatedBackend::pull_complete_info> >*, ObjectStore::Transaction*)+0x693) [0x56236c60d4d3]
4: (ReplicatedBackend::_do_pull_response(boost::intrusive_ptr<OpRequest>)+0x2b5) [0x56236c60dd75]
5: (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x20c) [0x56236c61196c]
6: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50) [0x56236c521aa0]
7: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x55d) [0x56236c48662d]
8: (OSD::dequeue_op(boost::intrusive_ptr<PG>, boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3a9) [0x56236c3091a9]
9: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> const&)+0x57) [0x56236c5a2ae7]
10: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x130e) [0x56236c3307de]
11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x884) [0x56236c9041e4]
12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x56236c907220]
13: (()+0x76ba) [0x7f2366be96ba]
14: (clone()+0x6d) [0x7f2365c603dd]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Thanks!
Ana
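(A sketch of turning such a symbol-only trace into file and line
information, as the NOTE above suggests; the debug-symbol package
name is an Ubuntu assumption:)

$ apt-get install ceph-osd-dbg                     # matching debug symbols (package name is an assumption)
$ objdump -rdS /usr/bin/ceph-osd > ceph-osd.dump   # disassembly with interleaved source
$ gdb /usr/bin/ceph-osd /path/to/core              # with symbols installed, gdb resolves each frame to file:line directly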
--
Mart van Santen
Greenhost
E: mart@xxxxxxxxxxxx
T: +31 20 4890444
W: https://greenhost.nl
A PGP signature can be attached to this e-mail;
you need PGP software to verify it.
My public key is available in keyserver(s)
PGP Fingerprint: CA85 EB11 2B70 042D AF66 B29A 6437 01A1 10A3 D3A5
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com