Re: OSD crashed while repairing inconsistent PG luminous

Ana Aviles <ana@xxxxxxxxxxxx> · Wed, 18 Oct 2017 12:39:59 +0200



    Hello,

    
    We created bug #21827. We also uploaded the log file of the OSD taken
    with debug 20. The reference is 6e4dba6f-2c15-4920-b591-fe380bbca200

    
    Thanks,

    Ana

    
    On 18/10/17 00:46, Mart van Santen
      wrote:

    
      Hi Greg,
      (I'm a colleague of Ana.) Thank you for your reply.
      

      On 10/17/2017 11:57 PM, Gregory
        Farnum wrote:

      
            On Tue, Oct 17, 2017 at 9:51 AM Ana Aviles
              <ana@xxxxxxxxxxxx> wrote:

            
            Hello all,

              
              We had an inconsistent PG on our cluster. While performing the
              PG repair operation, the OSD crashed. The OSD was not able to
              start again, and there was no hardware failure on the disk
              itself. This is the log output:

              
              2017-10-17 17:48:55.771384 7f234930d700 -1 log_channel(cluster) log [ERR] : 2.2fc repair 1 missing, 0 inconsistent objects
              2017-10-17 17:48:55.771417 7f234930d700 -1 log_channel(cluster) log [ERR] : 2.2fc repair 3 errors, 1 fixed
              2017-10-17 17:48:56.047896 7f234930d700 -1 /build/ceph-12.2.1/src/osd/PrimaryLogPG.cc: In function 'virtual void PrimaryLogPG::on_local_recover(const hobject_t&, const ObjectRecoveryInfo&, ObjectContextRef, bool, ObjectStore::Transaction*)' thread 7f234930d700 time 2017-10-17 17:48:55.924115
              /build/ceph-12.2.1/src/osd/PrimaryLogPG.cc: 358: FAILED assert(p != recovery_info.ss.clone_snaps.end())

            
            Hmm. The OSD got a push op containing a snapshot it
              doesn't think should exist. I also see that there's a
              comment "// hmm, should we warn?" on that assert.
          
        
      We also caught these log entries, which indeed point to a
      clone/snapshot problem:

      
       -9877> 2017-10-17 17:46:16.044077 7f234db16700 10 log_client  will send 2017-10-17 17:46:13.367842 osd.78 osd.78 [XXXX:XXXX:XXXX:XXXX::203]:6880/9116 483 : cluster [ERR] 2.2fc shard 78 missing 2:3f72b543:::rbd_data.332d5a836bcc485.000000000000fcf6:466a7
       -9876> 2017-10-17 17:46:16.044105 7f234db16700 10 log_client  will send 2017-10-17 17:46:13.368026 osd.78 osd.78 [XXXX:XXXX:XXXX:XXXX::203]:6880/9116 484 : cluster [ERR] repair 2.2fc 2:3f72b543:::rbd_data.332d5a836bcc485.000000000000fcf6:466a7 is an unexpected clone
       -9868> 2017-10-17 17:46:16.324112 7f2354b24700 10 log_client  logged 2017-10-17 17:46:13.367842 osd.78 osd.78 [XXXX:XXXX:XXXX:XXXX::203]:6880/9116 483 : cluster [ERR] 2.2fc shard 78 missing 2:3f72b543:::rbd_data.332d5a836bcc485.000000000000fcf6:466a7
       -9867> 2017-10-17 17:46:16.324128 7f2354b24700 10 log_client  logged 2017-10-17 17:46:13.368026 osd.78 osd.78 [XXXX:XXXX:XXXX:XXXX::203]:6880/9116 484 : cluster [ERR] repair 2.2fc 2:3f72b543:::rbd_data.332d5a836bcc485.000000000000fcf6:466a7 is an unexpected clone
         -36> 2017-10-17 17:48:55.771384 7f234930d700 -1 log_channel(cluster) log [ERR] : 2.2fc repair 1 missing, 0 inconsistent objects
         -35> 2017-10-17 17:48:55.771417 7f234930d700 -1 log_channel(cluster) log [ERR] : 2.2fc repair 3 errors, 1 fixed
          -4> 2017-10-17 17:48:56.046071 7f234db16700 10 log_client  will send 2017-10-17 17:48:55.771390 osd.78 osd.78 [XXXX:XXXX:XXXX:XXXX::203]:6880/9116 485 : cluster [ERR] 2.2fc repair 1 missing, 0 inconsistent objects
          -3> 2017-10-17 17:48:56.046088 7f234db16700 10 log_client  will send 2017-10-17 17:48:55.771419 osd.78 osd.78 [XXXX:XXXX:XXXX:XXXX::203]:6880/9116 486 : cluster [ERR] 2.2fc repair 3 errors, 1 fixed
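
      For anyone sifting through a similar OSD log, here is a minimal
      sketch of how the affected objects could be pulled out of the cluster
      [ERR] messages above. It is not part of Ceph: the helper name scan()
      and the regular expression are assumptions based only on the message
      layout quoted in this thread, and each log entry is assumed to sit on
      a single line.

```python
import re

# Hypothetical helper, not part of Ceph. The pattern is an assumption derived
# from the two [ERR] message shapes quoted in this thread:
#   "<pg> shard <n> missing <object>"
#   "repair <pg> <object> is an unexpected clone"
ERR_RE = re.compile(
    r"cluster \[ERR\] (?:repair )?(?P<pg>\d+\.[0-9a-f]+) "
    r"(?:shard (?P<shard>\d+) missing )?"
    r"(?P<obj>\S+:::\S+)"
    r"(?P<unexpected> is an unexpected clone)?"
)


def scan(lines):
    """Return a sorted list of unique (pg, object, problem) tuples."""
    found = set()
    for line in lines:
        m = ERR_RE.search(line)
        if not m:
            continue
        if m.group("unexpected"):
            problem = "unexpected clone"
        elif m.group("shard"):
            problem = "missing"
        else:
            continue  # an object was named, but not in a shape we recognise
        found.add((m.group("pg"), m.group("obj"), problem))
    return sorted(found)


# Shortened copies of the messages quoted above (leading fields elided).
sample = [
    "483 : cluster [ERR] 2.2fc shard 78 missing "
    "2:3f72b543:::rbd_data.332d5a836bcc485.000000000000fcf6:466a7",
    "484 : cluster [ERR] repair 2.2fc "
    "2:3f72b543:::rbd_data.332d5a836bcc485.000000000000fcf6:466a7 "
    "is an unexpected clone",
    "485 : cluster [ERR] 2.2fc repair 1 missing, 0 inconsistent objects",
]

for pg, obj, problem in scan(sample):
    print(pg, problem, obj)
```

      The "will send" and "logged" copies of a message collapse into one
      entry because the results are collected in a set.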

      
            Can you take a full log with "debug osd = 20" set, post
              it with ceph-post-file, and create a ticket on tracker.ceph.com?
          
        
      We will submit the ticket tomorrow (we are in CEST). We want to
      have more pairs of eyes on it when we start the OSD again.

      
      After this crash we marked the OSD out and the cluster rebalanced
      itself. Unfortunately, the same issue appeared on another OSD (same
      PG). After several crashes, that OSD came back up, but now with one
      PG down. I assume the cluster decided it had 'finished' the ceph pg
      repair command and removed the 'repair' state, leaving a broken PG.
      If you have any hints on how we can get the PG online again, we
      would be very grateful, so we can work on that tomorrow.

      
      Thanks,

      
      Mart

      
      Some general info about this cluster:

      
      - all OSDs run the same version; the monitors are also all 12.2.1
        (Ubuntu xenial)
      - the cluster is a backup cluster and has min_size 1 and
        replication 2, so only 2 copies
      - the cluster was recently upgraded from jewel to luminous (3
        weeks ago)
      - the cluster was recently upgraded from straw to straw2 (1
        week ago)
      - it was in HEALTH_OK until this happened
      - we use filestore only
      - the cluster was originally installed with hammer, then upgraded
        to infernalis, jewel and now luminous
      

      health:

      (noup/noout set on purpose while we are trying to recover)

      
      $ ceph -s
        cluster:
          id:     xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxxx
          health: HEALTH_WARN
                  noup,noout flag(s) set
                  Reduced data availability: 1 pg inactive, 1 pg down
                  Degraded data redundancy: 2892/31621143 objects degraded (0.009%), 2 pgs unclean, 1 pg degraded, 1 pg undersized

        services:
          mon: 3 daemons, quorum ds2-mon1,ds2-mon2,ds2-mon3
          mgr: ds2-mon1(active)
          osd: 93 osds: 92 up, 92 in; 1 remapped pgs
               flags noup,noout
          rgw: 1 daemon active

        data:
          pools:   13 pools, 1488 pgs
          objects: 15255k objects, 43485 GB
          usage:   119 TB used, 126 TB / 245 TB avail
          pgs:     0.067% pgs not active
                   2892/31621143 objects degraded (0.009%)
                   1483 active+clean
                   2    active+clean+scrubbing+deep
                   1    active+undersized+degraded+remapped+backfilling
                   1    active+clean+scrubbing
                   1    down

        io:
          client:   340 B/s rd, 14995 B/s wr, 1 op/s rd, 2 op/s wr
          recovery: 9567 kB/s, 2 objects/s

       
      $ ceph health detail
      HEALTH_WARN noup,noout flag(s) set; Reduced data availability: 1 pg inactive, 1 pg down; Degraded data redundancy: 2774/31621143 objects degraded (0.009%), 2 pgs unclean, 1 pg degraded, 1 pg undersized
      OSDMAP_FLAGS noup,noout flag(s) set
      PG_AVAILABILITY Reduced data availability: 1 pg inactive, 1 pg down
          pg 2.2fc is down, acting [69,93]
      PG_DEGRADED Degraded data redundancy: 2774/31621143 objects degraded (0.009%), 2 pgs unclean, 1 pg degraded, 1 pg undersized
          pg 2.1e9 is stuck undersized for 23741.295159, current state active+undersized+degraded+remapped+backfilling, last acting [41]
          pg 2.2fc is stuck unclean since forever, current state down, last acting [69,93]
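
      To keep an eye on the problem PGs while recovering, the "pg ..."
      entries of the output above can be summarised with a few lines of
      Python. This is only a sketch: pg_problems() is a hypothetical helper,
      and the regular expression assumes the plain-text layout shown in this
      mail, with each entry on one line. Other Ceph releases may print these
      lines differently.

```python
import re

# Hypothetical helper, not part of Ceph. Assumes entries of the form
#   "pg <id> is <description>, acting [..]"       or
#   "pg <id> is <description>, last acting [..]"
# as they appear in the `ceph health detail` output quoted in this thread.
PG_RE = re.compile(
    r"pg (?P<pg>\d+\.[0-9a-f]+) is (?P<what>.+?), "
    r"(?:last )?acting \[(?P<acting>[\d,]*)\]"
)


def pg_problems(text):
    """Map each PG id to a list of (description, acting OSD ids)."""
    out = {}
    for m in PG_RE.finditer(text):
        osds = [int(x) for x in m.group("acting").split(",") if x]
        out.setdefault(m.group("pg"), []).append((m.group("what"), osds))
    return out


sample = """\
    pg 2.2fc is down, acting [69,93]
    pg 2.1e9 is stuck undersized for 23741.295159, current state active+undersized+degraded+remapped+backfilling, last acting [41]
    pg 2.2fc is stuck unclean since forever, current state down, last acting [69,93]
"""

problems = pg_problems(sample)
for pg, entries in sorted(problems.items()):
    for what, osds in entries:
        print(pg, osds, what)
```

      Grouping by PG id makes it visible that 2.2fc shows up under both
      PG_AVAILABILITY and PG_DEGRADED, with the same acting set [69,93].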

      
            Are all your OSDs running that same version?
            -Greg
             
             
               ceph version 12.2.1 (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable)
               1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x102) [0x56236c8ff3f2]
               2: (PrimaryLogPG::on_local_recover(hobject_t const&, ObjectRecoveryInfo const&, std::shared_ptr<ObjectContext>, bool, ObjectStore::Transaction*)+0xd63) [0x56236c476213]
               3: (ReplicatedBackend::handle_pull_response(pg_shard_t, PushOp const&, PullOp*, std::__cxx11::list<ReplicatedBackend::pull_complete_info, std::allocator<ReplicatedBackend::pull_complete_info> >*, ObjectStore::Transaction*)+0x693) [0x56236c60d4d3]
               4: (ReplicatedBackend::_do_pull_response(boost::intrusive_ptr<OpRequest>)+0x2b5) [0x56236c60dd75]
               5: (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x20c) [0x56236c61196c]
               6: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50) [0x56236c521aa0]
               7: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x55d) [0x56236c48662d]
               8: (OSD::dequeue_op(boost::intrusive_ptr<PG>, boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3a9) [0x56236c3091a9]
               9: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> const&)+0x57) [0x56236c5a2ae7]
               10: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x130e) [0x56236c3307de]
               11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x884) [0x56236c9041e4]
               12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x56236c907220]
               13: (()+0x76ba) [0x7f2366be96ba]
               14: (clone()+0x6d) [0x7f2365c603dd]
               NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

              
              Thanks!

              
              Ana

              
              _______________________________________________
              ceph-users mailing list
              ceph-users@xxxxxxxxxxxxxx
              http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

            

    