Re: ceph inconsistent pg missing ec object

Hi Greg,

Thanks! This seems to have worked for at least 1 of the 2 inconsistent pgs: the inconsistency disappeared after a new scrub. I'm still waiting for the result on the second pg. I tried to force a deep scrub with `ceph pg deep-scrub <pg>` yesterday, but today the last deep scrub is still from a week ago. Is there a way to actually trigger a deep scrub immediately?
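
For reference, this is roughly how I have been checking and trying to nudge it along (just a sketch; <pg> and <osd> are placeholders for my second pg and one of its acting osds, and the osd_max_scrubs bump is only my guess at why the deep scrub hasn't started, not something confirmed here):

# last deep-scrub timestamp of the pg (deep_scrub_stamp column)
ceph pg dump pgs | grep <pg>
# request the deep scrub again
ceph pg deep-scrub <pg>
# scrubs only start once the osds involved have a free scrub slot;
# temporarily raising osd_max_scrubs (default 1) on the acting osds may
# let it run sooner
ceph tell osd.<osd> injectargs '--osd_max_scrubs 2'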

Thanks again!

Kenneth


On 02/11/17 19:27, Gregory Farnum wrote:
Okay, after consulting with a colleague this appears to be an instance of http://tracker.ceph.com/issues/21382. Assuming the object is one that doesn't have snapshots, your easiest resolution is to use "rados get" to retrieve the object (which, unlike recovery, should work) and then "rados put" it back into place.
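
In practice that would look roughly like this (a sketch; <data-pool> is a placeholder for the pool holding the object, and the object name and pg id are taken from the logs quoted below):

# read the whole object out of the EC pool; this goes through the normal
# read path, which can reconstruct the data even though recovery of the
# shard fails
rados -p <data-pool> get 10014d3184b.00000000 /tmp/10014d3184b.00000000
# write it back in place, which should rewrite all shards, including the
# missing one
rados -p <data-pool> put 10014d3184b.00000000 /tmp/10014d3184b.00000000
# then deep-scrub again to confirm the inconsistency is gone
ceph pg deep-scrub 5.5e3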

This fix might be backported to Jewel for a later release, but it's tricky, so it wasn't done proactively.
-Greg

On Fri, Oct 20, 2017 at 12:27 AM Stijn De Weirdt <stijn.deweirdt@xxxxxxxx> wrote:
hi gregory,

we more or less followed the instructions on the site (famous last
words, i know ;))

grepping for the error in the logs of the osds in the pg, the primary's
log had "5.5e3s0 shard 59(5) missing
5:c7ae919b:::10014d3184b.00000000:head"
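
(for what it's worth, on jewel the same information can be pulled from the
scrub results instead of grepping logs; a sketch, assuming the
list-inconsistent-obj output format i remember:)

# list the objects the last scrub flagged for this pg; the per-shard
# "errors" fields should show which shard is reported missing
rados list-inconsistent-obj 5.5e3 --format=json-pretty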

we looked for the object using the find command and got:

> [root@osd003 ~]# find /var/lib/ceph/osd/ceph-35/current/5.5e3s0_head/ -name "*10014d3184b.00000000*"
>
> /var/lib/ceph/osd/ceph-35/current/5.5e3s0_head/DIR_3/DIR_E/DIR_5/DIR_7/DIR_9/10014d3184b.00000000__head_D98975E3__5_ffffffffffffffff_0

then we ran this find on all 11 osds of the pg, and 10 out of 11 osds
gave a similar path (the suffix _[0-9a] matched the index of the osd in
the list of osds reported by the pg, so i assumed that was the ec code
splitting the data into 11 pieces)
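
(roughly how i'd script that loop, sketched from memory rather than copied
from our shell history; the crush_location.host lookup via "ceph osd find"
is an assumption about its json output:)

# acting set of pg 5.5e3 as reported by ceph health detail
for osd in 35 50 91 18 139 59 124 40 104 12 71; do
    host=$(ceph osd find $osd -f json | \
        python -c 'import json,sys; print(json.load(sys.stdin)["crush_location"]["host"])')
    echo "== osd.$osd on $host =="
    # the shard directory suffix (s0..s10) differs per osd, hence the glob
    ssh "$host" "find /var/lib/ceph/osd/ceph-$osd/current/5.5e3s*_head/ \
        -name '*10014d3184b.00000000*'"
done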

on one osd in the list there was no such object (the 6th one, index 5,
which we assumed corresponds to the (5) in "shard 59(5)" from the
logfile). so we assumed this was the missing object that the error
reported. we have absolutely no clue why it was missing or what
happened; there is nothing in any logs.

what we did then was stop the osd that had the missing object, flush
the journal, start the osd again and run repair. (the guide mentions
deleting an object; we did not delete anything, because we assumed the
issue was the already missing object on the 6th osd.)
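
(for the record, the sequence was roughly the following, assuming systemd
osd units and the jewel flush-journal invocation; osd.59 is the 6th osd in
the acting set:)

systemctl stop ceph-osd@59
# flush the osd journal to the filestore (this is the step that
# segfaulted for us)
ceph-osd -i 59 --flush-journal
systemctl start ceph-osd@59
# ask the primary to repair the pg
ceph pg repair 5.5e3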

flushing the journal segfaulted, but the osd started fine again.

the scrub errors did not disappear, so we did the same again on the
primary (again without deleting anything; and again, the flush
segfaulted).

wrt the segfault, i attached the output of a segfaulting flush with
debug on another osd.


stijn


On 10/20/2017 02:56 AM, Gregory Farnum wrote:
> Okay, you're going to need to explain in very clear terms exactly what
> happened to your cluster, and *exactly* what operations you performed
> manually.
>
> The PG shards seem to have different views of the PG in question. The
> primary has a different log_tail, last_user_version, and last_epoch_clean
> from the others. Plus different log sizes? It's not making a ton of sense
> at first glance.
> -Greg
>
> On Thu, Oct 19, 2017 at 1:08 AM Stijn De Weirdt <stijn.deweirdt@xxxxxxxx>
> wrote:
>
>> hi greg,
>>
>> i attached the gzipped output of the query and some more info below. if you
>> need more, let me know.
>>
>> stijn
>>
>>> [root@mds01 ~]# ceph -s
>>>     cluster 92beef0a-1239-4000-bacf-4453ab630e47
>>>      health HEALTH_ERR
>>>             1 pgs inconsistent
>>>             40 requests are blocked > 512 sec
>>>             1 scrub errors
>>>             mds0: Behind on trimming (2793/30)
>>>      monmap e1: 3 mons at {mds01=
>> 1.2.3.4:6789/0,mds02=1.2.3.5:6789/0,mds03=1.2.3.6:6789/0}
>>>             election epoch 326, quorum 0,1,2 mds01,mds02,mds03
>>>       fsmap e238677: 1/1/1 up {0=mds02=up:active}, 2 up:standby
>>>      osdmap e79554: 156 osds: 156 up, 156 in
>>>             flags sortbitwise,require_jewel_osds
>>>       pgmap v51003893: 4096 pgs, 3 pools, 387 TB data, 243 Mobjects
>>>             545 TB used, 329 TB / 874 TB avail
>>>                 4091 active+clean
>>>                    4 active+clean+scrubbing+deep
>>>                    1 active+clean+inconsistent
>>>   client io 284 kB/s rd, 146 MB/s wr, 145 op/s rd, 177 op/s wr
>>>   cache io 115 MB/s flush, 153 MB/s evict, 14 op/s promote, 3 PG(s)
>> flushing
>>
>>> [root@mds01 ~]# ceph health detail
>>> HEALTH_ERR 1 pgs inconsistent; 52 requests are blocked > 512 sec; 5 osds
>> have slow requests; 1 scrub errors; mds0: Behind on trimming (2782/30)
>>> pg 5.5e3 is active+clean+inconsistent, acting
>> [35,50,91,18,139,59,124,40,104,12,71]
>>> 34 ops are blocked > 524.288 sec on osd.8
>>> 6 ops are blocked > 524.288 sec on osd.67
>>> 6 ops are blocked > 524.288 sec on osd.27
>>> 1 ops are blocked > 524.288 sec on osd.107
>>> 5 ops are blocked > 524.288 sec on osd.116
>>> 5 osds have slow requests
>>> 1 scrub errors
>>> mds0: Behind on trimming (2782/30)(max_segments: 30, num_segments: 2782)
>>
>>> # zgrep -C 1 ERR ceph-osd.35.log.*.gz
>>> ceph-osd.35.log.5.gz:2017-10-14 11:25:52.260668 7f34d6748700  0 --
>> 10.141.16.13:6801/1001792 >> 1.2.3.11:6803/1951 pipe(0x56412da80800
>> sd=273 :6801 s=2 pgs=3176 cs=31 l=0 c=0x564156e83b00).fault with nothing to
>> send, going to standby
>>> ceph-osd.35.log.5.gz:2017-10-14 11:26:06.071011 7f3511be4700 -1
>> log_channel(cluster) log [ERR] : 5.5e3s0 shard 59(5) missing
>> 5:c7ae919b:::10014d3184b.00000000:head
>>> ceph-osd.35.log.5.gz:2017-10-14 11:28:36.465684 7f34ffdf5700  0 --
>> 1.2.3.13:6801/1001792 >> 1.2.3.21:6829/1834 pipe(0x56414e2a2000 sd=37
>> :6801 s=0 pgs=0 cs=0 l=0 c=0x5641470d2a00).accept connect_seq 33 vs
>> existing 33 state standby
>>> ceph-osd.35.log.5.gz:--
>>> ceph-osd.35.log.5.gz:2017-10-14 11:43:35.570711 7f3508efd700  0 --
>> 1.2.3.13:6801/1001792 >> 1.2.3.20:6825/1806 pipe(0x56413be34000 sd=138
>> :6801 s=2 pgs=2763 cs=45 l=0 c=0x564132999480).fault with nothing to send,
>> going to standby
>>> ceph-osd.35.log.5.gz:2017-10-14 11:44:02.235548 7f3511be4700 -1
>> log_channel(cluster) log [ERR] : 5.5e3s0 deep-scrub 1 missing, 0
>> inconsistent objects
>>> ceph-osd.35.log.5.gz:2017-10-14 11:44:02.235554 7f3511be4700 -1
>> log_channel(cluster) log [ERR] : 5.5e3 deep-scrub 1 errors
>>> ceph-osd.35.log.5.gz:2017-10-14 11:59:02.331454 7f34d6d4e700  0 --
>> 1.2.3.13:6801/1001792 >> 1.2.3.11:6817/1941 pipe(0x56414d370800 sd=227
>> :42104 s=2 pgs=3238 cs=89 l=0 c=0x56413122d200).fault with nothing to send,
>> going to standby
>>
>>
>>
>> On 10/18/2017 10:19 PM, Gregory Farnum wrote:
>>> It would help if you can provide the exact output of "ceph -s", "pg
>> query",
>>> and any other relevant data. You shouldn't need to do manual repair of
>>> erasure-coded pools, since it has checksums and can tell which bits are
>>> bad. Following that article may not have done you any good (though I
>>> wouldn't expect it to hurt, either...)...
>>> -Greg
>>>
>>> On Wed, Oct 18, 2017 at 5:56 AM Stijn De Weirdt <stijn.deweirdt@xxxxxxxx>
>>> wrote:
>>>
>>>> hi all,
>>>>
>>>> we have a ceph 10.2.7 cluster with an 8+3 EC pool.
>>>> in that pool, there is a pg in inconsistent state.
>>>>
>>>> we followed http://ceph.com/geen-categorie/ceph-manually-repair-object/
>> ,
>>>> however, we are unable to solve our issue.
>>>>
>>>> from the primary osd logs, the reported pg had a missing object.
>>>>
>>>> we found a related object on the primary osd, and then looked for
>>>> similar ones on the other osds at the same path (i guess it just has
>>>> the index of the osd in the pg's list of osds suffixed)
>>>>
>>>> one osd did not have such a file (the 10 others did).
>>>>
>>>> so we did the "stop osd/flush/start osd/pg repair" on both the primary
>>>> osd and on the osd with the missing EC part.
>>>>
>>>> however, the scrub error still exists.
>>>>
>>>> does anyone have any hints what to do in this case?
>>>>
>>>> stijn
>>>>
>>>
>>
>


_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
