Re: Major ceph disaster

We got the object IDs of the missing objects with ceph pg 1.24c list_missing:

{
    "offset": {
        "oid": "",
        "key": "",
        "snapid": 0,
        "hash": 0,
        "max": 0,
        "pool": -9223372036854775808,
        "namespace": ""
    },
    "num_missing": 1,
    "num_unfound": 1,
    "objects": [
        {
            "oid": {
                "oid": "10004dfce92.0000003d",
                "key": "",
                "snapid": -2,
                "hash": 90219084,
                "max": 0,
                "pool": 1,
                "namespace": ""
            },
            "need": "46950'195355",
            "have": "0'0",
            "flags": "none",
            "locations": [
                "36(3)",
                "61(2)"
            ]
        }
    ],
    "more": false
}

We want to give up on those objects with:

ceph pg 1.24c mark_unfound_lost revert

But first we would like to know which file(s) are affected. Is there a way to map the object ID to the corresponding file?
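
Our current guess, assuming pool 1 is the CephFS data pool, is that the
part of the object name before the dot is the file's inode number in
hex, so something along these lines should point at the owning file
(please correct us if that mapping is wrong):

     # "10004dfce92" would be the hex inode of the file owning this object
     printf '%d\n' 0x10004dfce92
     # then search the CephFS mount (path here is just an example) for that inode
     find /mnt/cephfs -inum "$(printf '%d' 0x10004dfce92)"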

On 23.05.19 3:52 PM, Alexandre Marangone wrote:
The PGs will stay active+recovery_wait+degraded until you solve the unfound objects issue.
You can follow this doc to see which objects are unfound [1] and, if there is no other recourse, mark them lost.


On Thu, May 23, 2019 at 5:47 AM Kevin Flöh <kevin.floeh@xxxxxxx> wrote:
Thank you for this idea; it has improved the situation. Nevertheless,
there are still 2 PGs in recovery_wait. ceph -s gives me:

   cluster:
     id:     23e72372-0d44-4cad-b24f-3641b14b86f4
     health: HEALTH_WARN
             3/125481112 objects unfound (0.000%)
             Degraded data redundancy: 3/497011315 objects degraded
(0.000%), 2 pgs degraded

   services:
     mon: 3 daemons, quorum ceph-node03,ceph-node01,ceph-node02
     mgr: ceph-node01(active), standbys: ceph-node01.etp.kit.edu
     mds: cephfs-1/1/1 up  {0=ceph-node03.etp.kit.edu=up:active}, 3
up:standby
     osd: 96 osds: 96 up, 96 in

   data:
     pools:   2 pools, 4096 pgs
     objects: 125.48M objects, 259TiB
     usage:   370TiB used, 154TiB / 524TiB avail
     pgs:     3/497011315 objects degraded (0.000%)
              3/125481112 objects unfound (0.000%)
              4083 active+clean
              10   active+clean+scrubbing+deep
              2    active+recovery_wait+degraded
              1    active+clean+scrubbing

   io:
     client:   318KiB/s rd, 77.0KiB/s wr, 190op/s rd, 0op/s wr


and ceph health detail:

HEALTH_WARN 3/125481112 objects unfound (0.000%); Degraded data
redundancy: 3/497011315 objects degraded (0.000%), 2 pgs degraded
OBJECT_UNFOUND 3/125481112 objects unfound (0.000%)
     pg 1.24c has 1 unfound objects
     pg 1.779 has 2 unfound objects
PG_DEGRADED Degraded data redundancy: 3/497011315 objects degraded
(0.000%), 2 pgs degraded
     pg 1.24c is active+recovery_wait+degraded, acting [32,4,61,36], 1
unfound
     pg 1.779 is active+recovery_wait+degraded, acting [50,4,77,62], 2
unfound


Also, the status changed from HEALTH_ERR to HEALTH_WARN. We also did ceph
osd down for all OSDs of the degraded PGs. Do you have any further
suggestions on how to proceed?

On 23.05.19 11:08 AM, Dan van der Ster wrote:
> I think those osds (1, 11, 21, 32, ...) need a little kick to re-peer
> their degraded PGs.
>
> Open a window with `watch ceph -s`, then in another window slowly do
>
>      ceph osd down 1
>      # then wait a minute or so for that osd.1 to re-peer fully.
>      ceph osd down 11
>      ...
>
> Continue that for each of the osds with stuck requests, or until there
> are no more recovery_wait/degraded PGs.
>
> After each `ceph osd down...`, you should expect to see several PGs
> re-peer, and then ideally the slow requests will disappear and the
> degraded PGs will become active+clean.
> If anything else happens, you should stop and let us know.
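>
> (If you want to script that sequence, a rough sketch, using the
> implicated osd ids from your health output and a fixed pause as a
> crude stand-in for watching each re-peer finish, would be:
>
>      for osd in 1 11 21 32 43 50 65; do
>          ceph osd down $osd     # mark the osd down so its PGs re-peer when it comes back
>          sleep 120              # keep watching `ceph -s` in the meantime
>      done
>
> but stepping through it by hand as above is safer.)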
>
>
> -- dan
>
> On Thu, May 23, 2019 at 10:59 AM Kevin Flöh <kevin.floeh@xxxxxxx> wrote:
>> This is the current status of ceph:
>>
>>
>>     cluster:
>>       id:     23e72372-0d44-4cad-b24f-3641b14b86f4
>>       health: HEALTH_ERR
>>               9/125481144 objects unfound (0.000%)
>>               Degraded data redundancy: 9/497011417 objects degraded
>> (0.000%), 7 pgs degraded
>>               9 stuck requests are blocked > 4096 sec. Implicated osds
>> 1,11,21,32,43,50,65
>>
>>     services:
>>       mon: 3 daemons, quorum ceph-node03,ceph-node01,ceph-node02
>>       mgr: ceph-node01(active), standbys: ceph-node01.etp.kit.edu
>>       mds: cephfs-1/1/1 up  {0=ceph-node03.etp.kit.edu=up:active}, 3
>> up:standby
>>       osd: 96 osds: 96 up, 96 in
>>
>>     data:
>>       pools:   2 pools, 4096 pgs
>>       objects: 125.48M objects, 259TiB
>>       usage:   370TiB used, 154TiB / 524TiB avail
>>       pgs:     9/497011417 objects degraded (0.000%)
>>                9/125481144 objects unfound (0.000%)
>>                4078 active+clean
>>                11   active+clean+scrubbing+deep
>>                7    active+recovery_wait+degraded
>>
>>     io:
>>       client:   211KiB/s rd, 46.0KiB/s wr, 158op/s rd, 0op/s wr
>>
>> On 23.05.19 10:54 AM, Dan van der Ster wrote:
>>> What's the full ceph status?
>>> Normally recovery_wait just means that the relevant osd's are busy
>>> recovering/backfilling another PG.
>>>
>>> On Thu, May 23, 2019 at 10:53 AM Kevin Flöh <kevin.floeh@xxxxxxx> wrote:
>>>> Hi,
>>>>
>>>> we have set the PGs to recover and now they are stuck in active+recovery_wait+degraded; instructing them to deep-scrub does not change anything, so the rados report is empty. Is there a way to get past the recovery wait so the deep-scrub starts and we get the output? I guess the recovery_wait might be caused by missing objects. Do we need to delete them first to get the recovery going?
>>>>
>>>> Kevin
>>>>
>>>> On 22.05.19 6:03 PM, Robert LeBlanc wrote:
>>>>
>>>> On Wed, May 22, 2019 at 4:31 AM Kevin Flöh <kevin.floeh@xxxxxxx> wrote:
>>>>> Hi,
>>>>>
>>>>> Thank you, it worked. The PGs are not incomplete anymore. Still, we have
>>>>> another problem: there are 7 inconsistent PGs and a ceph pg repair is
>>>>> not doing anything. I just get "instructing pg 1.5dd on osd.24 to
>>>>> repair" and nothing happens. Does somebody know how we can get the PGs
>>>>> to repair?
>>>>>
>>>>> Regards,
>>>>>
>>>>> Kevin
>>>> Kevin,
>>>>
>>>> I just fixed an inconsistent PG yesterday. You will need to figure out why they are inconsistent. Do these steps and then we can figure out how to proceed.
>>>> 1. Do a deep-scrub on each PG that is inconsistent. (This may fix some of them)
>>>> 2. Print out the inconsistent report for each inconsistent PG. `rados list-inconsistent-obj <PG_NUM> --format=json-pretty`
>>>> 3. You will want to look at the error messages and see if all the shards have the same data.
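>>>>
>>>> For steps 1 and 2 that would look something like this (using pg 1.5dd from your earlier mail purely as an example):
>>>>
>>>>     ceph pg deep-scrub 1.5dd
>>>>     # after the deep-scrub completes:
>>>>     rados list-inconsistent-obj 1.5dd --format=json-pretty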
>>>>
>>>> Robert LeBlanc
>>>>
>>>>
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
