Re: Major ceph disaster

Alexandre Marangone <a.marangone@xxxxxxxxx> · Thu, 23 May 2019 06:52:08 -0700

The PGs will stay active+recovery_wait+degraded until you resolve the unfound objects. You can follow this doc [1] to see which objects are unfound and, if there is no other recourse, mark them lost.

[1] http://docs.ceph.com/docs/master/rados/troubleshooting/troubleshooting-pg/#unfound-objects
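
A rough sketch of that workflow, in case it helps: the snippet below only builds the commands one might run for each affected PG and executes nothing. The helper name `unfound_commands` is made up for illustration, and the PG IDs in the example are the ones from the `ceph health detail` output quoted in this thread. It assumes `ceph pg <pgid> list_unfound` is available (older releases call it `list_missing`). The `mark_unfound_lost` line is emitted as a comment on purpose, since that step gives up on the data.

```python
def unfound_commands(pgids, lost_mode="revert"):
    """Build (but do not run) ceph CLI commands for investigating unfound objects.

    lost_mode is the argument for mark_unfound_lost: "revert" or "delete".
    """
    if lost_mode not in ("revert", "delete"):
        raise ValueError("lost_mode must be 'revert' or 'delete'")
    commands = []
    for pgid in pgids:
        # List the objects this PG cannot find.
        commands.append(f"ceph pg {pgid} list_unfound")
        # Show peering details, including which OSDs might still hold the data.
        commands.append(f"ceph pg {pgid} query")
        # Last resort only, left commented out so it is never run by accident.
        commands.append(f"# ceph pg {pgid} mark_unfound_lost {lost_mode}")
    return commands


if __name__ == "__main__":
    for line in unfound_commands(["1.24c", "1.779"]):
        print(line)
```

Whether `revert` or `delete` is appropriate depends on the pool type and on what the objects are, so that choice is left as a parameter rather than decided here.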

On Thu, May 23, 2019 at 5:47 AM Kevin Flöh <kevin.floeh@xxxxxxx> wrote:
Thank you for this idea, it has improved the situation. Nevertheless,
there are still 2 PGs in recovery_wait. ceph -s gives me:

   cluster:
     id:     23e72372-0d44-4cad-b24f-3641b14b86f4
     health: HEALTH_WARN
             3/125481112 objects unfound (0.000%)
             Degraded data redundancy: 3/497011315 objects degraded (0.000%), 2 pgs degraded

   services:
     mon: 3 daemons, quorum ceph-node03,ceph-node01,ceph-node02
     mgr: ceph-node01(active), standbys: ceph-node01.etp.kit.edu
     mds: cephfs-1/1/1 up  {0=ceph-node03.etp.kit.edu=up:active}, 3 up:standby
     osd: 96 osds: 96 up, 96 in

   data:
     pools:   2 pools, 4096 pgs
     objects: 125.48M objects, 259TiB
     usage:   370TiB used, 154TiB / 524TiB avail
     pgs:     3/497011315 objects degraded (0.000%)
              3/125481112 objects unfound (0.000%)
              4083 active+clean
              10   active+clean+scrubbing+deep
              2    active+recovery_wait+degraded
              1    active+clean+scrubbing

   io:
     client:   318KiB/s rd, 77.0KiB/s wr, 190op/s rd, 0op/s wr

and ceph health detail:

HEALTH_WARN 3/125481112 objects unfound (0.000%); Degraded data redundancy: 3/497011315 objects degraded (0.000%), 2 pgs degraded
OBJECT_UNFOUND 3/125481112 objects unfound (0.000%)
     pg 1.24c has 1 unfound objects
     pg 1.779 has 2 unfound objects
PG_DEGRADED Degraded data redundancy: 3/497011315 objects degraded (0.000%), 2 pgs degraded
     pg 1.24c is active+recovery_wait+degraded, acting [32,4,61,36], 1 unfound
     pg 1.779 is active+recovery_wait+degraded, acting [50,4,77,62], 2 unfound

Also, the status changed from HEALTH_ERR to HEALTH_WARN. We also did ceph
osd down for all OSDs of the degraded PGs. Do you have any further
suggestions on how to proceed?
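
To pull the affected PGs and their acting OSDs out of the `ceph health detail` text without reading it by eye, something like the following might do. It is a minimal sketch: the regular expression is fitted to the two `pg ... is ..., acting [...]` lines shown above and is an assumption about the general line shape, and `parse_degraded_pgs` is a name invented here.

```python
import re

# Fitted to lines such as:
#   pg 1.24c is active+recovery_wait+degraded, acting [32,4,61,36], 1 unfound
PG_LINE = re.compile(
    r"pg\s+(?P<pgid>\d+\.[0-9a-f]+)\s+is\s+(?P<state>[^,\s]+),\s+"
    r"acting\s+\[(?P<acting>[\d,]+)\](?:,\s+(?P<unfound>\d+)\s+unfound)?"
)


def parse_degraded_pgs(health_detail):
    """Map each PG ID to its state, acting OSD list, and unfound count (0 if absent)."""
    pgs = {}
    for line in health_detail.splitlines():
        match = PG_LINE.search(line)
        if match is None:
            continue  # summary lines and "has N unfound objects" lines are skipped
        pgs[match.group("pgid")] = {
            "state": match.group("state"),
            "acting": [int(osd) for osd in match.group("acting").split(",")],
            "unfound": int(match.group("unfound") or 0),
        }
    return pgs


SAMPLE = """\
PG_DEGRADED Degraded data redundancy: 3/497011315 objects degraded (0.000%), 2 pgs degraded
     pg 1.24c is active+recovery_wait+degraded, acting [32,4,61,36], 1 unfound
     pg 1.779 is active+recovery_wait+degraded, acting [50,4,77,62], 2 unfound
"""

if __name__ == "__main__":
    for pgid, info in parse_degraded_pgs(SAMPLE).items():
        print(pgid, info["acting"], info["unfound"])
```

The parser expects each PG on one line, so output that a mail client has wrapped needs to be re-joined first.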

On 23.05.19 11:08 AM, Dan van der Ster wrote:
> I think those osds (1, 11, 21, 32, ...) need a little kick to re-peer
> their degraded PGs.
>
> Open a window with `watch ceph -s`, then in another window slowly do
>
>      ceph osd down 1
>      # then wait a minute or so for that osd.1 to re-peer fully.
>      ceph osd down 11
>      ...
>
> Continue that for each of the osds with stuck requests, or until there
> are no more recovery_wait/degraded PGs.
>
> After each `ceph osd down...`, you should expect to see several PGs
> re-peer, and then ideally the slow requests will disappear and the
> degraded PGs will become active+clean.
> If anything else happens, you should stop and let us know.
>
> -- dan
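
The one-at-a-time procedure above could be scripted roughly as follows. This is a sketch under assumptions, not a replacement for watching `ceph -s`: the function names are invented, the 60-second pause is simply "a minute or so" turned into a number, and the health check is a crude string match on `ceph health detail`. It cannot detect "anything else" going wrong, so it should still be run with someone watching.

```python
import subprocess
import time


def no_degraded_pgs():
    """Crude check: True when `ceph health detail` no longer mentions degraded PGs."""
    out = subprocess.run(
        ["ceph", "health", "detail"], capture_output=True, text=True, check=True
    ).stdout
    return "recovery_wait" not in out and "degraded" not in out


def kick_osds(osd_ids, is_healthy=no_degraded_pgs, run=subprocess.run,
              sleep=time.sleep, wait_seconds=60):
    """Mark OSDs down one at a time, pausing after each so it can re-peer.

    Stops as soon as is_healthy() returns True. Returns the OSD IDs that were kicked.
    """
    kicked = []
    for osd_id in osd_ids:
        if is_healthy():
            break  # nothing left to fix, leave the remaining OSDs alone
        run(["ceph", "osd", "down", str(osd_id)], check=True)
        kicked.append(osd_id)
        sleep(wait_seconds)  # give the OSD time to re-peer before the next one
    return kicked


if __name__ == "__main__":
    # OSD IDs taken from the "Implicated osds" line in the status output.
    print(kick_osds([1, 11, 21, 32, 43, 50, 65]))
```

The `run`, `sleep` and `is_healthy` parameters exist only so the loop can be dry-run with stand-ins before pointing it at a real cluster.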

>

> On Thu, May 23, 2019 at 10:59 AM Kevin Flöh <kevin.floeh@xxxxxxx> wrote:
>> This is the current status of ceph:
>>
>>     cluster:
>>       id:     23e72372-0d44-4cad-b24f-3641b14b86f4
>>       health: HEALTH_ERR
>>               9/125481144 objects unfound (0.000%)
>>               Degraded data redundancy: 9/497011417 objects degraded (0.000%), 7 pgs degraded
>>               9 stuck requests are blocked > 4096 sec. Implicated osds 1,11,21,32,43,50,65
>>
>>     services:
>>       mon: 3 daemons, quorum ceph-node03,ceph-node01,ceph-node02
>>       mgr: ceph-node01(active), standbys: ceph-node01.etp.kit.edu
>>       mds: cephfs-1/1/1 up  {0=ceph-node03.etp.kit.edu=up:active}, 3 up:standby
>>       osd: 96 osds: 96 up, 96 in
>>
>>     data:
>>       pools:   2 pools, 4096 pgs
>>       objects: 125.48M objects, 259TiB
>>       usage:   370TiB used, 154TiB / 524TiB avail
>>       pgs:     9/497011417 objects degraded (0.000%)
>>                9/125481144 objects unfound (0.000%)
>>                4078 active+clean
>>                11   active+clean+scrubbing+deep
>>                7    active+recovery_wait+degraded
>>
>>     io:
>>       client:   211KiB/s rd, 46.0KiB/s wr, 158op/s rd, 0op/s wr
>>

>> On 23.05.19 10:54 AM, Dan van der Ster wrote:
>>> What's the full ceph status?
>>> Normally recovery_wait just means that the relevant OSDs are busy
>>> recovering/backfilling another PG.
>>>
>>> On Thu, May 23, 2019 at 10:53 AM Kevin Flöh <kevin.floeh@xxxxxxx> wrote:
>>>> Hi,
>>>>
>>>> we have set the PGs to recover and now they are stuck in active+recovery_wait+degraded, and instructing them to deep-scrub does not change anything. Hence, the rados report is empty. Is there a way to stop the recovery wait so that the deep-scrub can start and produce the output? I guess the recovery_wait might be caused by missing objects. Do we need to delete them first to get the recovery going?
>>>>
>>>> Kevin
>>>>

>>>> On 22.05.19 6:03 PM, Robert LeBlanc wrote:
>>>>
>>>> On Wed, May 22, 2019 at 4:31 AM Kevin Flöh <kevin.floeh@xxxxxxx> wrote:
>>>>> Hi,
>>>>>
>>>>> thank you, it worked. The PGs are not incomplete anymore. Still we have
>>>>> another problem, there are 7 PGs inconsistent and a ceph pg repair is
>>>>> not doing anything. I just get "instructing pg 1.5dd on osd.24 to
>>>>> repair" and nothing happens. Does somebody know how we can get the PGs
>>>>> to repair?
>>>>>
>>>>> Regards,
>>>>>
>>>>> Kevin
>>>> Kevin,
>>>>
>>>> I just fixed an inconsistent PG yesterday. You will need to figure out why they are inconsistent. Do these steps and then we can figure out how to proceed.
>>>> 1. Do a deep-scrub on each PG that is inconsistent. (This may fix some of them.)
>>>> 2. Print out the inconsistent report for each inconsistent PG: `rados list-inconsistent-obj <PG_NUM> --format=json-pretty`
>>>> 3. You will want to look at the error messages and see if all the shards have the same data.
>>>>
>>>> Robert LeBlanc
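
For step 3, a small helper can condense the JSON from `rados list-inconsistent-obj` into one line per object. This is only a sketch: the field names used here (`inconsistents`, `object.name`, `errors`, `union_shard_errors`, `shards[].osd`, `shards[].errors`, `shards[].data_digest`) are an assumption about the report layout and should be checked against real output, and `summarize_inconsistent` is an invented name. Comparing digests across shards is mainly meaningful for replicated pools; on erasure-coded pools the shards hold different chunks, so differing digests alone would not indicate a problem.

```python
import json


def summarize_inconsistent(report_json):
    """Condense a list-inconsistent-obj report into one dict per inconsistent object."""
    report = json.loads(report_json)
    summary = []
    for item in report.get("inconsistents", []):
        shards = item.get("shards", [])
        # Distinct data digests seen across shards (missing digests are ignored).
        digests = {s["data_digest"] for s in shards if "data_digest" in s}
        summary.append({
            "object": item.get("object", {}).get("name", "<unknown>"),
            "errors": item.get("errors", []),
            "union_shard_errors": item.get("union_shard_errors", []),
            # OSDs whose shard carries at least one error of its own.
            "osds_with_errors": [s.get("osd") for s in shards if s.get("errors")],
            "distinct_data_digests": len(digests),
        })
    return summary


if __name__ == "__main__":
    import sys
    # Usage (assumed): rados list-inconsistent-obj <PG_NUM> --format=json | python this_script.py
    for entry in summarize_inconsistent(sys.stdin.read()):
        print(entry)
```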

>>>>

>>>>

>>>> _______________________________________________
>>>> ceph-users mailing list
>>>> ceph-users@xxxxxxxxxxxxxx
>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
