I think those osds (1, 11, 21, 32, ...) need a little kick to re-peer
their degraded PGs.

Open a window with `watch ceph -s`, then in another window slowly do

   ceph osd down 1
   # then wait a minute or so for that osd.1 to re-peer fully.
   ceph osd down 11
   ...

Continue that for each of the osds with stuck requests, or until there
are no more recovery_wait/degraded PGs.

After each `ceph osd down ...`, you should expect to see several PGs
re-peer, and then ideally the slow requests will disappear and the
degraded PGs will become active+clean. If anything else happens, you
should stop and let us know.

-- dan

On Thu, May 23, 2019 at 10:59 AM Kevin Flöh <kevin.floeh@xxxxxxx> wrote:
>
> This is the current status of ceph:
>
>   cluster:
>     id:     23e72372-0d44-4cad-b24f-3641b14b86f4
>     health: HEALTH_ERR
>             9/125481144 objects unfound (0.000%)
>             Degraded data redundancy: 9/497011417 objects degraded
>             (0.000%), 7 pgs degraded
>             9 stuck requests are blocked > 4096 sec. Implicated osds
>             1,11,21,32,43,50,65
>
>   services:
>     mon: 3 daemons, quorum ceph-node03,ceph-node01,ceph-node02
>     mgr: ceph-node01(active), standbys: ceph-node01.etp.kit.edu
>     mds: cephfs-1/1/1 up {0=ceph-node03.etp.kit.edu=up:active}, 3 up:standby
>     osd: 96 osds: 96 up, 96 in
>
>   data:
>     pools:   2 pools, 4096 pgs
>     objects: 125.48M objects, 259TiB
>     usage:   370TiB used, 154TiB / 524TiB avail
>     pgs:     9/497011417 objects degraded (0.000%)
>              9/125481144 objects unfound (0.000%)
>              4078 active+clean
>              11   active+clean+scrubbing+deep
>              7    active+recovery_wait+degraded
>
>   io:
>     client:   211KiB/s rd, 46.0KiB/s wr, 158op/s rd, 0op/s wr
>
> On 23.05.19 10:54 AM, Dan van der Ster wrote:
> > What's the full ceph status?
> > Normally recovery_wait just means that the relevant osds are busy
> > recovering/backfilling another PG.
> >
> > On Thu, May 23, 2019 at 10:53 AM Kevin Flöh <kevin.floeh@xxxxxxx> wrote:
> >> Hi,
> >>
> >> we have set the PGs to recover and now they are stuck in
> >> active+recovery_wait+degraded, and instructing them to deep-scrub does
> >> not change anything. Hence, the rados report is empty. Is there a way
> >> to stop the recovery wait so we can start the deep-scrub and get the
> >> output? I guess the recovery_wait might be caused by missing objects.
> >> Do we need to delete them first to get the recovery going?
> >>
> >> Kevin
> >>
> >> On 22.05.19 6:03 PM, Robert LeBlanc wrote:
> >>
> >> On Wed, May 22, 2019 at 4:31 AM Kevin Flöh <kevin.floeh@xxxxxxx> wrote:
> >>> Hi,
> >>>
> >>> thank you, it worked. The PGs are not incomplete anymore. Still we
> >>> have another problem: there are 7 PGs inconsistent and a ceph pg
> >>> repair is not doing anything. I just get "instructing pg 1.5dd on
> >>> osd.24 to repair" and nothing happens. Does somebody know how we can
> >>> get the PGs to repair?
> >>>
> >>> Regards,
> >>>
> >>> Kevin
> >>
> >> Kevin,
> >>
> >> I just fixed an inconsistent PG yesterday. You will need to figure out
> >> why they are inconsistent. Do these steps and then we can figure out
> >> how to proceed.
> >>
> >> 1. Do a deep-scrub on each PG that is inconsistent. (This may fix some
> >>    of them.)
> >> 2. Print out the inconsistent report for each inconsistent PG:
> >>    `rados list-inconsistent-obj <PG_NUM> --format=json-pretty`
> >> 3. You will want to look at the error messages and see if all the
> >>    shards have the same data.
> >>
> >> Robert LeBlanc

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
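
For reference, a minimal bash sketch of the per-osd kick Dan describes
above, plus the inconsistency-report commands from Robert's list. The
osd ids come from the "Implicated osds" line in the status output and
the 60-second pause is an assumption standing in for "wait a minute or
so"; Dan's actual advice is to run the commands by hand while watching
`ceph -s` in another window and to stop if anything unexpected happens.

   # Sketch only: mark each implicated osd down and let it re-peer
   # before moving on to the next one.
   for osd in 1 11 21 32 43 50 65; do
       ceph osd down "$osd"   # the osd rejoins immediately and re-peers its PGs
       sleep 60               # rough stand-in for "wait a minute or so"
   done

   # Steps 1 and 2 from Robert's list, for one inconsistent PG
   # (1.5dd is the example PG mentioned earlier in the thread):
   ceph pg deep-scrub 1.5dd
   rados list-inconsistent-obj 1.5dd --format=json-pretty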