Re: Major ceph disaster

We got the object IDs of the missing objects with ceph pg 1.24c list_missing:

{
    "offset": {
        "oid": "",
        "key": "",
        "snapid": 0,
        "hash": 0,
        "max": 0,
        "pool": -9223372036854775808,
        "namespace": ""
    },
    "num_missing": 1,
    "num_unfound": 1,
    "objects": [
        {
            "oid": {
                "oid": "10004dfce92.0000003d",
                "key": "",
                "snapid": -2,
                "hash": 90219084,
                "max": 0,
                "pool": 1,
                "namespace": ""
            },
            "need": "46950'195355",
            "have": "0'0",
            "flags": "none",
            "locations": [
                "36(3)",
                "61(2)"
            ]
        }
    ],
    "more": false
}

We want to give up on those objects with:

ceph pg 1.24c mark_unfound_lost revert

But first we would like to know which file(s) are affected. Is there a way to map the object ID to the corresponding file?
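
Our current guess, assuming pool 1 is the CephFS data pool, is that the
part of the object name before the dot is the file's inode number in
hex, so something along these lines should point at the owning file
(please correct us if that mapping is wrong):

     # "10004dfce92" would be the hex inode of the file owning this object
     printf '%d\n' 0x10004dfce92
     # then search the CephFS mount (path here is just an example) for that inode
     find /mnt/cephfs -inum "$(printf '%d' 0x10004dfce92)"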

On 23.05.19 3:52 PM, Alexandre Marangone wrote:
The PGs will stay active+recovery_wait+degraded until you solve the unfound objects issue.
You can follow this doc to see which objects are unfound [1] and, if there is no other recourse, mark them lost.


On Thu, May 23, 2019 at 5:47 AM Kevin Flöh <kevin.floeh@xxxxxxx> wrote:
Thank you for this idea; it has improved the situation. Nevertheless,
there are still 2 PGs in recovery_wait. ceph -s gives me:

   cluster:
     id:     23e72372-0d44-4cad-b24f-3641b14b86f4
     health: HEALTH_WARN
             3/125481112 objects unfound (0.000%)
             Degraded data redundancy: 3/497011315 objects degraded
(0.000%), 2 pgs degraded

   services:
     mon: 3 daemons, quorum ceph-node03,ceph-node01,ceph-node02
     mgr: ceph-node01(active), standbys: ceph-node01.etp.kit.edu
     mds: cephfs-1/1/1 up  {0=ceph-node03.etp.kit.edu=up:active}, 3
up:standby
     osd: 96 osds: 96 up, 96 in

   data:
     pools:   2 pools, 4096 pgs
     objects: 125.48M objects, 259TiB
     usage:   370TiB used, 154TiB / 524TiB avail
     pgs:     3/497011315 objects degraded (0.000%)
              3/125481112 objects unfound (0.000%)
              4083 active+clean
              10   active+clean+scrubbing+deep
              2    active+recovery_wait+degraded
              1    active+clean+scrubbing

   io:
     client:   318KiB/s rd, 77.0KiB/s wr, 190op/s rd, 0op/s wr


and ceph health detail:

HEALTH_WARN 3/125481112 objects unfound (0.000%); Degraded data
redundancy: 3/497011315 objects degraded (0.000%), 2 pgs degraded
OBJECT_UNFOUND 3/125481112 objects unfound (0.000%)
     pg 1.24c has 1 unfound objects
     pg 1.779 has 2 unfound objects
PG_DEGRADED Degraded data redundancy: 3/497011315 objects degraded
(0.000%), 2 pgs degraded
     pg 1.24c is active+recovery_wait+degraded, acting [32,4,61,36], 1
unfound
     pg 1.779 is active+recovery_wait+degraded, acting [50,4,77,62], 2
unfound


Also, the status changed from HEALTH_ERR to HEALTH_WARN. We also did ceph
osd down for all OSDs of the degraded PGs. Do you have any further
suggestions on how to proceed?

On 23.05.19 11:08 AM, Dan van der Ster wrote:
> I think those osds (1, 11, 21, 32, ...) need a little kick to re-peer
> their degraded PGs.
>
> Open a window with `watch ceph -s`, then in another window slowly do
>
>      ceph osd down 1
>      # then wait a minute or so for that osd.1 to re-peer fully.
>      ceph osd down 11
>      ...
>
> Continue that for each of the osds with stuck requests, or until there
> are no more recovery_wait/degraded PGs.
>
> After each `ceph osd down...`, you should expect to see several PGs
> re-peer, and then ideally the slow requests will disappear and the
> degraded PGs will become active+clean.
> If anything else happens, you should stop and let us know.
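>
> (If you want to script that sequence, a rough sketch, using the
> implicated osd ids from your health output and a fixed pause as a
> crude stand-in for watching each re-peer finish, would be:
>
>      for osd in 1 11 21 32 43 50 65; do
>          ceph osd down $osd     # mark the osd down so its PGs re-peer when it comes back
>          sleep 120              # keep watching `ceph -s` in the meantime
>      done
>
> but stepping through it by hand as above is safer.)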
>
>
> -- dan
>
> On Thu, May 23, 2019 at 10:59 AM Kevin Flöh <kevin.floeh@xxxxxxx> wrote:
>> This is the current status of ceph:
>>
>>
>>     cluster:
>>       id:     23e72372-0d44-4cad-b24f-3641b14b86f4
>>       health: HEALTH_ERR
>>               9/125481144 objects unfound (0.000%)
>>               Degraded data redundancy: 9/497011417 objects degraded
>> (0.000%), 7 pgs degraded
>>               9 stuck requests are blocked > 4096 sec. Implicated osds
>> 1,11,21,32,43,50,65
>>
>>     services:
>>       mon: 3 daemons, quorum ceph-node03,ceph-node01,ceph-node02
>>       mgr: ceph-node01(active), standbys: ceph-node01.etp.kit.edu
>>       mds: cephfs-1/1/1 up  {0=ceph-node03.etp.kit.edu=up:active}, 3
>> up:standby
>>       osd: 96 osds: 96 up, 96 in
>>
>>     data:
>>       pools:   2 pools, 4096 pgs
>>       objects: 125.48M objects, 259TiB
>>       usage:   370TiB used, 154TiB / 524TiB avail
>>       pgs:     9/497011417 objects degraded (0.000%)
>>                9/125481144 objects unfound (0.000%)
>>                4078 active+clean
>>                11   active+clean+scrubbing+deep
>>                7    active+recovery_wait+degraded
>>
>>     io:
>>       client:   211KiB/s rd, 46.0KiB/s wr, 158op/s rd, 0op/s wr
>>
>> On 23.05.19 10:54 AM, Dan van der Ster wrote:
>>> What's the full ceph status?
>>> Normally recovery_wait just means that the relevant osd's are busy
>>> recovering/backfilling another PG.
>>>
>>> On Thu, May 23, 2019 at 10:53 AM Kevin Flöh <kevin.floeh@xxxxxxx> wrote:
>>>> Hi,
>>>>
>>>> we have set the PGs to recover and now they are stuck in active+recovery_wait+degraded; instructing them to deep-scrub does not change anything, so the rados report is empty. Is there a way to get past the recovery wait so the deep-scrub starts and we get the output? I guess the recovery_wait might be caused by missing objects. Do we need to delete them first to get the recovery going?
>>>>
>>>> Kevin
>>>>
>>>> On 22.05.19 6:03 PM, Robert LeBlanc wrote:
>>>>
>>>> On Wed, May 22, 2019 at 4:31 AM Kevin Flöh <kevin.floeh@xxxxxxx> wrote:
>>>>> Hi,
>>>>>
>>>>> Thank you, it worked. The PGs are not incomplete anymore. Still, we have
>>>>> another problem: there are 7 inconsistent PGs and a ceph pg repair is
>>>>> not doing anything. I just get "instructing pg 1.5dd on osd.24 to
>>>>> repair" and nothing happens. Does somebody know how we can get the PGs
>>>>> to repair?
>>>>>
>>>>> Regards,
>>>>>
>>>>> Kevin
>>>> Kevin,
>>>>
>>>> I just fixed an inconsistent PG yesterday. You will need to figure out why they are inconsistent. Do these steps and then we can figure out how to proceed.
>>>> 1. Do a deep-scrub on each PG that is inconsistent. (This may fix some of them)
>>>> 2. Print out the inconsistent report for each inconsistent PG. `rados list-inconsistent-obj <PG_NUM> --format=json-pretty`
>>>> 3. You will want to look at the error messages and see if all the shards have the same data.
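>>>>
>>>> For steps 1 and 2 that would look something like this (using pg 1.5dd from your earlier mail purely as an example):
>>>>
>>>>     ceph pg deep-scrub 1.5dd
>>>>     # after the deep-scrub completes:
>>>>     rados list-inconsistent-obj 1.5dd --format=json-pretty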
>>>>
>>>> Robert LeBlanc
>>>>
>>>>
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
