Re: How to recover from active+clean+inconsistent+failed_repair?

I think this happens when a PG has 3 different copies and cannot decide which one is correct. You might have hit a very rare case. I would start with the scrub errors and check which PGs and which copies (OSDs) are affected. It sounds almost as if all 3 scrub errors are on the same PG.
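
For example (just a rough sketch, assuming osd.0 is the primary for the damaged PG and the default log location), something along these lines should show which objects and which copies are flagged:

  ceph pg deep-scrub 3.b
  # after the deep scrub has finished:
  rados list-inconsistent-obj 3.b --format=json-pretty
  # the primary OSD's log also records which shard failed the scrub:
  grep -E 'scrub.*(error|mismatch)' /var/log/ceph/ceph-osd.0.log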

You might have had a combination of a crash and an OSD failure; your situation is probably not covered by the usual "single point of failure" assumption.

In case you have a PG with scrub errors on 2 copies, you should be able to reconstruct the PG from the third copy with the PG export/import commands.
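
Roughly like this (only a sketch, not verified on your cluster; the data paths and OSD ids are assumptions, and the OSDs have to be stopped while ceph-objectstore-tool runs on them):

  # on the node holding the good copy (say osd.2), export the PG:
  systemctl stop ceph-osd@2
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-2 \
      --pgid 3.b --op export --file /tmp/pg3.b.export

  # copy the export file to a node with a bad copy (say osd.0),
  # remove the damaged copy there and import the good one:
  systemctl stop ceph-osd@0
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0 \
      --pgid 3.b --op remove --force
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0 \
      --pgid 3.b --op import --file /tmp/pg3.b.export

  # start the OSDs again and re-run the deep scrub:
  systemctl start ceph-osd@2
  systemctl start ceph-osd@0
  ceph pg deep-scrub 3.b

Double-check first which copy is actually the good one, and keep the export file (ideally also exports of the bad copies) until the PG scrubs clean again.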

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Sagara Wijetunga <sagarawmw@xxxxxxxxx>
Sent: 01 November 2020 13:16:08
To: ceph-users@xxxxxxx
Subject:  How to recover from active+clean+inconsistent+failed_repair?

Hi all

I have a Ceph cluster (Nautilus 14.2.11) with 3 Ceph nodes.
A crash happened and all 3 Ceph nodes went down.
One PG turned "active+clean+inconsistent" and I tried to repair it. After the repair, the PG in question now shows "active+clean+inconsistent+failed_repair" and I cannot bring the cluster back to "active+clean".
How do I rescue the cluster? Is this a false positive?
Here are the details:
All three Ceph nodes run ceph-mon, ceph-mgr, ceph-osd and ceph-mds.

1. ceph -s
   health: HEALTH_ERR
           3 scrub errors
           Possible data damage: 1 pg inconsistent
   pgs:    191 active+clean
           1   active+clean+inconsistent

2. ceph health detail
HEALTH_ERR 3 scrub errors; Possible data damage: 1 pg inconsistent
OSD_SCRUB_ERRORS 3 scrub errors
PG_DAMAGED Possible data damage: 1 pg inconsistent
    pg 3.b is active+clean+inconsistent, acting [0,1,2]

3. rados list-inconsistent-pg rbd
[]

4. ceph pg deep-scrub 3.b

5. ceph pg repair 3.b

6. ceph health detail
HEALTH_ERR 3 scrub errors; Possible data damage: 1 pg inconsistent
OSD_SCRUB_ERRORS 3 scrub errors
PG_DAMAGED Possible data damage: 1 pg inconsistent
    pg 3.b is active+clean+inconsistent+failed_repair, acting [0,1,2]

7. rados list-inconsistent-obj 3.b --format=json-pretty
{
    "epoch": 4769,
    "inconsistents": []
}

8. ceph pg 3.b list_unfound
{
    "num_missing": 0,
    "num_unfound": 0,
    "objects": [],
    "more": false
}
Appreciate your help.
Thanks
Sagara
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx