I think this happens when a PG has 3 different copies and cannot decide which one is correct. You might have hit a very rare case.

You should start with the scrub errors: check which PGs and which copies (OSDs) are affected. It sounds almost as if all 3 scrub errors are on the same PG. You might have had a combination of a crash and an OSD failure, so your situation is probably not covered by "single point of failure".

In case you have a PG with scrub errors on 2 copies, you should be able to reconstruct the PG from the third with the PG export/import commands. Rough sketches of both steps are appended after the quoted message below.

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Sagara Wijetunga <sagarawmw@xxxxxxxxx>
Sent: 01 November 2020 13:16:08
To: ceph-users@xxxxxxx
Subject: How to recover from active+clean+inconsistent+failed_repair?

Hi all

I have a Ceph cluster (Nautilus 14.2.11) with 3 Ceph nodes. A crash happened and all 3 Ceph nodes went down. One (1) PG turned "active+clean+inconsistent". I tried to repair it, but after the repair the PG in question shows "active+clean+inconsistent+failed_repair" and I cannot bring the cluster back to "active+clean".

How do I rescue the cluster? Is this a false positive?

Here are the details. All three Ceph nodes run ceph-mon, ceph-mgr, ceph-osd and ceph-mds.

1. ceph -s
     health: HEALTH_ERR
             3 scrub errors
             Possible data damage: 1 pg inconsistent
     pgs:    191 active+clean
             1   active+clean+inconsistent

2. ceph health detail
   HEALTH_ERR 3 scrub errors; Possible data damage: 1 pg inconsistent
   OSD_SCRUB_ERRORS 3 scrub errors
   PG_DAMAGED Possible data damage: 1 pg inconsistent
       pg 3.b is active+clean+inconsistent, acting [0,1,2]

3. rados list-inconsistent-pg rbd
   []

4. ceph pg deep-scrub 3.b

5. ceph pg repair 3.b

6. ceph health detail
   HEALTH_ERR 3 scrub errors; Possible data damage: 1 pg inconsistent
   OSD_SCRUB_ERRORS 3 scrub errors
   PG_DAMAGED Possible data damage: 1 pg inconsistent
       pg 3.b is active+clean+inconsistent+failed_repair, acting [0,1,2]

7. rados list-inconsistent-obj 3.b --format=json-pretty
   {
       "epoch": 4769,
       "inconsistents": []
   }

8. ceph pg 3.b list_unfound
   {
       "num_missing": 0,
       "num_unfound": 0,
       "objects": [],
       "more": false
   }

Appreciate your help.

Thanks
Sagara
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
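
Finding the affected copies: the scrub errors are usually also written to the log of the PG's primary OSD (osd.0 here, first in the acting set), and those lines normally name the shard/OSD on which each error was found. A minimal sketch, assuming default log locations:

   # on the node hosting osd.0, the primary of pg 3.b
   grep '3\.b' /var/log/ceph/ceph-osd.0.log | grep -i err

Note that "rados list-inconsistent-obj" only reports data from the most recent deep scrub, so an empty "inconsistents" list as in step 7 above may simply mean the scrub information is stale; re-run "ceph pg deep-scrub 3.b", wait for it to complete, and query again.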
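
The export/import itself is done offline with ceph-objectstore-tool. This is only a rough sketch, not a tested recipe: it assumes osd.2 holds the intact copy, OSD data paths under /var/lib/ceph/osd, and a scratch file /root/pg3.b.export; substitute the OSD IDs the scrub errors actually point at, and keep a copy of the export before removing anything.

   # 1. Export the healthy copy (the OSD must be stopped while the tool runs).
   systemctl stop ceph-osd@2
   ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-2 \
       --pgid 3.b --op export --file /root/pg3.b.export
   systemctl start ceph-osd@2

   # 2. On each OSD with a damaged copy (osd.0 shown), replace it.
   systemctl stop ceph-osd@0
   ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0 \
       --pgid 3.b --op remove --force
   ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0 \
       --pgid 3.b --op import --file /root/pg3.b.export
   systemctl start ceph-osd@0

   # 3. Verify.
   ceph pg deep-scrub 3.b

Repeat step 2 for osd.1 if that copy is also damaged; after the import the PG should recover, and a subsequent deep-scrub should come back clean.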