Hello ceph user-community,

We have a Reef (18.2.2) cluster with an erasure-coded pool of k=8, m=4 that has a PG currently down with the following:

```
[WRN] PG_AVAILABILITY: Reduced data availability: 1 pg inactive, 1 pg incomplete
    pg 2.699 is remapped+incomplete, acting [NONE,NONE,NONE,NONE,NONE,353,488,333,408,282,167,145] (reducing pool i1-sea-rbd-01-data min_size from 9 may help; search ceph.com/docs for 'incomplete')
```

Here is the timeline leading up to the current state noted above. We had an issue with a host with a flapping network, and during our efforts to restore the network, prolonged "OSD heartbeats not reachable" seems to have caused two OSDs to go down/out. The OSD containers were stuck in a restart loop and eventually were auto-out'd from the cluster. Restarting the affected OSD containers had no effect, and the PG was stuck in a `down+remapped` state:

```
[WRN] PG_AVAILABILITY: Reduced data availability: 1 pg inactive, 1 pg down
    pg 2.699 is down+remapped, acting [NONE,NONE,NONE,NONE,613,353,488,333,408,282,167,145]
```

Looking over the output of `ceph pg query 2.699`, we saw the following:

```
"blocked": "peering is blocked due to down osds",
"down_osds_we_would_probe": [
    38,
    120
],
"peering_blocked_by": [
    {
        "osd": 38,
        "current_lost_at": 90605,
        "comment": "starting or marking this osd lost may let us proceed"
    },
    {
        "osd": 120,
        "current_lost_at": 0,
        "comment": "starting or marking this osd lost may let us proceed"
    },
```

With that output and after reading the Ceph documentation[1], we decided to mark osd.38 as lost (`ceph osd lost 38`). Within a couple of minutes a third OSD, osd.613, went down and exhibited the same restart-loop behavior, leaving us with the following state from `ceph pg query 2.699`:

```
"blocked": "peering is blocked due to down osds",
"down_osds_we_would_probe": [
    38,
    120,
    613
],
"peering_blocked_by": [
    {
        "osd": 38,
        "current_lost_at": 90605,
        "comment": "starting or marking this osd lost may let us proceed"
    },
    {
        "osd": 120,
        "current_lost_at": 0,
        "comment": "starting or marking this osd lost may let us proceed"
    },
    {
        "osd": 613,
        "current_lost_at": 0,
        "comment": "starting or marking this osd lost may let us proceed"
    }
]
},
{
    "name": "Started",
    "enter_time": "2024-12-21T17:13:18.435769+0000"
}
],
"agent_state": {}
```

After sleeping on it and letting the cluster recover from another failed OSD, we decided to mark all three as lost (osd.38, osd.120, and osd.613). Unfortunately, this caused two additional OSDs (osd.226 and osd.339) to go down, and the PG state changed from `down+remapped` to `remapped+incomplete`. We now have 5 down/out OSDs and are wondering if there is any way to recover this PG. This one PG has caused our 1.6 PiB of data to be inaccessible.

We've found documentation[2] describing how to export the affected PG's data with ceph-objectstore-tool and import it into a new OSD. Is this our only route to possibly recover this PG and its data?

The strange thing is that the OSDs' underlying disks seem fine, in that the Ceph volume groups/LVM are intact. From what I can tell there are no issues with the disks and no indication of disk problems (the last three disks that were outed were part of a successful recovery/backfill with no issues). I tested the two originally outed OSDs, and the underlying disks all pass smartctl short tests, with no indication of hardware issues on these spinning disks.

Looking over the Ceph documentation, it looks like there is a newer, cephadm/BlueStore-friendly tool, ceph-bluestore-tool[3]. Does it do the same job as ceph-objectstore-tool?
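In case it is useful for discussion, here is a rough sketch of the ceph-objectstore-tool export/import procedure as we understand it from [2]. The fsid, target OSD id, shard suffix, and file paths below are placeholders for illustration only; we have not run any of this yet:

```
# Sketch only -- fsid, OSD ids, shard suffix (s0), and paths are placeholders.

# 1. Stop one of the down OSDs that should still hold a shard of pg 2.699
#    (cephadm-managed unit name shown).
systemctl stop ceph-<fsid>@osd.38.service

# 2. From a shell with the OSD's data path available (e.g. `cephadm shell --name osd.38`),
#    confirm the shard is present on that OSD.
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-38 --op list-pgs | grep 2.699

# 3. Export the EC shard (EC pgids carry a shard suffix, e.g. 2.699s0).
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-38 \
    --pgid 2.699s0 --op export --file /tmp/pg2.699s0.export

# 4. Import the shard into a healthy OSD (also stopped), then start it and
#    let peering/backfill take over.
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-<target> \
    --op import --file /tmp/pg2.699s0.export
systemctl start ceph-<fsid>@osd.<target>.service
```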
We are hoping there is a way to recover this PG and appreciate any advice. Currently our production cluster is down and we are looking for recovery avenues.

- Nick

Links:
[1] https://docs.ceph.com/en/reef/rados/troubleshooting/troubleshooting-pg/#placement-group-down-peering-failure
[2] https://www.croit.io/blog/how-to-recover-inactive-pgs-using-ceph-objectstore-tool-on-ceph-clusters
[3] https://docs.ceph.com/en/reef/man/8/ceph-bluestore-tool/

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx