The 'stale' means that there is *no* active copy of the PG in the
cluster.  If all of the good OSDs are up and the PG is still stale, that
probably means there are no more copies.

What happened to the OSDs you removed?  If the devices aren't completely
destroyed (e.g., the HDD won't spin up), then most likely you can use
ceph-objectstore-tool to extract a surviving copy of the PG from one of
them and import it back into the cluster (see the rough sketch at the
end of this message).

sage

On Thu, 4 Jul 2019, Wyllys Ingersoll wrote:
> I recently upgraded from Luminous to Mimic (13.2.6) and now I find
> that I have a single pg that is stale and cannot be repaired or
> cleaned up. I had to remove several OSDs due to some issues after the
> upgrade and now I have this one pg that is in a state that I cannot
> get rid of.
>
> Any hints on how to fix this?
>
> $ ceph pg dump_stuck
> PG_STAT  STATE                             UP       UP_PRIMARY  ACTING   ACTING_PRIMARY
> 54.163   stale+active+undersized+degraded  [94,63]  94          [94,63]  94
>
> $ ceph pg 54.163 query
> Error ENOENT: i don't have pgid 54.163
>
> Both osd.94 and osd.63 are OK, pool 54 (cephfs_metadata) is a 3-copy
> pool, so it looks like the 3rd copy of the pg is missing. The logs
> for osd.94 and osd.63 show this when I grep for the missing pg:
>
> 2019-07-02 14:14:51.632 7fb796f54700  1 osd.63 pg_epoch: 173782
> pg[54.163( v 173599'29032 (165339'27516,173599'29032]
> local-lis/les=173715/173716 n=1213 ec=56712/56712 lis/c 173715/173657
> les/c/f 173716/173658/0 173782/173782/173770) [94,63] r=1 lpr=173782
> pi=[173657,173782)/1 crt=173599'29032 lcod 0'0 unknown NOTIFY mbc={}]
> state<Start>: transitioning to Stray
>
> thanks,
> Wyllys Ingersoll
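A minimal sketch of that recovery path, assuming the removed OSD's data
directory can still be mounted.  The OSD ids 120 and 130, the data paths,
and the export file name below are illustrative placeholders, not values
taken from this thread; check ceph-objectstore-tool --help on your release
for the exact flags (FileStore OSDs also need --journal-path):

# On the host that still has the removed OSD's disk, with that OSD daemon
# stopped, export the surviving copy of the PG:
$ ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-120 \
      --pgid 54.163 --op export --file /tmp/54.163.export

# Copy the export file to a host with an in-cluster OSD chosen to receive
# the copy, stop that OSD, import the PG, and start it again so peering
# can pick the copy up:
$ ceph osd set noout
$ systemctl stop ceph-osd@130
$ ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-130 \
      --op import --file /tmp/54.163.export
$ systemctl start ceph-osd@130
$ ceph osd unset noout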