The 'stale' means that there is *no* active copy of the PG in the
cluster.  If all of the good OSDs are up and the PG is still stale, that
probably means there are no more copies.

What happened to the OSDs you removed?  If the devices aren't completely
destroyed (e.g., the HDD won't spin up), then most likely you can use
ceph-objectstore-tool to extract a surviving copy of the PG from one of
them and import it back into the cluster (see the rough sketch at the
end of this message).

sage

On Thu, 4 Jul 2019, Wyllys Ingersoll wrote:
> I recently upgraded from Luminous to Mimic (13.2.6) and now I find
> that I have a single pg that is stale and cannot be repaired or
> cleaned up. I had to remove several OSDs due to some issues after the
> upgrade and now I have this one pg that is in a state that I cannot
> get rid of.
>
> Any hints on how to fix this?
>
> $ ceph pg dump_stuck
> PG_STAT  STATE                             UP       UP_PRIMARY  ACTING   ACTING_PRIMARY
> 54.163   stale+active+undersized+degraded  [94,63]  94          [94,63]  94
>
> $ ceph pg 54.163 query
> Error ENOENT: i don't have pgid 54.163
>
> Both osd.94 and osd.63 are OK, pool 54 (cephfs_metadata) is a 3-copy
> pool, so it looks like the 3rd copy of the pg is missing. The logs
> for osd.94 and osd.63 show this when I grep for the missing pg:
>
> 2019-07-02 14:14:51.632 7fb796f54700  1 osd.63 pg_epoch: 173782
> pg[54.163( v 173599'29032 (165339'27516,173599'29032]
> local-lis/les=173715/173716 n=1213 ec=56712/56712 lis/c 173715/173657
> les/c/f 173716/173658/0 173782/173782/173770) [94,63] r=1 lpr=173782
> pi=[173657,173782)/1 crt=173599'29032 lcod 0'0 unknown NOTIFY mbc={}]
> state<Start>: transitioning to Stray
>
> thanks,
> Wyllys Ingersoll
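A minimal sketch of that recovery path, assuming the removed OSD's data
directory can still be mounted.  The OSD ids 120 and 130, the data paths,
and the export file name below are illustrative placeholders, not values
taken from this thread; check ceph-objectstore-tool --help on your release
for the exact flags (FileStore OSDs also need --journal-path):

# On the host that still has the removed OSD's disk, with that OSD daemon
# stopped, export the surviving copy of the PG:
$ ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-120 \
      --pgid 54.163 --op export --file /tmp/54.163.export

# Copy the export file to a host with an in-cluster OSD chosen to receive
# the copy, stop that OSD, import the PG, and start it again so peering
# can pick the copy up:
$ ceph osd set noout
$ systemctl stop ceph-osd@130
$ ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-130 \
      --op import --file /tmp/54.163.export
$ systemctl start ceph-osd@130
$ ceph osd unset noout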