With ceph-disk, this is why I change the type codes on the partitions to
the 2B slug: I have had to resurrect OSDs more than once. I am not sure
whether this works with ceph-volume. Thoughts?

>> The devices were wiped after removing them, so the data is definitely lost.
>
> That sucks. I cannot emphasize enough that you should *never* wipe a
> device until a cluster is completely active+clean.
>
>> I recreated the missing PG, now my MDS is complaining about damaged
>> metadata (since the PG was part of the cephfs_metadata pool).
>
> You probably need to go through the cephfs repair procedure to
> rebuild/repair the hierarchy, as you've lost a random subset of the
> directories in the file system. :(
>
> Best of luck!
>
> sage
>
>
>>> On Thu, Jul 4, 2019 at 11:39 AM Sage Weil <sage@xxxxxxxxxxxx> wrote:
>>>
>>> The 'stale' means that there is *no* active copy of the PG in the cluster.
>>> If all of the good OSDs are up and the PG is still stale, that probably
>>> means there are no more copies.
>>>
>>> What happened to the OSDs you removed?  If the devices aren't completely
>>> destroyed (e.g., HDD won't spin up) then most likely you can use
>>> ceph-objectstore-tool to extract a surviving copy of the PG from one of
>>> them.
>>>
>>> sage
>>>
>>>
>>>> On Thu, 4 Jul 2019, Wyllys Ingersoll wrote:
>>>>
>>>> I recently upgraded from Luminous to Mimic (13.2.6) and now I find
>>>> that I have a single PG that is stale and cannot be repaired or
>>>> cleaned up.  I had to remove several OSDs due to some issues after the
>>>> upgrade, and now I have this one PG in a state that I cannot
>>>> get rid of.
>>>>
>>>> Any hints on how to fix this?
>>>>
>>>> $ ceph pg dump_stuck
>>>> PG_STAT  STATE                             UP       UP_PRIMARY  ACTING   ACTING_PRIMARY
>>>> 54.163   stale+active+undersized+degraded  [94,63]  94          [94,63]  94
>>>>
>>>> $ ceph pg 54.163 query
>>>> Error ENOENT: i don't have pgid 54.163
>>>>
>>>> Both osd.94 and osd.63 are OK, and pool 54 (cephfs_metadata) is a
>>>> 3-copy pool, so it looks like the 3rd copy of the PG is missing.  The
>>>> logs for osd.94 and osd.63 show this when I grep for the missing PG:
>>>>
>>>> 2019-07-02 14:14:51.632 7fb796f54700  1 osd.63 pg_epoch: 173782
>>>> pg[54.163( v 173599'29032 (165339'27516,173599'29032]
>>>> local-lis/les=173715/173716 n=1213 ec=56712/56712 lis/c 173715/173657
>>>> les/c/f 173716/173658/0 173782/173782/173770) [94,63] r=1 lpr=173782
>>>> pi=[173657,173782)/1 crt=173599'29032 lcod 0'0 unknown NOTIFY mbc={}]
>>>> state<Start>: transitioning to Stray
>>>>
>>>> thanks,
>>>> Wyllys Ingersoll
>>>> _______________________________________________
>>>> Dev mailing list -- dev@xxxxxxx
>>>> To unsubscribe send an email to dev-leave@xxxxxxx
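
As a footnote to Sage's suggestion: when the removed devices are still intact,
the ceph-objectstore-tool extraction he describes looks roughly like the sketch
below. The PG id and OSD ids are from the thread; the data paths assume the
default OSD mount locations, and both OSD daemons must be stopped while the
tool runs.

```shell
# Sketch: recover PG 54.163 from a removed-but-still-readable OSD disk
# (e.g. the old osd.94 device mounted back at /var/lib/ceph/osd/ceph-94).
# Stop the relevant OSD daemons before running ceph-objectstore-tool.

# Export the surviving copy of the PG from the old OSD's data store:
ceph-objectstore-tool \
    --data-path /var/lib/ceph/osd/ceph-94 \
    --op export --pgid 54.163 \
    --file /tmp/pg54.163.export

# Import it into a healthy, stopped OSD, then restart that OSD so the
# cluster can peer and recover the PG:
ceph-objectstore-tool \
    --data-path /var/lib/ceph/osd/ceph-63 \
    --op import \
    --file /tmp/pg54.163.export
```

This only works while the old device still has a readable copy, which is
exactly why the type-code trick at the top of the thread (keeping wiped-looking
partitions recognizable) can save a cluster.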
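
For the damaged-metadata side, the "cephfs repair procedure" Sage points at is
the CephFS disaster-recovery workflow. A rough outline is below; the filesystem
name ("cephfs"), the data pool name ("cephfs_data"), and the backup path are
assumptions, not values from the thread, and the full procedure in the CephFS
disaster-recovery docs should be followed rather than this sketch.

```shell
# Take the filesystem offline and back up the MDS journal before touching it:
ceph fs fail cephfs
cephfs-journal-tool --rank=cephfs:0 journal export /root/mds-journal.bin

# Recover what can be recovered from the journal, then reset it along
# with the session table:
cephfs-journal-tool --rank=cephfs:0 event recover_dentries summary
cephfs-journal-tool --rank=cephfs:0 journal reset
cephfs-table-tool all reset session

# Rebuild metadata by scanning the data pool (the scan_extents and
# scan_inodes passes can be parallelized across workers; shown serially):
cephfs-data-scan init
cephfs-data-scan scan_extents cephfs_data
cephfs-data-scan scan_inodes cephfs_data
cephfs-data-scan scan_links
```

Since the lost PG was in cephfs_metadata, the scan passes are what re-derive
the missing directory metadata from the surviving data-pool objects.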