Re: clean up stale pg?

On Thu, 4 Jul 2019, Wyllys Ingersoll wrote:
> The devices were wiped after removing them, so the data is definitely lost.

That sucks.  I cannot emphasize enough that you should *never* wipe a 
device until a cluster is completely active+clean.
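
For next time, gate any wipe on the cluster being healthy again.  A rough
sketch (Luminous or later; NN is a placeholder OSD id):

   $ ceph pg stat     # wait until every PG reports active+clean
   $ while ! ceph osd safe-to-destroy osd.NN ; do sleep 60 ; done
   $ ceph osd destroy osd.NN --yes-i-really-mean-it

'safe-to-destroy' only succeeds once the data on that OSD is fully
replicated elsewhere, so the wipe can't race ahead of recovery.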
 
> I recreated the missing PG, and now my MDS is complaining about damaged
> metadata (since the PG was part of the cephfs_metadata pool).

You probably need to go through the CephFS disaster-recovery procedure to
rebuild/repair the metadata hierarchy, as you've lost a random subset of
the directories in the file system.  :(
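
FWIW, the rough shape of that recovery looks like the sketch below.  I'm
hedging here: take the exact commands from the disaster-recovery page of
the docs for your release, and export a backup of the journal before
resetting anything.

   $ cephfs-journal-tool journal export backup.bin
   $ cephfs-journal-tool event recover_dentries summary
   $ cephfs-journal-tool journal reset
   $ cephfs-table-tool all reset session
   $ cephfs-data-scan scan_extents <data pool>
   $ cephfs-data-scan scan_inodes <data pool>
   $ cephfs-data-scan scan_links

('<data pool>' is your CephFS data pool name.)  'ceph daemon mds.<id>
damage ls' will list what the MDS has flagged as damaged, and a recursive
scrub ('ceph daemon mds.<id> scrub_path / recursive repair') can
re-verify the tree once the scans finish.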

Best of luck!

sage


> 
> 
> 
> On Thu, Jul 4, 2019 at 11:39 AM Sage Weil <sage@xxxxxxxxxxxx> wrote:
> >
> > 'stale' means that there is *no* active copy of the PG in the cluster.
> > If all of the good OSDs are up and the PG is still stale, that probably
> > means there are no more copies.
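> >
> > (A quick way to check: 'ceph pg map 54.163' prints the up and acting
> > sets straight from the osdmap, so it works even when 'ceph pg query'
> > returns ENOENT like it does below.)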
> >
> > What happened to the OSDs you removed?  If the devices aren't completely
> > destroyed (e.g., HDD won't spin up) then most likely you can use
> > ceph-objectstore-tool to extract a surviving copy of the PG from one of
> > them.
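> >
> > If one of those disks does still spin up, the rough shape of it
> > (untested sketch; NN and MM are placeholder OSD ids, and the OSDs
> > should be stopped while the tool runs):
> >
> >   $ ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-NN \
> >         --pgid 54.163 --op export --file /tmp/pg54.163.export
> >   $ ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-MM \
> >         --pgid 54.163 --op import --file /tmp/pg54.163.export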
> >
> > sage
> >
> >
> > On Thu, 4 Jul 2019, Wyllys Ingersoll wrote:
> >
> > > I recently upgraded from Luminous to Mimic (13.2.6), and now I have a
> > > single PG that is stale and cannot be repaired or cleaned up.  I had
> > > to remove several OSDs due to some issues after the upgrade, and this
> > > one PG has been stuck ever since.
> > >
> > > Any hints on how to fix this?
> > >
> > > $ ceph pg dump_stuck
> > > PG_STAT STATE                            UP      UP_PRIMARY ACTING  ACTING_PRIMARY
> > > 54.163  stale+active+undersized+degraded [94,63] 94         [94,63] 94
> > >
> > > $ ceph pg 54.163 query
> > > Error ENOENT: i don't have pgid 54.163
> > >
> > > Both osd.94 and osd.63 are OK, and pool 54 (cephfs_metadata) is a
> > > 3-copy pool, so it looks like the third copy of the PG is missing.
> > > The logs for osd.94 and osd.63 show this when I grep for the missing
> > > PG:
> > >
> > > 2019-07-02 14:14:51.632 7fb796f54700  1 osd.63 pg_epoch: 173782
> > > pg[54.163( v 173599'29032 (165339'27516,173599'29032]
> > > local-lis/les=173715/173716 n=1213 ec=56712/56712 lis/c 173715/173657
> > > les/c/f 173716/173658/0 173782/173782/173770) [94,63] r=1 lpr=173782
> > > pi=[173657,173782)/1 crt=173599'29032 lcod 0'0 unknown NOTIFY mbc={}]
> > > state<Start>: transitioning to Stray
> > >
> > > thanks,
> > >   Wyllys Ingersoll
_______________________________________________
Dev mailing list -- dev@xxxxxxx
To unsubscribe send an email to dev-leave@xxxxxxx


