Re: is unknown pg going to be active after osds are fixed?

Jeremy Austin <jhaustin@xxxxxxxxx> · Tue, 2 Feb 2021 07:57:56 -0900

I'm in a similar but not identical situation.

I was in the middle of a rebalance on a small test cluster, with about
1% of pgs degraded, and shut the cluster entirely down for maintenance. On
startup, many pgs are entirely unknown, and most stale. In fact most pgs
can't be queried! No mon failures. No obvious signs of OSD failure (and the
problem is too widespread for that.) Is there a specific way to force OSDs
to rescan and re-advertise their pgs? Is there a specific startup order
that fixes this, i.e., start all OSDs first and then start mons?

I'm baffled,
Jeremy

On Mon, Feb 1, 2021 at 10:43 PM Wido den Hollander <wido@xxxxxxxx> wrote:

>
>
> On 01/02/2021 22:48, Tony Liu wrote:
> > Hi,
> >
> > With 3 replicas, a pg has 3 osds. If all those 3 osds are down,
> > the pg becomes unknown. Is that right?
> >
>
> Yes. As no OSD can report the status to the MONs.
>
> > If those 3 osds are replaced and in and on, is that pg going to
> > be eventually back to active? Or anything else has to be done
> > to fix it?
> >
>
> If you can bring back the OSDs without wiping them: Yes
>
> As you mention the word 'replaced' I was wondering what you mean by
> that. If you replace the disks without data recovery the PGs will be lost.
>
> So you need to bring back the OSDs with their data intact for the PG to
> come back online.
>
> Wido
>
> >
> > Thanks!
> > Tony
> > _______________________________________________
> > ceph-users mailing list -- ceph-users@xxxxxxx
> > To unsubscribe send an email to ceph-users-leave@xxxxxxx
> >
>
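
To make the point in the quoted exchange concrete (a pg goes unknown when
none of its OSDs is left to report to the mons), here is a minimal sketch
in Python. It does not talk to a cluster: the pg-to-OSD mapping, the set of
up OSDs, and the function name pgs_without_reporter are all hypothetical,
made up purely for illustration.

```python
from typing import Dict, List, Set


def pgs_without_reporter(acting: Dict[str, List[int]], up_osds: Set[int]) -> List[str]:
    """Return the ids of PGs whose OSDs are all down.

    Such PGs have no OSD left that could report their state to the mons.
    """
    return sorted(
        pgid
        for pgid, osds in acting.items()
        if not any(osd in up_osds for osd in osds)
    )


# Hypothetical 3-replica mapping: PG id -> ids of the OSDs holding it.
acting = {
    "1.0": [0, 1, 2],
    "1.1": [1, 2, 3],
    "1.2": [3, 4, 5],
}

# Assumed cluster state: OSDs 0, 1 and 2 are down; 3, 4 and 5 are up.
up_osds = {3, 4, 5}

print(pgs_without_reporter(acting, up_osds))  # -> ['1.0']
```

Only pg 1.0 has lost every OSD, so it is the only one that would have
nobody reporting for it. Once any of OSDs 0, 1 or 2 comes back with its
data intact, it drops off the list again.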

-- 
Jeremy Austin
jhaustin@xxxxxxxxx