I'll power the cluster up today or tomorrow and take a look again, Dan,
but the initial problem is that many of the pgs can't be queried — the
requests time out. I don't know whether it's only the stale pgs or only
the unknown pgs that can't be queried, but I'll investigate whether
something is wrong with the mgr. I typically have plenty of running
mgrs. Thanks for the advice on ignore_history; I'll avoid it for now.

On Fri, Feb 5, 2021 at 6:52 AM Dan van der Ster <dan@xxxxxxxxxxxxxx> wrote:

> Eeek! Don't run `osd_find_best_info_ignore_history_les = true` -- that
> can lead to data loss, even in cases you don't expect.
>
> Are you sure all OSDs are up?
>
> Query a PG to find out why it is unknown: `ceph pg <id> query`. Feel
> free to share that output.
>
> In fact, the 'unknown' state means the MGR doesn't know the state of
> the PG -- is your MGR running correctly now?
>
> -- Dan
>
> On Fri, Feb 5, 2021 at 4:49 PM Jeremy Austin <jhaustin@xxxxxxxxx> wrote:
> >
> > I was in the middle of a rebalance on a small test cluster, with about
> > 1% of pgs degraded, and shut the cluster entirely down for maintenance.
> >
> > On startup, many pgs are entirely unknown, and most are stale. In fact
> > most pgs can't be queried! No mon failures. Would the osd logs tell me
> > why pgs aren't even moving to an inactive state?
> >
> > I'm not concerned about data loss due to the shutdown (all activity to
> > the cluster had been stopped), so should I be setting
> > `osd_find_best_info_ignore_history_les = true` on some or all OSDs?
> >
> > Thank you,
> >
> > --
> > Jeremy Austin
> > jhaustin@xxxxxxxxx

--
Jeremy Austin
jhaustin@xxxxxxxxx
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx