On Fri, Oct 29, 2021 at 2:23 AM Janne Johansson <icepic.dz@xxxxxxxxx> wrote:
> Den tors 28 okt. 2021 kl 22:25 skrev Dave Hall <kdhall@xxxxxxxxxxxxxx>:
> > Hello,
> >
> > I have a Nautilus 14.2.21 cluster with 48 x 12TB OSDs across 6 nodes,
> > with 3 new nodes and 24 more OSDs ready to come online. The bulk of my
> > pools are EC 8+2 with a failure domain of OSD.
> >
> > Until yesterday, one of the original 48 OSDs had been failed and
> > destroyed for a few months. I was finally able to replace this OSD
> > yesterday.
> >
> > As the backfilling started I noticed that the other 47 OSDs had gotten
> > fairly out of balance somehow. They range from 53 PGs and 46% full to
> > 87 PGs and 85% full. I thought the Balancer would be taking care of
> > this, but perhaps there is a problem with my settings.
> >
> > As of this morning I had 1 nearfull OSD; as of now I have two. Due to
> > the rebuilt OSD I still have 70 PGs to get remapped. On the other hand,
> > the rebuilt OSD has been assigned 73 PGs, but so far it's only up to 7%
> > full.
> >
> > From what I've been able to find, it looks like the Balancer won't run
> > until after the backfilling is complete. When I get there I'd like to
> > have the right Balancer settings in place to improve the balance before
> > I start introducing the new OSDs.
> >
> > Any advice or insight would be greatly appreciated. In particular, I
> > noticed that my Balancer mode was 'upmap'. Since all of my OSDs are the
> > same and my CRUSH map is flat and uniform, recommendations against
> > 'crush-compat' mode don't seem to apply.
>
> You should probably have used a tool like
>
> https://github.com/HeinleinSupport/cern-ceph-scripts/blob/master/tools/upmap/upmap-remapped.py
> or
> https://github.com/digitalocean/pgremapper
>
> The idea behind those is that you set norebalance/nobackfill, then make
> your changes (like adding tons of new OSDs or major weight changes), then
> run one of these tools so that it tells the cluster via upmap that "the
> current situation is meant to be like this", by placing a specific
> exception on each misplaced/remapped PG. This makes the cluster HEALTH_OK
> again.
>
> After this, you unset nobackfill and norebalance, but no movement starts,
> since all PGs are where they are "supposed" to be, given the current
> upmap exceptions.
>
> Then, as the balancer (in upmap mode) runs, it will figure out that those
> PGs are actually placed on the wrong OSDs, but it will handle only some 8
> at a time, simply by removing their upmap exceptions. The cluster then
> moves a few PGs, and when those are done, the cluster is healthy again,
> scrubs can start, and so on. The balancer then finds 8 new PGs to
> un-misplace, and it continues like this until all PGs are in their final
> positions.
>
> Much less strain on the cluster, no long periods without scrubs, and
> fewer operator interventions.

Janne,

I will have a look at the tools you linked. I had intended to anyway, but
your explanation helps me understand the underlying objectives much better.

However, I'm still concerned about the Balancer. How can I tell whether it
was running over the past couple of months? If it was, why did my OSDs
load up so unevenly? I thought that's what the Balancer was supposed to
prevent, so either it hasn't been running or I've configured it
incorrectly.

> --
> May the most significant bit of your life be positive.

Thanks.

-Dave

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
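
[Editor's note] For anyone following along, below is a minimal sketch of the
command sequence Janne describes, using the standard Ceph CLI. The flags and
the pg-upmap mechanism are standard; the exact upmap-remapped.py invocation is
an assumption based on how the CERN script is commonly used (it prints
"ceph osd pg-upmap-items ..." commands to stdout), so check the script's own
README before piping anything to a shell.

  # Pause data movement before making topology changes.
  ceph osd set norebalance
  ceph osd set nobackfill

  # ... add the new OSDs / change CRUSH weights here ...

  # Assumed usage: the script emits pg-upmap-items commands that pin every
  # currently remapped PG to wherever its data sits right now. Review the
  # output before applying it.
  ./upmap-remapped.py | sh

  # With the exceptions in place the cluster should report HEALTH_OK.
  # Allow movement again and let the balancer remove the exceptions a few
  # PGs at a time.
  ceph osd unset nobackfill
  ceph osd unset norebalance
  ceph balancer mode upmap
  ceph balancer on

Note that pg-upmap requires clients at Luminous or newer
(ceph osd set-require-min-compat-client luminous); a cluster already running
the balancer in upmap mode, as Dave's is, will have this set.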
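
[Editor's note] On Dave's question about whether the balancer has actually
been running: the mgr balancer module reports its own state, so a quick check
(standard commands on a Nautilus-era CLI, nothing assumed beyond that) would
be:

  ceph balancer status   # shows "active": true/false, the mode, and the last optimize result
  ceph balancer eval     # score of the current PG/utilization distribution (lower is better)
  ceph osd df            # per-OSD PG count and %USE, to watch the spread over time

If status shows "active": false, the balancer simply was not running. If it
shows "active": true but the spread still grew, the optimize result message
usually explains why it declined to act, for example because too many objects
were misplaced during the backfill or the distribution was already within the
configured deviation.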