Re: 2 OSDs Near Full, Others Under 50%

On Thu, 28 Oct 2021 at 22:25, Dave Hall <kdhall@xxxxxxxxxxxxxx> wrote:
> Hello,
> I have a Nautilus 14.2.21 cluster with 48 x 12TB OSDs across 6 nodes, with
> 3 new nodes and 24 more OSDs ready to come online.  The bulk of my pools
> are EC 8+2 with a failure domain of OSD.
> Until yesterday one of the original 48 OSDs had failed and been destroyed
> for a few months.  I was finally able to replace this OSD yesterday.
>
> As the backfilling started I noticed that the other 47 OSDs had gotten
> fairly out of balance somehow.  They range from 53 PGs and 46% full to 87
> PGs and 85% full.  I thought the Balancer would be taking care of this, but
> perhaps there is a problem with my settings.
>
> As of this morning I had 1 nearfull OSD.  As of now I have two.  Due to the
> rebuilt OSD I still have 70 PGs to get remapped.  On the other hand, the
> rebuilt OSD has been assigned 73 PGs, but so far it's only up to 7% full.
>
> From what I've been able to find, it looks like the Balancer won't run
> until after the backfilling is complete.   When I get there I'd like to
> have the right Balancer setting in place to improve the balance before I
> start introducing the new OSDs.
>
> Any advice or insight would be greatly appreciated.  In particular, I
> noticed that my Balancer mode was 'upmap'.  Since all of my OSDs are the
> same and my crush-map is flat and uniform, recommendations against
> 'crush-compat' mode don't seem to apply.

You should probably have used a tool like
https://github.com/HeinleinSupport/cern-ceph-scripts/blob/master/tools/upmap/upmap-remapped.py
or
https://github.com/digitalocean/pgremapper

The idea behind these tools is that you set norebalance/nobackfill, then make
your changes (such as adding a lot of new OSDs or doing major weight changes),
and then run one of the tools. It tells the cluster via upmap that "the
current situation is meant to be like this", by placing a specific exception
on each misplaced/remapped PG. This makes the cluster HEALTH_OK again.
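
On the CLI, the flow looks roughly like this (a sketch only; upmap-remapped.py
prints ceph commands to stdout, so review its output and the script's README
before piping anything into a shell):

  # pause data movement while making the changes
  ceph osd set norebalance
  ceph osd set nobackfill

  # ... add the new OSDs / change weights here ...

  # pin every misplaced PG to where its data currently sits, via upmap
  # exceptions (check the generated commands before running them)
  ./upmap-remapped.py | sh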

After this, you unset nobackfill and norebalance, but no movement starts,
since all PGs are where they are "supposed" to be given the current upmap
exceptions.

Then, as the balancer (in upmap mode) runs, it will figure out that those PGs
are actually placed on the wrong OSDs, and will fix them roughly 8 at a time
simply by removing their upmap exceptions. The cluster then moves those few
PGs; when they are done, the cluster is healthy again and scrubs can start,
and the balancer picks the next handful of PGs to un-misplace. It goes on
like this until all PGs are in their final positions.
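
If the balancer isn't already running in upmap mode, enabling it looks
roughly like this (the knob that limits how much is misplaced at once is, if
I remember correctly, target_max_misplaced_ratio, default 5%):

  ceph balancer mode upmap
  ceph balancer on
  ceph balancer status

  # optionally throttle how much the balancer misplaces at a time
  # ceph config set mgr target_max_misplaced_ratio 0.01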

Much less strain on the cluster, no long periods without scrubs, and fewer
operator interventions.

-- 
May the most significant bit of your life be positive.
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


