On Fri, Oct 29, 2021 at 2:23 AM Janne Johansson <icepic.dz@xxxxxxxxx> wrote:
> Den tors 28 okt. 2021 kl 22:25 skrev Dave Hall <kdhall@xxxxxxxxxxxxxx>:
> > Hello,
> >
> > I have a Nautilus 14.2.21 cluster with 48 x 12TB OSDs across 6 nodes,
> > with 3 new nodes and 24 more OSDs ready to come online. The bulk of my
> > pools are EC 8+2 with a failure domain of OSD.
> >
> > Until yesterday, one of the original 48 OSDs had been failed and
> > destroyed for a few months. I was finally able to replace this OSD
> > yesterday.
> >
> > As the backfilling started I noticed that the other 47 OSDs had gotten
> > fairly out of balance somehow. They range from 53 PGs and 46% full to
> > 87 PGs and 85% full. I thought the Balancer would be taking care of
> > this, but perhaps there is a problem with my settings.
> >
> > As of this morning I had 1 nearfull OSD; as of now I have two. Due to
> > the rebuilt OSD I still have 70 PGs to get remapped. On the other hand,
> > the rebuilt OSD has been assigned 73 PGs, but so far it's only up to 7%
> > full.
> >
> > From what I've been able to find, it looks like the Balancer won't run
> > until after the backfilling is complete. When I get there I'd like to
> > have the right Balancer settings in place to improve the balance before
> > I start introducing the new OSDs.
> >
> > Any advice or insight would be greatly appreciated. In particular, I
> > noticed that my Balancer mode was 'upmap'. Since all of my OSDs are the
> > same and my CRUSH map is flat and uniform, recommendations against
> > 'crush-compat' mode don't seem to apply.
>
> You should probably have used a tool like
>
> https://github.com/HeinleinSupport/cern-ceph-scripts/blob/master/tools/upmap/upmap-remapped.py
> or
> https://github.com/digitalocean/pgremapper
>
> The idea behind those is that you set norebalance/nobackfill, then make
> your changes (like adding tons of new OSDs or major weight changes), then
> run one of these tools so that it tells the cluster via upmap that "the
> current situation is meant to be like this", by placing a specific
> exception on each misplaced/remapped PG. This makes the cluster HEALTH_OK
> again.
>
> After this, you unset nobackfill and norebalance, but no movement starts,
> since all PGs are where they are "supposed" to be, given the current
> upmap exceptions.
>
> Then, as the balancer (in upmap mode) runs, it will figure out that those
> PGs are actually placed on the wrong OSDs, but it will handle only some 8
> at a time, simply by removing their upmap exceptions. The cluster then
> moves a few PGs, and when those are done, the cluster is healthy again,
> scrubs can start, and so on. The balancer then finds 8 new PGs to
> un-misplace, and it continues like this until all PGs are in their final
> positions.
>
> Much less strain on the cluster, no long periods without scrubs, and
> fewer operator interventions.

Janne,

I will have a look at the tools you linked. I had intended to anyway, but
your explanation helps me understand the underlying objectives much better.

However, I'm still concerned about the Balancer. How can I tell whether it
was running over the past couple of months? If it was, why did my OSDs
load up so unevenly? I thought that's what the Balancer was supposed to
prevent, so either it hasn't been running or I've configured it
incorrectly.

> --
> May the most significant bit of your life be positive.

Thanks.

-Dave

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
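
[Editor's note] For anyone following along, below is a minimal sketch of the
command sequence Janne describes, using the standard Ceph CLI. The flags and
the pg-upmap mechanism are standard; the exact upmap-remapped.py invocation is
an assumption based on how the CERN script is commonly used (it prints
"ceph osd pg-upmap-items ..." commands to stdout), so check the script's own
README before piping anything to a shell.

  # Pause data movement before making topology changes.
  ceph osd set norebalance
  ceph osd set nobackfill

  # ... add the new OSDs / change CRUSH weights here ...

  # Assumed usage: the script emits pg-upmap-items commands that pin every
  # currently remapped PG to wherever its data sits right now. Review the
  # output before applying it.
  ./upmap-remapped.py | sh

  # With the exceptions in place the cluster should report HEALTH_OK.
  # Allow movement again and let the balancer remove the exceptions a few
  # PGs at a time.
  ceph osd unset nobackfill
  ceph osd unset norebalance
  ceph balancer mode upmap
  ceph balancer on

Note that pg-upmap requires clients at Luminous or newer
(ceph osd set-require-min-compat-client luminous); a cluster already running
the balancer in upmap mode, as Dave's is, will have this set.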
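
[Editor's note] On Dave's question about whether the balancer has actually
been running: the mgr balancer module reports its own state, so a quick check
(standard commands on a Nautilus-era CLI, nothing assumed beyond that) would
be:

  ceph balancer status   # shows "active": true/false, the mode, and the last optimize result
  ceph balancer eval     # score of the current PG/utilization distribution (lower is better)
  ceph osd df            # per-OSD PG count and %USE, to watch the spread over time

If status shows "active": false, the balancer simply was not running. If it
shows "active": true but the spread still grew, the optimize result message
usually explains why it declined to act, for example because too many objects
were misplaced during the backfill or the distribution was already within the
configured deviation.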