On Thu, 28 Oct 2021 at 22:25, Dave Hall <kdhall@xxxxxxxxxxxxxx> wrote:
> Hello,
> I have a Nautilus 14.2.21 cluster with 48 x 12TB OSDs across 6 nodes,
> with 3 new nodes and 24 more OSDs ready to come online. The bulk of my
> pools are EC 8+2 with a failure domain of OSD. Until yesterday, one of
> the original 48 OSDs had been failed and destroyed for a few months. I
> was finally able to replace this OSD yesterday.
>
> As the backfilling started I noticed that the other 47 OSDs had gotten
> fairly out of balance somehow. They range from 53 PGs and 46% full to
> 87 PGs and 85% full. I thought the Balancer would be taking care of
> this, but perhaps there is a problem with my settings.
>
> As of this morning I had 1 nearfull OSD. As of now I have two. Due to
> the rebuilt OSD I still have 70 PGs to get remapped. On the other hand,
> the rebuilt OSD has been assigned 73 PGs, but so far it's only up to
> 7% full.
>
> From what I've been able to find, it looks like the Balancer won't run
> until after the backfilling is complete. When I get there I'd like to
> have the right Balancer setting in place to improve the balance before
> I start introducing the new OSDs.
>
> Any advice or insight would be greatly appreciated. In particular, I
> noticed that my Balancer mode was 'upmap'. Since all of my OSDs are the
> same and my crush-map is flat and uniform, recommendations against
> 'crush-compat' mode don't seem to apply.

You should probably have used a tool like
https://github.com/HeinleinSupport/cern-ceph-scripts/blob/master/tools/upmap/upmap-remapped.py
or https://github.com/digitalocean/pgremapper

The idea behind those is that you set norebalance/nobackfill, then make
your changes (like adding lots of new OSDs or making major weight
changes), and then run one of these tools so that it tells the cluster,
via an upmap exception placed on each misplaced/remapped PG, that "the
current situation is meant to be like this". This makes the cluster
HEALTH_OK again. After this you unset nobackfill/norebalance, but no
movement starts, since all PGs are where they are "supposed" to be
given the current upmap exceptions.

Then, as the balancer (in upmap mode) runs, it will figure out that
those PGs are actually placed on the wrong OSDs, but it will only take
on a few (some 8) at a time, simply by removing their upmap exceptions.
The cluster then moves those few PGs, and when they are done the
cluster is healthy again and scrubs can start. The balancer then picks
the next few PGs to un-misplace, and it goes on like this until all PGs
are in their final positions. Much less strain on the cluster, no long
periods without scrubs, and fewer operator interventions. (A rough
command sketch of the sequence is appended at the bottom of this mail.)

-- 
May the most significant bit of your life be positive.
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
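
P.S. A minimal, untested sketch of that sequence, assuming the CERN
upmap-remapped.py workflow (as far as I remember the script just prints
"ceph osd pg-upmap-items ..." commands on stdout; check its README for
the exact invocation before piping anything into sh):

    # stop data movement while making the big changes
    ceph osd set norebalance
    ceph osd set nobackfill

    # ... add the new OSDs / replace the failed one / adjust weights ...

    # pin every remapped PG to where it currently sits; review the
    # emitted commands first, then feed them to the cluster
    ./upmap-remapped.py | sh

    # nothing should start moving when the flags come off, since the
    # upmap exceptions now make the current placement the "intended" one
    ceph osd unset nobackfill
    ceph osd unset norebalance

    # let the balancer remove the exceptions a few PGs at a time
    ceph balancer mode upmap
    ceph balancer on

If I remember correctly, how much the balancer is willing to move per
round is controlled by the mgr option target_max_misplaced_ratio
(default around 5%), so you can tune how gently the data drains back.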