Hello,

I have a Nautilus 14.2.21 cluster with 48 x 12TB OSDs across 6 nodes, with 3 new nodes and 24 more OSDs ready to come online. The bulk of my pools are EC 8+2 with a failure domain of OSD.

One of the original 48 OSDs failed a few months ago and was destroyed; I was finally able to replace it yesterday. As the backfilling started I noticed that the other 47 OSDs had somehow gotten fairly far out of balance: they range from 53 PGs and 46% full to 87 PGs and 85% full. I thought the Balancer would be taking care of this, but perhaps there is a problem with my settings. As of this morning I had one nearfull OSD; as of now I have two. Because of the rebuilt OSD I still have 70 PGs waiting to be remapped. Meanwhile, the rebuilt OSD has been assigned 73 PGs, but so far it's only 7% full.

From what I've been able to find, it looks like the Balancer won't run until the backfilling is complete. When I get there, I'd like to have the right Balancer settings in place to improve the balance before I start introducing the new OSDs. Any advice or insight would be greatly appreciated.

In particular, I noticed that my Balancer mode was 'upmap'. Since all of my OSDs are the same and my CRUSH map is flat and uniform, the recommendations against 'crush-compat' mode don't seem to apply. Also, are there any common issues or configuration mistakes that would cause the Balancer not to run, and is there a specific Balancer log somewhere that I could analyse?

Thanks.

-Dave

--
Dave Hall
Binghamton University
kdhall@xxxxxxxxxxxxxx
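P.S. For what it's worth, here is how I've been inspecting the Balancer, pieced together from the Nautilus docs. The target_max_misplaced_ratio check reflects my (possibly wrong) understanding that the balancer won't act while the misplaced fraction is above that threshold, which would explain why it waits out the backfill:

  # Is the module on, what mode is it in, are any plans queued?
  ceph balancer status

  # Score the current PG distribution (lower is better, 0 is perfect)
  ceph balancer eval

  # Per-OSD PG counts and utilization -- where I'm seeing 53 to 87 PGs
  ceph osd df tree

  # upmap mode requires luminous or newer clients; verify before enforcing
  ceph features
  ceph osd set-require-min-compat-client luminous

  # Fraction of PGs the balancer may keep misplaced at once (default .05)
  ceph config get mgr target_max_misplaced_ratio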
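Once the backfill finishes, my intent is to dry-run an optimization by hand before trusting the automatic balancer with the new OSDs ('myplan' below is just a throwaway plan name):

  # Build a plan, inspect the proposed upmaps, and score the result
  ceph balancer optimize myplan
  ceph balancer show myplan
  ceph balancer eval myplan

  # Apply it only if the projected score is an improvement
  ceph balancer execute myplan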
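On the log question: as far as I can tell there is no dedicated Balancer log; the module logs through the active mgr daemon, so I've been doing the following (bumping debug_mgr is my best guess at the right knob):

  # Balancer messages land in the active mgr's log
  grep -i balancer /var/log/ceph/ceph-mgr.*.log

  # Temporarily raise mgr verbosity, then drop the override afterwards
  ceph config set mgr debug_mgr 4/5
  ceph config rm mgr debug_mgr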