Re: Best Practice for OSD Balancing

> 
> I'm fairly new to Ceph and running Rook on a fairly small cluster
> (half a dozen nodes, about 15 OSDs).

Very small and/or non-uniform clusters can be corner cases for many things, especially if they don’t have enough PGs.  What is your failure domain — host or OSD?
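
If you’re not sure, one way (of several) to check is the "type" on the chooseleaf step of your CRUSH rule:

`ceph osd crush rule dump`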

Are your OSDs sized uniformly?  Please send the output of

`ceph osd tree`

so that we can see the topology, along with:

`ceph -s`
`ceph osd df`
`ceph osd dump | grep pool`
`ceph balancer status`
`ceph osd pool autoscale-status`


>  I notice that OSD space use can
> vary quite a bit - upwards of 10-20%.
> 
> In the documentation I see multiple ways of managing this, but no
> guidance on what the "correct" or best way to go about this is.

Assuming that you’re running a recent release and that the balancer module is enabled, it *should* be the right way to go.
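
If it turns out that it isn’t already in upmap mode, something along these lines should get it there (the min-compat-client step is only needed if it hasn’t been set already):

`ceph osd set-require-min-compat-client luminous`
`ceph balancer mode upmap`
`ceph balancer on`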

The balancer module can be confounded by certain complex topologies like multiple device classes and/or CRUSH roots.
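
A quick way to see whether more than one device class is in play, for example:

`ceph osd crush class ls`

Multiple CRUSH roots will show up in the `ceph osd tree` output requested above.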

Since you’re using Rook, I wonder if you might be hitting something that I’ve seen myself; the above commands will tell the tale.


>  As far as I can tell there is the balancer, manual manipulation of upmaps
> via the command line tools, and OSD reweight.  The last two can be
> optimized with tools to calculate appropriate corrections.  There is
> also the new read/active upmap (at least for non-EC pools), which is
> manually triggered.
> 
> The balancer alone is leaving fairly wide deviations in space use, and
> at times during recovery this can become more significant.  I've seen
> OSDs hit the 80% threshold and start impacting IO when the entire
> cluster is only 50-60% full during recovery.
> 
> I've started using ceph osd reweight-by-utilization and that seems
> much more effective at balancing things, but this seems redundant with
> the balancer which I have turned on.

Reweight-by-utilization (RBU) was widely used before the balancer module existed.  I personally haven’t had to use it since Luminous.  It adjusts the override reweights, which will contend with the balancer module if both are enabled.
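
If you do end up relying on the balancer, any leftover override reweights show up in the REWEIGHT column of `ceph osd df` and can be set back to 1.0, e.g. (osd.7 is just a placeholder id):

`ceph osd reweight osd.7 1.0`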

There’s an alternative “JJ Balancer” out on the net that some report success with, but let’s see what your cluster looks like before we go there.


> 
> What is generally considered the best practice for OSD balancing?
> 
> --
> Rich
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



