compat weight reset

Hi all,

I am trying to find a simple way to better distribute my data as I wrap up my Nautilus upgrades.

I'm currently rebuilding some OSDs with a bigger block.db to prevent BlueFS spillover (where it isn't difficult to do so), and I'm once again struggling with unbalanced distribution despite having used the upmap balancer.

I recently discovered that running the balancer module in crush-compat mode before switching to upmap mode left behind some lingering compat weight sets, which I believe may account for my less-than-stellar distribution: I now have two or three weightings fighting each other (the upmap balancer, the compat weight set, and reweight). Below is a snippet showing the compat weights differing from the CRUSH weights, followed by how I'm checking the other two layers.

$ ceph osd crush tree
ID  CLASS WEIGHT    (compat)  TYPE NAME
-55        43.70700  42.70894         chassis node2425
 -2        21.85399  20.90097             host node24
  0   hdd   7.28499   7.75699                 osd.0
  8   hdd   7.28499   6.85500                 osd.8
 16   hdd   7.28499   6.28899                 osd.16
 -3        21.85399  21.80797             host node25
  1   hdd   7.28499   7.32899                 osd.1
  9   hdd   7.28499   7.24399                 osd.9
 17   hdd   7.28499   7.23499                 osd.17
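
For completeness, the other two layers should be visible as pg_upmap_items entries in the osdmap and in the REWEIGHT column of ceph osd df (I'm guessing at the exact grep here):

$ ceph osd dump | grep pg_upmap
$ ceph osd df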

So my main question is: how do I [re]set the compat values to match the CRUSH weights, so that the upmap balancer can balance the data more precisely?

It looks like I may have two options:
ceph osd crush weight-set reweight-compat {name} {weight}
or
ceph osd crush weight-set rm-compat

I assume the first is for managing a single device/host/chassis/etc., and the latter would nuke all compat values across the board?
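
For example, if I understand the first form correctly, syncing osd.0 back to its CRUSH weight from the tree above would look something like this (I'm guessing at the exact name argument):

$ ceph osd crush weight-set reweight-compat osd.0 7.28499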

And while looking at this, I started poking at my tunables, and I have no clue how to interpret the values, nor what I believe they should be.

$ ceph osd crush show-tunables
{
    "choose_local_tries": 0,
    "choose_local_fallback_tries": 0,
    "choose_total_tries": 50,
    "chooseleaf_descend_once": 1,
    "chooseleaf_vary_r": 1,
    "chooseleaf_stable": 0,
    "straw_calc_version": 1,
    "allowed_bucket_algs": 22,
    "profile": "firefly",
    "optimal_tunables": 0,
    "legacy_tunables": 0,
    "minimum_required_version": "hammer",
    "require_feature_tunables": 1,
    "require_feature_tunables2": 1,
    "has_v2_rules": 0,
    "require_feature_tunables3": 1,
    "has_v3_rules": 0,
    "has_v4_buckets": 1,
    "require_feature_tunables5": 0,
    "has_v5_rules": 0
}

This is a Jewel -> Luminous -> Mimic -> Nautilus cluster, and pretty much all the clients support Jewel/Luminous+ feature sets (the jewel clients are kernel CephFS clients, even though they are on recent (4.15-4.18) kernels).
$ ceph features | grep release
            "release": "luminous",
            "release": "luminous",
            "release": "luminous",
            "release": "jewel",
            "release": "jewel",
            "release": "luminous",
            "release": "luminous",
            "release": "luminous",
            "release": "luminous",

I feel like I should be running optimal tunables, but from the output above it looks like I'm on an older profile (it reports firefly)?
I'm not sure how much of a difference that makes in practice, or whether changing it would trigger a bunch of data movement.
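
For reference, I assume the switch, if it's the right move, would simply be:

$ ceph osd crush tunables optimal

but I'd rather understand the impact (and the expected data movement) before running anything.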

Hopefully someone can steer me in the right direction here, so that, ideally, I can trigger a single large data movement and return to a happy, balanced cluster once again.

Thanks,

Reed
