Re: Balancing with upmap


Hi,

Actually we have no EC pools... all are replica 3. And we have only 9 pools.

The average number of PGs per OSD is not very high (40.6).

Here are the pool details:

pool 2 replicated size 3 min_size 1 crush_rule 2 object_hash rjenkins pg_num 64 pgp_num 64 last_change 623105 lfor 0/608315/608313 flags hashpspool,selfmanaged_snaps stripe_width 0 application rbd
pool 31 replicated size 3 min_size 1 crush_rule 2 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 621529 lfor 0/0/171563 flags hashpspool,selfmanaged_snaps stripe_width 0 application rbd
pool 32 replicated size 3 min_size 1 crush_rule 2 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 621529 lfor 436085/436085/436085 flags hashpspool,selfmanaged_snaps stripe_width 0 application rbd
pool 33 replicated size 3 min_size 1 crush_rule 2 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 621529 lfor 0/0/171554 flags hashpspool,selfmanaged_snaps stripe_width 0 application rbd
pool 34 replicated size 3 min_size 1 crush_rule 2 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 623470 lfor 0/0/171558 flags hashpspool,selfmanaged_snaps stripe_width 0 application rbd
pool 35 replicated size 3 min_size 1 crush_rule 2 object_hash rjenkins pg_num 32 pgp_num 32 last_change 621529 lfor 0/598286/598284 flags hashpspool stripe_width 0 pg_autoscale_bias 4 pg_num_min 16 recovery_priority 5 application cephfs
pool 36 replicated size 3 min_size 1 crush_rule 2 object_hash rjenkins pg_num 1024 pgp_num 1024 autoscale_mode warn last_change 624174 flags hashpspool,selfmanaged_snaps stripe_width 0 application cephfs
pool 43 replicated size 3 min_size 2 crush_rule 2 object_hash rjenkins pg_num 64 pgp_num 64 autoscale_mode warn last_change 624174 flags hashpspool,selfmanaged_snaps stripe_width 0 application cephfs
pool 44 replicated size 3 min_size 3 crush_rule 2 object_hash rjenkins pg_num 256 pgp_num 256 autoscale_mode warn last_change 622177 lfor 0/0/449412 flags hashpspool,selfmanaged_snaps stripe_width 0 expected_num_objects 400 target_size_bytes 17592186044416 application rbd

Pools 35 (metadata), 36 and 43 (data) are for CephFS.
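The 40.6 figure can be cross-checked from the pool listing alone. The sketch below is only an arithmetic check, not Ceph tooling: it sums pg_num over the nine pools, multiplies by the replica size of 3, and divides by the 116 disks mentioned in the original post. The dictionary is typed in by hand from the listing above.

```python
# pg_num per pool id, copied by hand from the pool listing above.
PG_NUM = {2: 64, 31: 32, 32: 32, 33: 32, 34: 32, 35: 32, 36: 1024, 43: 64, 44: 256}
REPLICA_SIZE = 3   # every pool is "replicated size 3"
NUM_OSDS = 116     # disk count stated in the original post


def pg_replicas_per_osd(pg_num, size, num_osds):
    """Return (total PG replicas, mean PG replicas per OSD)."""
    total = sum(pg_num.values()) * size
    return total, total / num_osds


total, mean = pg_replicas_per_osd(PG_NUM, REPLICA_SIZE, NUM_OSDS)
print(f"{total} PG replicas over {NUM_OSDS} OSDs = {mean:.1f} per OSD")

# Share of all PGs held by each pool: pool 36 dominates.
all_pgs = sum(PG_NUM.values())
for pool, n in sorted(PG_NUM.items()):
    print(f"pool {pool}: {n} PGs ({100 * n / all_pgs:.1f}%)")
```

This gives 4704 PG replicas, i.e. about 40.6 per OSD, and shows that pool 36 alone holds roughly two thirds of all PGs.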

The key point is probably the CRUSH rule. As we have servers in 2 different rooms, we use a CRUSH rule that ensures at least one copy of the data is stored in each room (for disaster recovery):

{
        "rule_id": 2,
        "rule_name": "replicated3over2rooms",
        "ruleset": 2,
        "type": 1,
        "min_size": 3,
        "max_size": 4,
        "steps": [
            {
                "op": "take",
                "item": -1,
                "item_name": "default"
            },
            {
                "op": "choose_firstn",
                "num": 0,
                "type": "room"
            },
            {
                "op": "chooseleaf_firstn",
                "num": 2,
                "type": "host"
            },
            {
                "op": "emit"
            }
        ]
    },

This rule should pick a room, put 2 copies on different hosts in that room, and put the third copy on a host in the other room.

I understand that it will not lead to a perfectly uniform distribution, but statistically it should not be too far off.

The disks are distributed between the rooms as follows: 4 servers x 16 disks of 8 TB in the first room, and 1 server x 24 disks of 16 TB + 1 x 16 + 1 x 12 disks of 8 TB in the second room.

This layout is not homogeneous (4 servers in the first room and 3 in the second, 64 disks in one room and 52 in the other, and disks of different capacities), and we certainly have an excess capacity of 12 x 8 TB in the second room (I am aware that this capacity is "lost" for now; it will become usable if we add new servers to the first room).

But in theory (which I agree is generally far from reality), a reasonably balanced distribution of the data should be reached.
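The disk counts and the 12 x 8 TB excess quoted above can be checked with a few lines of arithmetic. This is a sketch using the nominal sizes from this message (8 TB and 16 TB), not the TiB weights Ceph reports, and the room labels are just placeholders.

```python
# Nominal layout from the description above: (number of disks, disk size in TB) per server.
ROOMS = {
    "room 1": [(16, 8)] * 4,
    "room 2": [(24, 16), (16, 8), (12, 8)],
}


def room_totals(servers):
    """Return (disk count, nominal capacity in TB) for one room."""
    return sum(n for n, _ in servers), sum(n * size for n, size in servers)


totals = {room: room_totals(servers) for room, servers in ROOMS.items()}
for room, (disks, cap_tb) in totals.items():
    print(f"{room}: {disks} disks, {cap_tb} TB")

excess_tb = totals["room 2"][1] - totals["room 1"][1]
print(f"room 2 excess: {excess_tb} TB, i.e. {excess_tb // 8} disks of 8 TB")
```

It reports 64 disks and 512 TB for the first room, 52 disks and 608 TB for the second, and a 96 TB difference, which is the 12 x 8 TB mentioned above.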

F.



On 31/01/2021 at 17:30, Dan van der Ster wrote:
Hi,

I think what's happening is that because you have few PGs and many
pools, the balancer cannot achieve a good uniform distribution.
The upmap balancer works to make the PGs uniform for each pool
individually -- it doesn't look at the total PGs per OSD, so perhaps
with your low # PGs per pool per OSD you are just unlucky.

You can use a script like this:
https://github.com/cernceph/ceph-scripts/blob/master/tools/ceph-pool-pg-distribution
to see the PG distribution for any given pool. E.g. on one of my clusters:

# ./ceph-pool-pg-distribution 38
Searching for PGs in pools: ['38']
Summary: 32 pgs on 52 osds

Num OSDs with X PGs:
   1: 21
   2: 20
   3: 9
   4: 2

That shows a pretty non-uniform distribution, because this example
pool id 38 has up to 4 PGs on some OSDs but 1 or 2 on most.
(this is a cluster with the balancer disabled).
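For readers who want to reproduce this kind of summary, here is a minimal sketch of the counting step. It is not the ceph-pool-pg-distribution script linked above; the function name and the input shape (a mapping from PG id to the OSD ids holding it) are assumptions made for illustration, and the three-PG input is made up. The second half only checks that the quoted histogram is self-consistent.

```python
from collections import Counter


def pgs_per_osd_histogram(pg_to_osds, all_osds):
    """Return {pg_count: number_of_osds} for one pool.

    pg_to_osds: mapping of PG id -> list of OSD ids holding a replica.
    all_osds:   every OSD that could host the pool, so empty OSDs count as 0.
    """
    per_osd = Counter({osd: 0 for osd in all_osds})
    for osds in pg_to_osds.values():
        per_osd.update(osds)
    return dict(sorted(Counter(per_osd.values()).items()))


# Hypothetical toy input: 3 PGs, 3 replicas each, 7 OSDs.
toy = {"38.0": [0, 1, 2], "38.1": [0, 3, 4], "38.2": [0, 1, 5]}
print(pgs_per_osd_histogram(toy, range(7)))

# Consistency check of the histogram quoted above for pool 38.
QUOTED = {1: 21, 2: 20, 3: 9, 4: 2}
osds = sum(QUOTED.values())
replicas = sum(pgs * n for pgs, n in QUOTED.items())
print(f"{osds} OSDs, {replicas} PG replicas, mean {replicas / osds:.2f} per OSD")
```

The quoted numbers add up to 52 OSDs and 96 PG replicas (32 PGs x 3 copies), so the mean is under 2 PGs per OSD while a couple of OSDs carry 4.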

The other explanation I can think of is that you have relatively wide
EC pools and few hosts. In that case there would be very little that
the balancer could do to flatten the distribution.
If in doubt, please share your pool details and crush rules so we can
investigate further.

Cheers, Dan




On Sun, Jan 31, 2021 at 5:10 PM Francois Legrand <fleg@xxxxxxxxxxxxxx> wrote:
Hi,

After 2 days, the recovery ended. The situation is clearly better (but
still not perfect), with 339.8 Ti available in pools (for 575.8 Ti
available in the whole cluster).

The balance is still not perfect (31 to 47 PGs on the 8TB disks), and
ceph osd df tree returns:

ID  CLASS WEIGHT     REWEIGHT SIZE     RAW USE DATA OMAP    META AVAIL   %USE  VAR  PGS STATUS TYPE NAME
   -1       1018.65833        -  466 TiB 214 TiB 214 TiB 126 GiB 609 GiB 251 TiB     0    0   -        root default
-15        465.66577        -  466 TiB 214 TiB 214 TiB 126 GiB 609 GiB 251 TiB 46.04 1.06   -            room 1222-2-10
   -3        116.41678        -  116 TiB  53 TiB  53 TiB 24 GiB 152 GiB 64 TiB 45.45 1.05   -                host lpnceph01
    0   hdd    7.27599  1.00000  7.3 TiB 3.7 TiB 3.7 TiB 2.5 GiB  16 GiB 3.5 TiB 51.34 1.18  38     up osd.0
    4   hdd    7.27599  1.00000  7.3 TiB 3.2 TiB 3.2 TiB 2.4 GiB 8.7 GiB 4.1 TiB 44.12 1.01  36     up osd.4
    8   hdd    7.27699  1.00000  7.3 TiB 3.5 TiB 3.5 TiB 2.3 GiB 9.3 GiB 3.7 TiB 48.52 1.12  39     up osd.8
   12   hdd    7.27599  1.00000  7.3 TiB 3.4 TiB 3.4 TiB 2.4 GiB 9.5 GiB 3.9 TiB 46.69 1.07  37     up osd.12
   16   hdd    7.27599  1.00000  7.3 TiB 3.5 TiB 3.4 TiB 38 MiB 9.7 GiB 3.8 TiB 47.49 1.09  37     up osd.16
   20   hdd    7.27599  1.00000  7.3 TiB 3.1 TiB 3.0 TiB 2.4 GiB 8.7 GiB 4.2 TiB 41.95 0.96  34     up osd.20
   24   hdd    7.27599  1.00000  7.3 TiB 3.5 TiB 3.5 TiB 2.3 GiB 9.8 GiB 3.8 TiB 48.45 1.11  38     up osd.24
   28   hdd    7.27599  1.00000  7.3 TiB 3.0 TiB 3.0 TiB 55 MiB 8.2 GiB 4.2 TiB 41.74 0.96  32     up osd.28
   32   hdd    7.27599  1.00000  7.3 TiB 3.2 TiB 3.1 TiB 32 MiB 8.4 GiB 4.1 TiB 43.33 1.00  34     up osd.32
   36   hdd    7.27599  1.00000  7.3 TiB 3.7 TiB 3.7 TiB 2.4 GiB  11 GiB 3.6 TiB 50.50 1.16  35     up osd.36
   40   hdd    7.27599  1.00000  7.3 TiB 3.4 TiB 3.3 TiB 2.4 GiB 9.1 GiB 3.9 TiB 46.15 1.06  37     up osd.40
   44   hdd    7.27599  1.00000  7.3 TiB 3.4 TiB 3.4 TiB 2.3 GiB 9.2 GiB 3.9 TiB 46.28 1.06  36     up osd.44
   48   hdd    7.27599  1.00000  7.3 TiB 3.3 TiB 3.3 TiB 92 MiB 8.8 GiB 4.0 TiB 44.88 1.03  33     up osd.48
   52   hdd    7.27599  1.00000  7.3 TiB 3.3 TiB 3.3 TiB 2.4 GiB 9.0 GiB 4.0 TiB 44.86 1.03  33     up osd.52
   56   hdd    7.27599  1.00000  7.3 TiB 2.9 TiB 2.9 TiB 23 MiB 8.3 GiB 4.4 TiB 39.79 0.92  34     up osd.56
   60   hdd    7.27599  1.00000  7.3 TiB 3.0 TiB 3.0 TiB 40 MiB 8.3 GiB 4.3 TiB 41.12 0.95  30     up osd.60
   -5        116.41600        -  116 TiB  54 TiB  54 TiB 30 GiB 150 GiB 63 TiB 46.12 1.06   -                host lpnceph02
    1   hdd    7.27599  1.00000  7.3 TiB 3.2 TiB 3.2 TiB 2.2 GiB 8.9 GiB 4.0 TiB 44.53 1.02  37     up osd.1
    5   hdd    7.27599  1.00000  7.3 TiB 3.1 TiB 3.1 TiB 24 MiB 8.3 GiB 4.2 TiB 42.56 0.98  34     up osd.5
    9   hdd    7.27599  1.00000  7.3 TiB 3.8 TiB 3.8 TiB 42 MiB  11 GiB 3.4 TiB 52.61 1.21  38     up osd.9
   13   hdd    7.27599  1.00000  7.3 TiB 3.1 TiB 3.1 TiB 2.3 GiB 9.7 GiB 4.2 TiB 42.89 0.99  36     up osd.13
   17   hdd    7.27599  1.00000  7.3 TiB 3.4 TiB 3.4 TiB 2.3 GiB 9.1 GiB 3.9 TiB 46.80 1.08  36     up osd.17
   21   hdd    7.27599  1.00000  7.3 TiB 3.3 TiB 3.3 TiB 41 MiB 9.2 GiB 4.0 TiB 44.90 1.03  33     up osd.21
   25   hdd    7.27599  1.00000  7.3 TiB 3.5 TiB 3.5 TiB 2.4 GiB 9.4 GiB 3.7 TiB 48.75 1.12  38     up osd.25
   29   hdd    7.27599  1.00000  7.3 TiB 3.0 TiB 3.0 TiB 2.3 GiB 8.7 GiB 4.2 TiB 41.91 0.96  34     up osd.29
   33   hdd    7.27599  1.00000  7.3 TiB 3.4 TiB 3.4 TiB 2.3 GiB 9.4 GiB 3.9 TiB 46.60 1.07  36     up osd.33
   37   hdd    7.27599  1.00000  7.3 TiB 3.5 TiB 3.5 TiB 4.6 GiB  10 GiB 3.8 TiB 47.90 1.10  34     up osd.37
   41   hdd    7.27599  1.00000  7.3 TiB 3.3 TiB 3.3 TiB 2.2 GiB  11 GiB 3.9 TiB 45.91 1.06  33     up osd.41
   45   hdd    7.27599  1.00000  7.3 TiB 3.4 TiB 3.4 TiB 2.4 GiB 9.3 GiB 3.9 TiB 46.85 1.08  35     up osd.45
   49   hdd    7.27599  1.00000  7.3 TiB 3.3 TiB 3.3 TiB 2.3 GiB 8.9 GiB 4.0 TiB 45.35 1.04  36     up osd.49
   53   hdd    7.27599  1.00000  7.3 TiB 3.3 TiB 3.3 TiB 36 MiB 9.0 GiB 4.0 TiB 44.85 1.03  33     up osd.53
   57   hdd    7.27599  1.00000  7.3 TiB 3.3 TiB 3.3 TiB 2.3 GiB 9.0 GiB 4.0 TiB 45.67 1.05  36     up osd.57
   61   hdd    7.27599  1.00000  7.3 TiB 3.6 TiB 3.6 TiB 2.4 GiB 9.8 GiB 3.7 TiB 49.75 1.14  36     up osd.61
   -9        116.41600        -  116 TiB  56 TiB  56 TiB 35 GiB 159 GiB 61 TiB 48.03 1.10   -                host lpnceph04
    7   hdd    7.27599  1.00000  7.3 TiB 3.3 TiB 3.3 TiB 2.4 GiB 9.4 GiB 3.9 TiB 45.96 1.06  37     up osd.7
   11   hdd    7.27599  1.00000  7.3 TiB 3.9 TiB 3.9 TiB 4.7 GiB  11 GiB 3.4 TiB 53.20 1.22  40     up osd.11
   15   hdd    7.27599  1.00000  7.3 TiB 3.8 TiB 3.8 TiB 2.3 GiB 9.8 GiB 3.5 TiB 51.84 1.19  40     up osd.15
   27   hdd    7.27599  1.00000  7.3 TiB 3.1 TiB 3.1 TiB 2.3 GiB 8.5 GiB 4.2 TiB 42.50 0.98  34     up osd.27
   31   hdd    7.27599  1.00000  7.3 TiB 3.1 TiB 3.1 TiB 2.2 GiB 8.7 GiB 4.2 TiB 42.61 0.98  35     up osd.31
   35   hdd    7.27599  1.00000  7.3 TiB 3.5 TiB 3.5 TiB 2.3 GiB  12 GiB 3.8 TiB 48.27 1.11  37     up osd.35
   39   hdd    7.27599  1.00000  7.3 TiB 3.6 TiB 3.6 TiB 2.2 GiB 8.4 GiB 3.7 TiB 49.45 1.14  36     up osd.39
   43   hdd    7.27599  1.00000  7.3 TiB 3.3 TiB 3.3 TiB 2.4 GiB 9.4 GiB 4.0 TiB 45.71 1.05  35     up osd.43
   47   hdd    7.27599  1.00000  7.3 TiB 3.8 TiB 3.8 TiB 3.0 GiB  12 GiB 3.5 TiB 52.31 1.20  41     up osd.47
   51   hdd    7.27599  1.00000  7.3 TiB 3.4 TiB 3.3 TiB 2.3 GiB  10 GiB 3.9 TiB 46.13 1.06  34     up osd.51
   55   hdd    7.27599  1.00000  7.3 TiB 3.3 TiB 3.3 TiB 4.1 GiB  11 GiB 4.0 TiB 45.71 1.05  35     up osd.55
   59   hdd    7.27599  1.00000  7.3 TiB 3.8 TiB 3.8 TiB 2.2 GiB  10 GiB 3.5 TiB 52.19 1.20  40     up osd.59
100   hdd    7.27599  1.00000  7.3 TiB 3.8 TiB 3.8 TiB 2.3 GiB  10 GiB 3.5 TiB 52.22 1.20  39     up osd.100
101   hdd    7.27599  1.00000  7.3 TiB 3.3 TiB 3.3 TiB 26 MiB 9.0 GiB 3.9 TiB 45.82 1.05  36     up osd.101
102   hdd    7.27599  1.00000  7.3 TiB 3.3 TiB 3.3 TiB 75 MiB 9.0 GiB 3.9 TiB 45.79 1.05  34     up osd.102
105   hdd    7.27599  1.00000  7.3 TiB 3.6 TiB 3.5 TiB 57 MiB 9.9 GiB 3.7 TiB 48.83 1.12  37     up osd.105
-13        116.41699        -  116 TiB  52 TiB  52 TiB 37 GiB 148 GiB 65 TiB 44.58 1.03   -                host lpnceph06
   19   hdd    7.27699  1.00000  7.3 TiB 3.3 TiB 3.3 TiB 2.2 GiB 8.8 GiB 3.9 TiB 45.97 1.06  37     up osd.19
   72   hdd    7.27599  1.00000  7.3 TiB 3.6 TiB 3.5 TiB 2.6 GiB 9.4 GiB 3.7 TiB 48.84 1.12  39     up osd.72
   74   hdd    7.27599  1.00000  7.3 TiB 3.1 TiB 3.1 TiB 2.3 GiB 8.5 GiB 4.2 TiB 42.36 0.97  34     up osd.74
   75   hdd    7.27599  1.00000  7.3 TiB 3.1 TiB 3.1 TiB 2.4 GiB 8.6 GiB 4.2 TiB 42.85 0.99  36     up osd.75
   76   hdd    7.27599  1.00000  7.3 TiB 3.1 TiB 3.1 TiB 2.9 GiB 9.9 GiB 4.2 TiB 42.47 0.98  34     up osd.76
   77   hdd    7.27599  1.00000  7.3 TiB 3.2 TiB 3.2 TiB 2.4 GiB 8.7 GiB 4.1 TiB 44.34 1.02  35     up osd.77
   78   hdd    7.27599  1.00000  7.3 TiB 3.3 TiB 3.3 TiB 4.6 GiB  12 GiB 4.0 TiB 45.56 1.05  35     up osd.78
   79   hdd    7.27599  1.00000  7.3 TiB 3.1 TiB 3.1 TiB 2.4 GiB 8.4 GiB 4.2 TiB 42.94 0.99  35     up osd.79
   80   hdd    7.27599  1.00000  7.3 TiB 3.1 TiB 3.1 TiB 2.4 GiB 8.5 GiB 4.2 TiB 42.47 0.98  34     up osd.80
   81   hdd    7.27599  1.00000  7.3 TiB 3.6 TiB 3.6 TiB 2.3 GiB 9.1 GiB 3.7 TiB 48.99 1.13  38     up osd.81
   82   hdd    7.27599  1.00000  7.3 TiB 3.0 TiB 3.0 TiB 2.3 GiB 8.5 GiB 4.3 TiB 40.98 0.94  34     up osd.82
   83   hdd    7.27599  1.00000  7.3 TiB 3.0 TiB 3.0 TiB 22 MiB 8.3 GiB 4.3 TiB 41.03 0.94  33     up osd.83
   84   hdd    7.27599  1.00000  7.3 TiB 3.7 TiB 3.7 TiB 2.4 GiB  11 GiB 3.6 TiB 50.66 1.17  40     up osd.84
   85   hdd    7.27599  1.00000  7.3 TiB 3.3 TiB 3.3 TiB 2.4 GiB 9.4 GiB 4.0 TiB 45.66 1.05  34     up osd.85
   86   hdd    7.27599  1.00000  7.3 TiB 3.1 TiB 3.1 TiB 51 MiB 8.7 GiB 4.2 TiB 42.33 0.97  31     up osd.86
   87   hdd    7.27599  1.00000  7.3 TiB 3.3 TiB 3.3 TiB 3.3 GiB  10 GiB 3.9 TiB 45.77 1.05  34     up osd.87
-16        552.99255        -  349 TiB 124 TiB 123 TiB 65 GiB 321 GiB 225 TiB     0    0   -            room 1222-SS-09
-21                0        -      0 B     0 B     0 B     0 B     0 B     0 B     0    0   -                host lpnceph00
   -7        116.41600        -  116 TiB  60 TiB  60 TiB 35 GiB 176 GiB 56 TiB 51.73 1.19   -                host lpnceph03
    2   hdd    7.27599  1.00000  7.3 TiB 3.3 TiB 3.3 TiB 2.4 GiB 8.9 GiB 3.9 TiB 46.01 1.06  37     up osd.2
    6   hdd    7.27599  1.00000  7.3 TiB 3.4 TiB 3.4 TiB 4.8 GiB  16 GiB 3.8 TiB 47.26 1.09  37     up osd.6
   10   hdd    7.27599  1.00000  7.3 TiB 4.3 TiB 4.2 TiB 2.4 GiB  12 GiB 3.0 TiB 58.59 1.35  45     up osd.10
   14   hdd    7.27599  1.00000  7.3 TiB 3.9 TiB 3.9 TiB 2.4 GiB  12 GiB 3.4 TiB 53.62 1.23  40     up osd.14
   18   hdd    7.27599  1.00000  7.3 TiB 4.0 TiB 4.0 TiB 3.4 GiB  12 GiB 3.2 TiB 55.45 1.28  43     up osd.18
   22   hdd    7.27599  1.00000  7.3 TiB 4.5 TiB 4.5 TiB 2.2 GiB  12 GiB 2.8 TiB 61.64 1.42  46     up osd.22
   26   hdd    7.27599  1.00000  7.3 TiB 3.7 TiB 3.7 TiB 2.3 GiB  11 GiB 3.6 TiB 51.11 1.18  39     up osd.26
   30   hdd    7.27599  1.00000  7.3 TiB 3.7 TiB 3.6 TiB 2.4 GiB  10 GiB 3.6 TiB 50.23 1.16  39     up osd.30
   34   hdd    7.27599  1.00000  7.3 TiB 4.2 TiB 4.2 TiB 59 MiB  11 GiB 3.1 TiB 58.04 1.33  43     up osd.34
   38   hdd    7.27599  1.00000  7.3 TiB 3.8 TiB 3.8 TiB 44 MiB 9.8 GiB 3.5 TiB 51.86 1.19  38     up osd.38
   42   hdd    7.27599  1.00000  7.3 TiB 3.7 TiB 3.7 TiB 2.7 GiB  11 GiB 3.5 TiB 51.35 1.18  41     up osd.42
   46   hdd    7.27599  1.00000  7.3 TiB 3.3 TiB 3.3 TiB 3.0 GiB 9.3 GiB 4.0 TiB 45.60 1.05  35     up osd.46
   50   hdd    7.27599  1.00000  7.3 TiB 3.6 TiB 3.6 TiB 2.5 GiB  11 GiB 3.7 TiB 49.59 1.14  40     up osd.50
   54   hdd    7.27599  1.00000  7.3 TiB 3.5 TiB 3.5 TiB 54 MiB 9.8 GiB 3.7 TiB 48.78 1.12  35     up osd.54
   58   hdd    7.27599  1.00000  7.3 TiB 3.7 TiB 3.7 TiB 2.3 GiB 9.7 GiB 3.6 TiB 50.55 1.16  39     up osd.58
   62   hdd    7.27599  1.00000  7.3 TiB 3.5 TiB 3.5 TiB 2.4 GiB 9.2 GiB 3.8 TiB 48.00 1.10  40     up osd.62
-11                0        -      0 B     0 B     0 B     0 B     0 B     0 B     0    0   -                host lpnceph05
-19         87.31200        -   87 TiB  44 TiB  44 TiB 31 GiB 127 GiB 43 TiB 50.92 1.17   -                host lpnceph07
   89   hdd    7.27599  1.00000  7.3 TiB 3.4 TiB 3.4 TiB 4.2 GiB  11 GiB 3.9 TiB 46.67 1.07  38     up osd.89
   90   hdd    7.27599  1.00000  7.3 TiB 3.7 TiB 3.7 TiB 2.4 GiB 9.9 GiB 3.6 TiB 50.71 1.17  39     up osd.90
   91   hdd    7.27599  1.00000  7.3 TiB 4.4 TiB 4.4 TiB 2.4 GiB  11 GiB 2.9 TiB 60.10 1.38  47     up osd.91
   92   hdd    7.27599  1.00000  7.3 TiB 3.3 TiB 3.3 TiB 4.1 GiB  11 GiB 4.0 TiB 45.67 1.05  36     up osd.92
   93   hdd    7.27599  1.00000  7.3 TiB 3.4 TiB 3.4 TiB 2.2 GiB 9.3 GiB 3.9 TiB 46.63 1.07  38     up osd.93
   94   hdd    7.27599  1.00000  7.3 TiB 3.7 TiB 3.7 TiB 2.3 GiB 9.8 GiB 3.6 TiB 50.50 1.16  39     up osd.94
   95   hdd    7.27599  1.00000  7.3 TiB 4.1 TiB 4.1 TiB 2.3 GiB  11 GiB 3.2 TiB 56.33 1.30  44     up osd.95
   97   hdd    7.27599  1.00000  7.3 TiB 3.8 TiB 3.8 TiB 2.4 GiB  10 GiB 3.5 TiB 52.08 1.20  41     up osd.97
   98   hdd    7.27599  1.00000  7.3 TiB 3.1 TiB 3.1 TiB 3.2 GiB 9.4 GiB 4.2 TiB 42.94 0.99  34     up osd.98
   99   hdd    7.27599  1.00000  7.3 TiB 3.9 TiB 3.9 TiB 2.9 GiB  12 GiB 3.3 TiB 54.05 1.24  43     up osd.99
103   hdd    7.27599  1.00000  7.3 TiB 3.3 TiB 3.3 TiB 2.3 GiB 9.7 GiB 3.9 TiB 45.87 1.06  36     up osd.103
104   hdd    7.27599  1.00000  7.3 TiB 4.3 TiB 4.3 TiB 56 MiB  13 GiB 3.0 TiB 59.43 1.37  44     up osd.104
-23        349.26453        -  349 TiB 124 TiB 123 TiB 65 GiB 321 GiB 225 TiB 35.45 0.82   -                host lpnceph09
    3   hdd   14.55269  1.00000   15 TiB 5.3 TiB 5.3 TiB 4.0 GiB  14 GiB 9.2 TiB 36.65 0.84  58     up osd.3
   23   hdd   14.55269  1.00000   15 TiB 5.0 TiB 5.0 TiB 2.3 GiB  12 GiB 9.5 TiB 34.52 0.79  55     up osd.23
   63   hdd   14.55269  1.00000   15 TiB 5.2 TiB 5.2 TiB 2.4 GiB  13 GiB 9.3 TiB 35.98 0.83  55     up osd.63
   64   hdd   14.55269  1.00000   15 TiB 5.2 TiB 5.2 TiB 3.6 GiB  15 GiB 9.3 TiB 35.81 0.82  57     up osd.64
   65   hdd   14.55269  1.00000   15 TiB 5.7 TiB 5.7 TiB 4.6 GiB  16 GiB 8.8 TiB 39.41 0.91  56     up osd.65
   66   hdd   14.55269  1.00000   15 TiB 5.0 TiB 5.0 TiB 2.4 GiB  13 GiB 9.6 TiB 34.33 0.79  55     up osd.66
   67   hdd   14.55269  1.00000   15 TiB 5.1 TiB 5.1 TiB 2.4 GiB  13 GiB 9.4 TiB 35.31 0.81  56     up osd.67
   68   hdd   14.55269  1.00000   15 TiB 5.6 TiB 5.5 TiB 2.3 GiB  14 GiB 9.0 TiB 38.24 0.88  58     up osd.68
   69   hdd   14.55269  1.00000   15 TiB 5.9 TiB 5.8 TiB 2.3 GiB  15 GiB 8.7 TiB 40.30 0.93  58     up osd.69
   70   hdd   14.55269  1.00000   15 TiB 4.8 TiB 4.8 TiB 3.0 GiB  13 GiB 9.7 TiB 33.21 0.76  51     up osd.70
   71   hdd   14.55269  1.00000   15 TiB 5.2 TiB 5.2 TiB 2.2 GiB  13 GiB 9.4 TiB 35.74 0.82  57     up osd.71
   73   hdd   14.55269  1.00000   15 TiB 5.0 TiB 5.0 TiB 2.3 GiB  12 GiB 9.6 TiB 34.24 0.79  55     up osd.73
   88   hdd   14.55269  1.00000   15 TiB 5.0 TiB 5.0 TiB 2.3 GiB  12 GiB 9.5 TiB 34.61 0.80  51     up osd.88
   96   hdd   14.55269  1.00000   15 TiB 5.3 TiB 5.3 TiB 2.3 GiB  13 GiB 9.3 TiB 36.28 0.83  56     up osd.96
106   hdd   14.55269  1.00000   15 TiB 4.9 TiB 4.9 TiB 2.5 GiB  13 GiB 9.6 TiB 33.96 0.78  53     up osd.106
107   hdd   14.55269  1.00000   15 TiB 5.3 TiB 5.3 TiB 3.2 GiB  15 GiB 9.3 TiB 36.28 0.83  54     up osd.107
108   hdd   14.55269  1.00000   15 TiB 5.0 TiB 5.0 TiB 2.3 GiB  13 GiB 9.5 TiB 34.70 0.80  53     up osd.108
109   hdd   14.55269  1.00000   15 TiB 5.1 TiB 5.1 TiB 2.4 GiB  12 GiB 9.5 TiB 34.82 0.80  52     up osd.109
110   hdd   14.55269  1.00000   15 TiB 5.5 TiB 5.5 TiB 2.8 GiB  16 GiB 9.0 TiB 37.91 0.87  55     up osd.110
111   hdd   14.55269  1.00000   15 TiB 5.3 TiB 5.3 TiB 3.2 GiB  14 GiB 9.3 TiB 36.35 0.84  55     up osd.111
112   hdd   14.55269  1.00000   15 TiB 5.0 TiB 5.0 TiB 2.9 GiB  14 GiB 9.6 TiB 34.18 0.79  55     up osd.112
113   hdd   14.55269  1.00000   15 TiB 4.6 TiB 4.6 TiB 2.3 GiB  12 GiB 10 TiB 31.47 0.72  48     up osd.113
114   hdd   14.55269  1.00000   15 TiB 5.0 TiB 4.9 TiB 3.3 GiB  13 GiB 9.6 TiB 34.07 0.78  53     up osd.114
115   hdd   14.55269  1.00000   15 TiB 4.7 TiB 4.7 TiB 2.3 GiB  12 GiB 9.8 TiB 32.47 0.75  51     up osd.115
                          TOTAL 1019 TiB 443 TiB 441 TiB 258 GiB 1.2 TiB 576 TiB 43.48
MIN/MAX VAR: 0.72/1.42  STDDEV: 6.69
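As a reading aid for the VAR column: it is each OSD's %USE divided by the cluster-wide average %USE (43.48 on the TOTAL line). The sketch below recomputes it for a few rows copied by hand from the output above; the helper name is made up, and because it starts from the rounded percentages it may not match every row to the last digit.

```python
AVG_USE = 43.48  # cluster-wide %USE from the TOTAL line

# (%USE, VAR) pairs copied by hand from the rows above.
ROWS = {
    "osd.22": (61.64, 1.42),   # fullest OSD, the MAX VAR
    "osd.113": (31.47, 0.72),  # emptiest OSD, the MIN VAR
    "osd.10": (58.59, 1.35),
    "osd.60": (41.12, 0.95),
}


def variance_ratio(use_pct, avg_pct=AVG_USE):
    """Utilisation of one OSD relative to the cluster average."""
    return round(use_pct / avg_pct, 2)


for osd, (use_pct, reported) in ROWS.items():
    print(f"{osd}: %USE {use_pct} -> VAR {variance_ratio(use_pct)} (reported {reported})")
```

Read this way, MIN/MAX VAR 0.72/1.42 means the emptiest OSD holds 72% of the average fill level and the fullest 142%.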


and ceph balancer status
{
      "last_optimize_duration": "0:00:02.223977",
      "plans": [],
      "mode": "upmap",
      "active": true,
      "optimize_result": "Unable to find further optimization, or pool(s) pg_num is decreasing, or distribution is already perfect",
      "last_optimize_started": "Sun Jan 31 17:07:47 2021"
}

Could the CRUSH placement rule be to blame for the uneven distribution?

F.

On 29/01/2021 at 23:44, Dan van der Ster wrote:
Thanks, and thanks for the log file OTR which simply showed:

      2021-01-29 23:17:32.567 7f6155cae700  4 mgr[balancer] prepared 0/10 changes

This indeed means that the balancer believes those pools are all balanced
according to the config (which you have set to the defaults).

Could you please also share the output of `ceph osd df tree` so we can
see the distribution and OSD weights?

You might need simply to decrease the upmap_max_deviation from the
default of 5. On our clusters we do:

      ceph config set mgr mgr/balancer/upmap_max_deviation 1
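As a rough illustration of why the threshold matters, the sketch below treats the max deviation as the number of PGs an OSD may sit away from the per-pool mean before it counts as out of range. This is a simplified model written for this note, not the mgr balancer code: it assumes equal-weight OSDs and a single pool, the function name is made up, and the exact comparison Ceph uses may differ. The input is the pool 38 histogram quoted elsewhere in this thread.

```python
def outside_deviation(pg_counts, max_deviation):
    """Return the PG counts that differ from the mean by more than max_deviation.

    Simplified model: equal-weight OSDs, one pool, target = mean PG count.
    """
    target = sum(pg_counts) / len(pg_counts)
    return [c for c in pg_counts if abs(c - target) > max_deviation]


# Per-OSD PG counts rebuilt from the pool 38 histogram (52 OSDs):
# {PGs on the OSD: number of such OSDs}.
HISTOGRAM = {1: 21, 2: 20, 3: 9, 4: 2}
counts = [pgs for pgs, n in HISTOGRAM.items() for _ in range(n)]

for dev in (5, 1):
    flagged = outside_deviation(counts, dev)
    print(f"max deviation {dev}: {len(flagged)} of {len(counts)} OSDs out of range")
```

In this model a deviation of 5 flags nothing, so there is nothing to optimize, while a deviation of 1 flags the 11 OSDs holding 3 or 4 PGs of the pool.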

Cheers, Dan

On Fri, Jan 29, 2021 at 11:25 PM Francois Legrand <fleg@xxxxxxxxxxxxxx> wrote:
Hi Dan,

Here is the output of ceph balancer status:

ceph balancer status
{
    "last_optimize_duration": "0:00:00.074965",
    "plans": [],
    "mode": "upmap",
    "active": true,
    "optimize_result": "Unable to find further optimization, or pool(s) pg_num is decreasing, or distribution is already perfect",
    "last_optimize_started": "Fri Jan 29 23:13:31 2021"
}


F.

On 29/01/2021 at 10:57, Dan van der Ster wrote:
Hi Francois,

What is the output of `ceph balancer status` ?
Also, can you increase the debug_mgr to 4/5 then share the log file of
the active mgr?

Best,

Dan

On Fri, Jan 29, 2021 at 10:54 AM Francois Legrand <fleg@xxxxxxxxxxxxxx> wrote:
Thanks for your suggestion. I will have a look!

But I am a bit surprised that the "official" balancer seems so inefficient!

F.

On 28/01/2021 at 12:00, Jonas Jelten wrote:
Hi!

We also suffer heavily from this so I wrote a custom balancer which yields much better results:
https://github.com/TheJJ/ceph-balancer

After you run it, it echoes the PG movements it suggests. You can then just run those commands and the cluster will be better balanced.
It's kinda work in progress, so I'd be glad to get your feedback.

Maybe it helps you :)

-- Jonas

On 27/01/2021 17.15, Francois Legrand wrote:
Hi all,
I have a cluster with 116 disks (24 new disks of 16TB added in December and the rest of 8TB) running nautilus 14.2.16.
I moved (8 months ago) from crush_compat to upmap balancing.
But the cluster does not seem well balanced, with the number of PGs on the 8TB disks varying from 26 to 52, and usage from 35 to 69%.
The recent 16TB disks are more homogeneous, with 48 to 61 PGs and usage between 30 and 43%.
Last week, I realized that some OSDs were maybe not using upmap because I did a ceph osd crush weight-set ls and got (compat) as result.
Thus I ran a ceph osd crush weight-set rm-compat, which triggered some rebalancing. There has been no more recovery for 2 days now, but the cluster is still unbalanced.
As far as I understand, upmap is supposed to reach an equal number of PGs on all the disks (I guess weighted by their capacity).
Thus I would expect more or less 30 PGs on the 8TB disks and 60 on the 16TB ones, and around 50% usage on all. Which is not the case (by far).
The problem is that it impacts the free space available in the pools (264Ti while there is more than 578Ti free in the cluster), because free space seems to be based on the space available before the first OSD becomes full!
Is this normal? Did I miss something? What could I do?
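The "30 and 60" expectation can be made a little more precise. The sketch below assumes PG replicas are spread purely in proportion to disk size and ignores the CRUSH rule entirely; the 1568 PGs and size 3 come from the pool listing given in a later reply in this thread. It also computes a ceiling for the 16TB disks under the assumption, taken from the osd df tree output in this thread, that all 24 of them sit in one host and that the rule never places two copies of a PG on the same host.

```python
TOTAL_PGS = 1568   # sum of pg_num over the 9 pools listed in this thread
REPLICAS = 3       # all pools are replicated size 3
DISKS = {"8TB": (92, 1.0), "16TB": (24, 2.0)}  # (count, relative weight)


def ideal_pgs_per_osd(total_pgs, replicas, disks):
    """PG replicas per OSD if placement were purely proportional to weight."""
    total_weight = sum(count * weight for count, weight in disks.values())
    return {name: total_pgs * replicas * weight / total_weight
            for name, (count, weight) in disks.items()}


ideal = ideal_pgs_per_osd(TOTAL_PGS, REPLICAS, DISKS)
for name, pgs in ideal.items():
    print(f"{name}: {pgs:.1f} PGs per OSD if weight-proportional")

# Assumption: one host holds all 16TB disks and gets at most one copy of each PG.
cap_16tb = TOTAL_PGS / DISKS["16TB"][0]
print(f"16TB: at most {cap_16tb:.1f} PGs per OSD with one copy per host")
```

Under these assumptions the weight-proportional targets are 33.6 and 67.2 PGs per OSD, but the single 16TB host cannot exceed about 65.3 per OSD, so some imbalance is built into the layout before the balancer does anything.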

F.
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



