Re: Balancing with upmap

On Mon, Feb 1, 2021 at 10:03 AM Francois Legrand <fleg@xxxxxxxxxxxxxx> wrote:
>
> Hi,
>
> Actually, we have no EC pools: all are replica 3. And we have only 9 pools.
>
> The average number of PGs per OSD is not very high (40.6).
>
> Here are the pool details:
>
> pool 2 replicated size 3 min_size 1 crush_rule 2 object_hash rjenkins
> pg_num 64 pgp_num 64 last_change 623105 lfor 0/608315/608313 flags
> hashpspool,selfmanaged_snaps stripe_width 0 application rbd
> pool 31 replicated size 3 min_size 1 crush_rule 2 object_hash rjenkins
> pg_num 32 pgp_num 32 autoscale_mode on last_change 621529 lfor
> 0/0/171563 flags hashpspool,selfmanaged_snaps stripe_width 0 application rbd
> pool 32 replicated size 3 min_size 1 crush_rule 2 object_hash rjenkins
> pg_num 32 pgp_num 32 autoscale_mode on last_change 621529 lfor
> 436085/436085/436085 flags hashpspool,selfmanaged_snaps stripe_width 0
> application rbd
> pool 33 replicated size 3 min_size 1 crush_rule 2 object_hash rjenkins
> pg_num 32 pgp_num 32 autoscale_mode on last_change 621529 lfor
> 0/0/171554 flags hashpspool,selfmanaged_snaps stripe_width 0 application rbd
> pool 34 replicated size 3 min_size 1 crush_rule 2 object_hash rjenkins
> pg_num 32 pgp_num 32 autoscale_mode on last_change 623470 lfor
> 0/0/171558 flags hashpspool,selfmanaged_snaps stripe_width 0 application rbd
> pool 35 replicated size 3 min_size 1 crush_rule 2 object_hash rjenkins
> pg_num 32 pgp_num 32 last_change 621529 lfor 0/598286/598284 flags
> hashpspool stripe_width 0 pg_autoscale_bias 4 pg_num_min 16
> recovery_priority 5 application cephfs
> pool 36 replicated size 3 min_size 1 crush_rule 2 object_hash rjenkins
> pg_num 1024 pgp_num 1024 autoscale_mode warn last_change 624174 flags
> hashpspool,selfmanaged_snaps stripe_width 0 application cephfs
> pool 43 replicated size 3 min_size 2 crush_rule 2 object_hash rjenkins
> pg_num 64 pgp_num 64 autoscale_mode warn last_change 624174 flags
> hashpspool,selfmanaged_snaps stripe_width 0 application cephfs
> pool 44 replicated size 3 min_size 3 crush_rule 2 object_hash rjenkins
> pg_num 256 pgp_num 256 autoscale_mode warn last_change 622177 lfor
> 0/0/449412 flags hashpspool,selfmanaged_snaps stripe_width 0
> expected_num_objects 400 target_size_bytes 17592186044416 application rbd
>
> Pools 35 (metadata), 36 and 43 (data) are for CephFS.
>

How does the distribution for pool 36 look? This pool has the best
chance to be balanced -- the others have too few PGs so you shouldn't
even be worried.
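
To put numbers on that, here is a rough sketch of the arithmetic. It uses
only the pg_num and size values from the pool listing above and the 116 OSDs
mentioned in the original post; the function name is made up for
illustration, and OSD weights are ignored.

```python
# (pool id, pg_num, replica size), copied from the pool listing in this thread
POOLS = [(2, 64, 3), (31, 32, 3), (32, 32, 3), (33, 32, 3), (34, 32, 3),
         (35, 32, 3), (36, 1024, 3), (43, 64, 3), (44, 256, 3)]
NUM_OSDS = 116  # from the original post


def pg_replicas_per_osd(pools, num_osds):
    """Mean number of PG replicas each OSD holds, per pool (unweighted)."""
    return {pool_id: pg_num * size / num_osds
            for pool_id, pg_num, size in pools}


per_pool = pg_replicas_per_osd(POOLS, NUM_OSDS)
total = sum(per_pool.values())

for pool_id, mean in per_pool.items():
    print(f"pool {pool_id}: {mean:.2f} PG replicas per OSD on average")
print(f"all pools: {total:.1f} PG replicas per OSD on average")
```

The total matches the 40.6 quoted above. Pool 36 averages about 26 PG
replicas per OSD, while each 32-PG pool averages less than one, so a
per-pool balancer has almost nothing to even out there.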

> The issue is probably the crush rule. Indeed, as we have servers in 2
> different rooms, we have a crush rule to ensure that at least one copy
> of the data is stored in each room (for disaster recovery):
>
> {
>          "rule_id": 2,
>          "rule_name": "replicated3over2rooms",
>          "ruleset": 2,
>          "type": 1,
>          "min_size": 3,
>          "max_size": 4,
>          "steps": [
>              {
>                  "op": "take",
>                  "item": -1,
>                  "item_name": "default"
>              },
>              {
>                  "op": "choose_firstn",
>                  "num": 0,
>                  "type": "room"
>              },
>              {
>                  "op": "chooseleaf_firstn",
>                  "num": 2,
>                  "type": "host"
>              },
>              {
>                  "op": "emit"
>              }
>          ]
>      },
>
> This rule should pick a room, put 2 copies on different hosts in that
> room and put the third copy on any host in the second room.
>
> I understand that it will not lead to a totally uniform distribution, but
> statistically it should not be too far off.
>
> The distribution of disks between rooms is the following: 4 servers x 16
> disks of 8 TB in the first room, and 1 server x 24 disks of 16 TB + 1 x 16 +
> 1 x 12 disks of 8 TB in the second room.
>
> This layout is not homogeneous (4 servers in the first room and 3
> in the second, 64 disks in one room and 52 in the other, and disks of
> different capacities), and we certainly have an excess capacity of 12 x 8 TB
> in the second room (I am aware that this capacity is "lost" for now;
> it will become usable in the future if we add some new servers in the first
> room).

This non-trivial crush rule and "tree imbalance" are probably confusing
the balancer a lot.
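
As an illustration only, here is a simplified model of why this rule cannot
follow the CRUSH weights exactly. It is a sketch under assumptions, not a
description of what the balancer does internally. With two rooms, the rule
yields two OSDs from the room drawn first and, once the result is truncated
to size 3, one OSD from the other room. The model assumes straw2 buckets, so
that a room is drawn first with probability proportional to its weight, and
it ignores retries and existing upmap entries. The room weights are taken
from the ceph osd df tree output quoted further down.

```python
# CRUSH weights of the two rooms, from the `ceph osd df tree` output
ROOM_WEIGHTS = {"1222-2-10": 465.66577, "1222-SS-09": 552.99255}


def expected_replica_share(weights):
    """Expected fraction of all replicas landing in each of two rooms.

    Assumption: the room drawn first (probability proportional to its
    weight) holds 2 of the 3 copies, the other room holds 1.
    """
    total = sum(weights.values())
    shares = {}
    for room, weight in weights.items():
        p_first = weight / total
        expected_copies = 2 * p_first + 1 * (1 - p_first)
        shares[room] = expected_copies / 3
    return shares


shares = expected_replica_share(ROOM_WEIGHTS)
total_weight = sum(ROOM_WEIGHTS.values())
for room, share in shares.items():
    weight_fraction = ROOM_WEIGHTS[room] / total_weight
    print(f"{room}: {share:.1%} of replicas for {weight_fraction:.1%} of weight")
```

Under these assumptions room 1222-2-10 receives about 48.6% of the replicas
for about 45.7% of the weight, which points in the same direction as the VAR
of 1.06 reported for that room in the ceph osd df tree output.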

-- dan

P.S. min_size 1 will lead to tears down the road....
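
A small sketch to list the affected pools, using only the size and min_size
values from the pool listing above (the function name and the warning texts
are made up for illustration):

```python
# (pool id, size, min_size), copied from the pool listing in this thread
POOLS = [(2, 3, 1), (31, 3, 1), (32, 3, 1), (33, 3, 1), (34, 3, 1),
         (35, 3, 1), (36, 3, 1), (43, 3, 2), (44, 3, 3)]


def min_size_warnings(pools):
    """Flag pools whose min_size is risky for a replicated pool."""
    warnings = {}
    for pool_id, size, min_size in pools:
        if min_size <= 1:
            # Writes are still accepted when only one copy is available.
            warnings[pool_id] = "accepts writes with a single replica left"
        elif min_size >= size:
            # No replica may be missing, so any OSD outage pauses I/O.
            warnings[pool_id] = "blocks I/O as soon as one replica is unavailable"
    return warnings


for pool_id, message in min_size_warnings(POOLS).items():
    print(f"pool {pool_id}: {message}")
```

The usual choice for size 3 pools is min_size 2, which can be set with
`ceph osd pool set <pool-name> min_size 2`.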

>
> But in theory (which I agree is generally far from reality) a rather
> balanced distribution of data should be reached.
>
> F.
>
>
>
> On 31/01/2021 at 17:30, Dan van der Ster wrote:
> > Hi,
> >
> > I think what's happening is that because you have few PGs and many
> > pools, the balancer cannot achieve a good uniform distribution.
> > The upmap balancer works to make the PGs uniform for each pool
> > individually -- it doesn't look at the total PGs per OSD, so perhaps
> > with your low # PGs per pool per OSD you are just unlucky.
> >
> > You can use a script like this:
> > https://github.com/cernceph/ceph-scripts/blob/master/tools/ceph-pool-pg-distribution
> > to see the PG distribution for any given pool. E.g on one of my clusters:
> >
> > # ./ceph-pool-pg-distribution 38
> > Searching for PGs in pools: ['38']
> > Summary: 32 pgs on 52 osds
> >
> > Num OSDs with X PGs:
> >    1: 21
> >    2: 20
> >    3: 9
> >    4: 2
> >
> > That shows a pretty non-uniform distribution, because this example
> > pool id 38 has up to 4 PGs on some OSDs but 1 or 2 on most.
> > (this is a cluster with the balancer disabled).
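
For readers without the script at hand, the same kind of summary can be
sketched in a few lines. This is not the linked script: the function name
and the input format (a mapping from PG id to its list of OSDs) are
assumptions made for illustration. As a sanity check on the sample above,
21*1 + 20*2 + 9*3 + 2*4 = 96 placements on 52 OSDs, which is consistent
with 32 PGs at three replicas each.

```python
from collections import Counter


def osd_pg_histogram(pg_to_osds):
    """Return {pg_count: number_of_osds_holding_that_many_pgs}."""
    # Count how many PGs of the pool each OSD holds.
    per_osd = Counter(osd for osds in pg_to_osds.values() for osd in osds)
    # Count how many OSDs share each PG count, sorted by PG count.
    return dict(sorted(Counter(per_osd.values()).items()))


# Tiny hypothetical pool: two PGs, three replicas each, OSD 0 holds both.
example = {"1.0": [0, 1, 2], "1.1": [0, 3, 4]}
histogram = osd_pg_histogram(example)

print("Num OSDs with X PGs:")
for pg_count, num_osds in histogram.items():
    print(f"  {pg_count}: {num_osds}")
```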
> >
> > The other explanation I can think of is that you have relatively wide
> > EC pools and few hosts. In that case there would be very little that
> > the balancer could do to flatten the distribution.
> > If in doubt, please share your pool details and crush rules so we can
> > investigate further.
> >
> > Cheers, Dan
> >
> >
> >
> >
> > On Sun, Jan 31, 2021 at 5:10 PM Francois Legrand <fleg@xxxxxxxxxxxxxx> wrote:
> >> Hi,
> >>
> >> After 2 days, the recovery ended. The situation is clearly better (but
> >> still not perfect) with 339.8 Ti available in pools (for 575.8 Ti
> >> available in the whole cluster).
> >>
> >> The balancing is still not perfect (31 to 47 PGs on the 8 TB disks), and
> >> ceph osd df tree returns:
> >>
> >> ID  CLASS WEIGHT     REWEIGHT SIZE     RAW USE DATA OMAP    META
> >> AVAIL   %USE  VAR  PGS STATUS TYPE NAME
> >>    -1       1018.65833        -  466 TiB 214 TiB 214 TiB 126 GiB 609 GiB
> >> 251 TiB     0    0   -        root default
> >> -15        465.66577        -  466 TiB 214 TiB 214 TiB 126 GiB 609 GiB
> >> 251 TiB 46.04 1.06   -            room 1222-2-10
> >>    -3        116.41678        -  116 TiB  53 TiB  53 TiB 24 GiB 152 GiB
> >> 64 TiB 45.45 1.05   -                host lpnceph01
> >>     0   hdd    7.27599  1.00000  7.3 TiB 3.7 TiB 3.7 TiB 2.5 GiB  16 GiB
> >> 3.5 TiB 51.34 1.18  38     up osd.0
> >>     4   hdd    7.27599  1.00000  7.3 TiB 3.2 TiB 3.2 TiB 2.4 GiB 8.7 GiB
> >> 4.1 TiB 44.12 1.01  36     up osd.4
> >>     8   hdd    7.27699  1.00000  7.3 TiB 3.5 TiB 3.5 TiB 2.3 GiB 9.3 GiB
> >> 3.7 TiB 48.52 1.12  39     up osd.8
> >>    12   hdd    7.27599  1.00000  7.3 TiB 3.4 TiB 3.4 TiB 2.4 GiB 9.5 GiB
> >> 3.9 TiB 46.69 1.07  37     up osd.12
> >>    16   hdd    7.27599  1.00000  7.3 TiB 3.5 TiB 3.4 TiB 38 MiB 9.7 GiB
> >> 3.8 TiB 47.49 1.09  37     up osd.16
> >>    20   hdd    7.27599  1.00000  7.3 TiB 3.1 TiB 3.0 TiB 2.4 GiB 8.7 GiB
> >> 4.2 TiB 41.95 0.96  34     up osd.20
> >>    24   hdd    7.27599  1.00000  7.3 TiB 3.5 TiB 3.5 TiB 2.3 GiB 9.8 GiB
> >> 3.8 TiB 48.45 1.11  38     up osd.24
> >>    28   hdd    7.27599  1.00000  7.3 TiB 3.0 TiB 3.0 TiB 55 MiB 8.2 GiB
> >> 4.2 TiB 41.74 0.96  32     up osd.28
> >>    32   hdd    7.27599  1.00000  7.3 TiB 3.2 TiB 3.1 TiB 32 MiB 8.4 GiB
> >> 4.1 TiB 43.33 1.00  34     up osd.32
> >>    36   hdd    7.27599  1.00000  7.3 TiB 3.7 TiB 3.7 TiB 2.4 GiB  11 GiB
> >> 3.6 TiB 50.50 1.16  35     up osd.36
> >>    40   hdd    7.27599  1.00000  7.3 TiB 3.4 TiB 3.3 TiB 2.4 GiB 9.1 GiB
> >> 3.9 TiB 46.15 1.06  37     up osd.40
> >>    44   hdd    7.27599  1.00000  7.3 TiB 3.4 TiB 3.4 TiB 2.3 GiB 9.2 GiB
> >> 3.9 TiB 46.28 1.06  36     up osd.44
> >>    48   hdd    7.27599  1.00000  7.3 TiB 3.3 TiB 3.3 TiB 92 MiB 8.8 GiB
> >> 4.0 TiB 44.88 1.03  33     up osd.48
> >>    52   hdd    7.27599  1.00000  7.3 TiB 3.3 TiB 3.3 TiB 2.4 GiB 9.0 GiB
> >> 4.0 TiB 44.86 1.03  33     up osd.52
> >>    56   hdd    7.27599  1.00000  7.3 TiB 2.9 TiB 2.9 TiB 23 MiB 8.3 GiB
> >> 4.4 TiB 39.79 0.92  34     up osd.56
> >>    60   hdd    7.27599  1.00000  7.3 TiB 3.0 TiB 3.0 TiB 40 MiB 8.3 GiB
> >> 4.3 TiB 41.12 0.95  30     up osd.60
> >>    -5        116.41600        -  116 TiB  54 TiB  54 TiB 30 GiB 150 GiB
> >> 63 TiB 46.12 1.06   -                host lpnceph02
> >>     1   hdd    7.27599  1.00000  7.3 TiB 3.2 TiB 3.2 TiB 2.2 GiB 8.9 GiB
> >> 4.0 TiB 44.53 1.02  37     up osd.1
> >>     5   hdd    7.27599  1.00000  7.3 TiB 3.1 TiB 3.1 TiB 24 MiB 8.3 GiB
> >> 4.2 TiB 42.56 0.98  34     up osd.5
> >>     9   hdd    7.27599  1.00000  7.3 TiB 3.8 TiB 3.8 TiB 42 MiB  11 GiB
> >> 3.4 TiB 52.61 1.21  38     up osd.9
> >>    13   hdd    7.27599  1.00000  7.3 TiB 3.1 TiB 3.1 TiB 2.3 GiB 9.7 GiB
> >> 4.2 TiB 42.89 0.99  36     up osd.13
> >>    17   hdd    7.27599  1.00000  7.3 TiB 3.4 TiB 3.4 TiB 2.3 GiB 9.1 GiB
> >> 3.9 TiB 46.80 1.08  36     up osd.17
> >>    21   hdd    7.27599  1.00000  7.3 TiB 3.3 TiB 3.3 TiB 41 MiB 9.2 GiB
> >> 4.0 TiB 44.90 1.03  33     up osd.21
> >>    25   hdd    7.27599  1.00000  7.3 TiB 3.5 TiB 3.5 TiB 2.4 GiB 9.4 GiB
> >> 3.7 TiB 48.75 1.12  38     up osd.25
> >>    29   hdd    7.27599  1.00000  7.3 TiB 3.0 TiB 3.0 TiB 2.3 GiB 8.7 GiB
> >> 4.2 TiB 41.91 0.96  34     up osd.29
> >>    33   hdd    7.27599  1.00000  7.3 TiB 3.4 TiB 3.4 TiB 2.3 GiB 9.4 GiB
> >> 3.9 TiB 46.60 1.07  36     up osd.33
> >>    37   hdd    7.27599  1.00000  7.3 TiB 3.5 TiB 3.5 TiB 4.6 GiB  10 GiB
> >> 3.8 TiB 47.90 1.10  34     up osd.37
> >>    41   hdd    7.27599  1.00000  7.3 TiB 3.3 TiB 3.3 TiB 2.2 GiB  11 GiB
> >> 3.9 TiB 45.91 1.06  33     up osd.41
> >>    45   hdd    7.27599  1.00000  7.3 TiB 3.4 TiB 3.4 TiB 2.4 GiB 9.3 GiB
> >> 3.9 TiB 46.85 1.08  35     up osd.45
> >>    49   hdd    7.27599  1.00000  7.3 TiB 3.3 TiB 3.3 TiB 2.3 GiB 8.9 GiB
> >> 4.0 TiB 45.35 1.04  36     up osd.49
> >>    53   hdd    7.27599  1.00000  7.3 TiB 3.3 TiB 3.3 TiB 36 MiB 9.0 GiB
> >> 4.0 TiB 44.85 1.03  33     up osd.53
> >>    57   hdd    7.27599  1.00000  7.3 TiB 3.3 TiB 3.3 TiB 2.3 GiB 9.0 GiB
> >> 4.0 TiB 45.67 1.05  36     up osd.57
> >>    61   hdd    7.27599  1.00000  7.3 TiB 3.6 TiB 3.6 TiB 2.4 GiB 9.8 GiB
> >> 3.7 TiB 49.75 1.14  36     up osd.61
> >>    -9        116.41600        -  116 TiB  56 TiB  56 TiB 35 GiB 159 GiB
> >> 61 TiB 48.03 1.10   -                host lpnceph04
> >>     7   hdd    7.27599  1.00000  7.3 TiB 3.3 TiB 3.3 TiB 2.4 GiB 9.4 GiB
> >> 3.9 TiB 45.96 1.06  37     up osd.7
> >>    11   hdd    7.27599  1.00000  7.3 TiB 3.9 TiB 3.9 TiB 4.7 GiB  11 GiB
> >> 3.4 TiB 53.20 1.22  40     up osd.11
> >>    15   hdd    7.27599  1.00000  7.3 TiB 3.8 TiB 3.8 TiB 2.3 GiB 9.8 GiB
> >> 3.5 TiB 51.84 1.19  40     up osd.15
> >>    27   hdd    7.27599  1.00000  7.3 TiB 3.1 TiB 3.1 TiB 2.3 GiB 8.5 GiB
> >> 4.2 TiB 42.50 0.98  34     up osd.27
> >>    31   hdd    7.27599  1.00000  7.3 TiB 3.1 TiB 3.1 TiB 2.2 GiB 8.7 GiB
> >> 4.2 TiB 42.61 0.98  35     up osd.31
> >>    35   hdd    7.27599  1.00000  7.3 TiB 3.5 TiB 3.5 TiB 2.3 GiB  12 GiB
> >> 3.8 TiB 48.27 1.11  37     up osd.35
> >>    39   hdd    7.27599  1.00000  7.3 TiB 3.6 TiB 3.6 TiB 2.2 GiB 8.4 GiB
> >> 3.7 TiB 49.45 1.14  36     up osd.39
> >>    43   hdd    7.27599  1.00000  7.3 TiB 3.3 TiB 3.3 TiB 2.4 GiB 9.4 GiB
> >> 4.0 TiB 45.71 1.05  35     up osd.43
> >>    47   hdd    7.27599  1.00000  7.3 TiB 3.8 TiB 3.8 TiB 3.0 GiB  12 GiB
> >> 3.5 TiB 52.31 1.20  41     up osd.47
> >>    51   hdd    7.27599  1.00000  7.3 TiB 3.4 TiB 3.3 TiB 2.3 GiB  10 GiB
> >> 3.9 TiB 46.13 1.06  34     up osd.51
> >>    55   hdd    7.27599  1.00000  7.3 TiB 3.3 TiB 3.3 TiB 4.1 GiB  11 GiB
> >> 4.0 TiB 45.71 1.05  35     up osd.55
> >>    59   hdd    7.27599  1.00000  7.3 TiB 3.8 TiB 3.8 TiB 2.2 GiB  10 GiB
> >> 3.5 TiB 52.19 1.20  40     up osd.59
> >> 100   hdd    7.27599  1.00000  7.3 TiB 3.8 TiB 3.8 TiB 2.3 GiB  10 GiB
> >> 3.5 TiB 52.22 1.20  39     up osd.100
> >> 101   hdd    7.27599  1.00000  7.3 TiB 3.3 TiB 3.3 TiB 26 MiB 9.0 GiB
> >> 3.9 TiB 45.82 1.05  36     up osd.101
> >> 102   hdd    7.27599  1.00000  7.3 TiB 3.3 TiB 3.3 TiB 75 MiB 9.0 GiB
> >> 3.9 TiB 45.79 1.05  34     up osd.102
> >> 105   hdd    7.27599  1.00000  7.3 TiB 3.6 TiB 3.5 TiB 57 MiB 9.9 GiB
> >> 3.7 TiB 48.83 1.12  37     up osd.105
> >> -13        116.41699        -  116 TiB  52 TiB  52 TiB 37 GiB 148 GiB
> >> 65 TiB 44.58 1.03   -                host lpnceph06
> >>    19   hdd    7.27699  1.00000  7.3 TiB 3.3 TiB 3.3 TiB 2.2 GiB 8.8 GiB
> >> 3.9 TiB 45.97 1.06  37     up osd.19
> >>    72   hdd    7.27599  1.00000  7.3 TiB 3.6 TiB 3.5 TiB 2.6 GiB 9.4 GiB
> >> 3.7 TiB 48.84 1.12  39     up osd.72
> >>    74   hdd    7.27599  1.00000  7.3 TiB 3.1 TiB 3.1 TiB 2.3 GiB 8.5 GiB
> >> 4.2 TiB 42.36 0.97  34     up osd.74
> >>    75   hdd    7.27599  1.00000  7.3 TiB 3.1 TiB 3.1 TiB 2.4 GiB 8.6 GiB
> >> 4.2 TiB 42.85 0.99  36     up osd.75
> >>    76   hdd    7.27599  1.00000  7.3 TiB 3.1 TiB 3.1 TiB 2.9 GiB 9.9 GiB
> >> 4.2 TiB 42.47 0.98  34     up osd.76
> >>    77   hdd    7.27599  1.00000  7.3 TiB 3.2 TiB 3.2 TiB 2.4 GiB 8.7 GiB
> >> 4.1 TiB 44.34 1.02  35     up osd.77
> >>    78   hdd    7.27599  1.00000  7.3 TiB 3.3 TiB 3.3 TiB 4.6 GiB  12 GiB
> >> 4.0 TiB 45.56 1.05  35     up osd.78
> >>    79   hdd    7.27599  1.00000  7.3 TiB 3.1 TiB 3.1 TiB 2.4 GiB 8.4 GiB
> >> 4.2 TiB 42.94 0.99  35     up osd.79
> >>    80   hdd    7.27599  1.00000  7.3 TiB 3.1 TiB 3.1 TiB 2.4 GiB 8.5 GiB
> >> 4.2 TiB 42.47 0.98  34     up osd.80
> >>    81   hdd    7.27599  1.00000  7.3 TiB 3.6 TiB 3.6 TiB 2.3 GiB 9.1 GiB
> >> 3.7 TiB 48.99 1.13  38     up osd.81
> >>    82   hdd    7.27599  1.00000  7.3 TiB 3.0 TiB 3.0 TiB 2.3 GiB 8.5 GiB
> >> 4.3 TiB 40.98 0.94  34     up osd.82
> >>    83   hdd    7.27599  1.00000  7.3 TiB 3.0 TiB 3.0 TiB 22 MiB 8.3 GiB
> >> 4.3 TiB 41.03 0.94  33     up osd.83
> >>    84   hdd    7.27599  1.00000  7.3 TiB 3.7 TiB 3.7 TiB 2.4 GiB  11 GiB
> >> 3.6 TiB 50.66 1.17  40     up osd.84
> >>    85   hdd    7.27599  1.00000  7.3 TiB 3.3 TiB 3.3 TiB 2.4 GiB 9.4 GiB
> >> 4.0 TiB 45.66 1.05  34     up osd.85
> >>    86   hdd    7.27599  1.00000  7.3 TiB 3.1 TiB 3.1 TiB 51 MiB 8.7 GiB
> >> 4.2 TiB 42.33 0.97  31     up osd.86
> >>    87   hdd    7.27599  1.00000  7.3 TiB 3.3 TiB 3.3 TiB 3.3 GiB  10 GiB
> >> 3.9 TiB 45.77 1.05  34     up osd.87
> >> -16        552.99255        -  349 TiB 124 TiB 123 TiB 65 GiB 321 GiB
> >> 225 TiB     0    0   -            room 1222-SS-09
> >> -21                0        -      0 B     0 B     0 B     0 B     0
> >> B     0 B     0    0   -                host lpnceph00
> >>    -7        116.41600        -  116 TiB  60 TiB  60 TiB 35 GiB 176 GiB
> >> 56 TiB 51.73 1.19   -                host lpnceph03
> >>     2   hdd    7.27599  1.00000  7.3 TiB 3.3 TiB 3.3 TiB 2.4 GiB 8.9 GiB
> >> 3.9 TiB 46.01 1.06  37     up osd.2
> >>     6   hdd    7.27599  1.00000  7.3 TiB 3.4 TiB 3.4 TiB 4.8 GiB  16 GiB
> >> 3.8 TiB 47.26 1.09  37     up osd.6
> >>    10   hdd    7.27599  1.00000  7.3 TiB 4.3 TiB 4.2 TiB 2.4 GiB  12 GiB
> >> 3.0 TiB 58.59 1.35  45     up osd.10
> >>    14   hdd    7.27599  1.00000  7.3 TiB 3.9 TiB 3.9 TiB 2.4 GiB  12 GiB
> >> 3.4 TiB 53.62 1.23  40     up osd.14
> >>    18   hdd    7.27599  1.00000  7.3 TiB 4.0 TiB 4.0 TiB 3.4 GiB  12 GiB
> >> 3.2 TiB 55.45 1.28  43     up osd.18
> >>    22   hdd    7.27599  1.00000  7.3 TiB 4.5 TiB 4.5 TiB 2.2 GiB  12 GiB
> >> 2.8 TiB 61.64 1.42  46     up osd.22
> >>    26   hdd    7.27599  1.00000  7.3 TiB 3.7 TiB 3.7 TiB 2.3 GiB  11 GiB
> >> 3.6 TiB 51.11 1.18  39     up osd.26
> >>    30   hdd    7.27599  1.00000  7.3 TiB 3.7 TiB 3.6 TiB 2.4 GiB  10 GiB
> >> 3.6 TiB 50.23 1.16  39     up osd.30
> >>    34   hdd    7.27599  1.00000  7.3 TiB 4.2 TiB 4.2 TiB 59 MiB  11 GiB
> >> 3.1 TiB 58.04 1.33  43     up osd.34
> >>    38   hdd    7.27599  1.00000  7.3 TiB 3.8 TiB 3.8 TiB 44 MiB 9.8 GiB
> >> 3.5 TiB 51.86 1.19  38     up osd.38
> >>    42   hdd    7.27599  1.00000  7.3 TiB 3.7 TiB 3.7 TiB 2.7 GiB  11 GiB
> >> 3.5 TiB 51.35 1.18  41     up osd.42
> >>    46   hdd    7.27599  1.00000  7.3 TiB 3.3 TiB 3.3 TiB 3.0 GiB 9.3 GiB
> >> 4.0 TiB 45.60 1.05  35     up osd.46
> >>    50   hdd    7.27599  1.00000  7.3 TiB 3.6 TiB 3.6 TiB 2.5 GiB  11 GiB
> >> 3.7 TiB 49.59 1.14  40     up osd.50
> >>    54   hdd    7.27599  1.00000  7.3 TiB 3.5 TiB 3.5 TiB 54 MiB 9.8 GiB
> >> 3.7 TiB 48.78 1.12  35     up osd.54
> >>    58   hdd    7.27599  1.00000  7.3 TiB 3.7 TiB 3.7 TiB 2.3 GiB 9.7 GiB
> >> 3.6 TiB 50.55 1.16  39     up osd.58
> >>    62   hdd    7.27599  1.00000  7.3 TiB 3.5 TiB 3.5 TiB 2.4 GiB 9.2 GiB
> >> 3.8 TiB 48.00 1.10  40     up osd.62
> >> -11                0        -      0 B     0 B     0 B     0 B     0
> >> B     0 B     0    0   -                host lpnceph05
> >> -19         87.31200        -   87 TiB  44 TiB  44 TiB 31 GiB 127 GiB
> >> 43 TiB 50.92 1.17   -                host lpnceph07
> >>    89   hdd    7.27599  1.00000  7.3 TiB 3.4 TiB 3.4 TiB 4.2 GiB  11 GiB
> >> 3.9 TiB 46.67 1.07  38     up osd.89
> >>    90   hdd    7.27599  1.00000  7.3 TiB 3.7 TiB 3.7 TiB 2.4 GiB 9.9 GiB
> >> 3.6 TiB 50.71 1.17  39     up osd.90
> >>    91   hdd    7.27599  1.00000  7.3 TiB 4.4 TiB 4.4 TiB 2.4 GiB  11 GiB
> >> 2.9 TiB 60.10 1.38  47     up osd.91
> >>    92   hdd    7.27599  1.00000  7.3 TiB 3.3 TiB 3.3 TiB 4.1 GiB  11 GiB
> >> 4.0 TiB 45.67 1.05  36     up osd.92
> >>    93   hdd    7.27599  1.00000  7.3 TiB 3.4 TiB 3.4 TiB 2.2 GiB 9.3 GiB
> >> 3.9 TiB 46.63 1.07  38     up osd.93
> >>    94   hdd    7.27599  1.00000  7.3 TiB 3.7 TiB 3.7 TiB 2.3 GiB 9.8 GiB
> >> 3.6 TiB 50.50 1.16  39     up osd.94
> >>    95   hdd    7.27599  1.00000  7.3 TiB 4.1 TiB 4.1 TiB 2.3 GiB  11 GiB
> >> 3.2 TiB 56.33 1.30  44     up osd.95
> >>    97   hdd    7.27599  1.00000  7.3 TiB 3.8 TiB 3.8 TiB 2.4 GiB  10 GiB
> >> 3.5 TiB 52.08 1.20  41     up osd.97
> >>    98   hdd    7.27599  1.00000  7.3 TiB 3.1 TiB 3.1 TiB 3.2 GiB 9.4 GiB
> >> 4.2 TiB 42.94 0.99  34     up osd.98
> >>    99   hdd    7.27599  1.00000  7.3 TiB 3.9 TiB 3.9 TiB 2.9 GiB  12 GiB
> >> 3.3 TiB 54.05 1.24  43     up osd.99
> >> 103   hdd    7.27599  1.00000  7.3 TiB 3.3 TiB 3.3 TiB 2.3 GiB 9.7 GiB
> >> 3.9 TiB 45.87 1.06  36     up osd.103
> >> 104   hdd    7.27599  1.00000  7.3 TiB 4.3 TiB 4.3 TiB 56 MiB  13 GiB
> >> 3.0 TiB 59.43 1.37  44     up osd.104
> >> -23        349.26453        -  349 TiB 124 TiB 123 TiB 65 GiB 321 GiB
> >> 225 TiB 35.45 0.82   -                host lpnceph09
> >>     3   hdd   14.55269  1.00000   15 TiB 5.3 TiB 5.3 TiB 4.0 GiB  14 GiB
> >> 9.2 TiB 36.65 0.84  58     up osd.3
> >>    23   hdd   14.55269  1.00000   15 TiB 5.0 TiB 5.0 TiB 2.3 GiB  12 GiB
> >> 9.5 TiB 34.52 0.79  55     up osd.23
> >>    63   hdd   14.55269  1.00000   15 TiB 5.2 TiB 5.2 TiB 2.4 GiB  13 GiB
> >> 9.3 TiB 35.98 0.83  55     up osd.63
> >>    64   hdd   14.55269  1.00000   15 TiB 5.2 TiB 5.2 TiB 3.6 GiB  15 GiB
> >> 9.3 TiB 35.81 0.82  57     up osd.64
> >>    65   hdd   14.55269  1.00000   15 TiB 5.7 TiB 5.7 TiB 4.6 GiB  16 GiB
> >> 8.8 TiB 39.41 0.91  56     up osd.65
> >>    66   hdd   14.55269  1.00000   15 TiB 5.0 TiB 5.0 TiB 2.4 GiB  13 GiB
> >> 9.6 TiB 34.33 0.79  55     up osd.66
> >>    67   hdd   14.55269  1.00000   15 TiB 5.1 TiB 5.1 TiB 2.4 GiB  13 GiB
> >> 9.4 TiB 35.31 0.81  56     up osd.67
> >>    68   hdd   14.55269  1.00000   15 TiB 5.6 TiB 5.5 TiB 2.3 GiB  14 GiB
> >> 9.0 TiB 38.24 0.88  58     up osd.68
> >>    69   hdd   14.55269  1.00000   15 TiB 5.9 TiB 5.8 TiB 2.3 GiB  15 GiB
> >> 8.7 TiB 40.30 0.93  58     up osd.69
> >>    70   hdd   14.55269  1.00000   15 TiB 4.8 TiB 4.8 TiB 3.0 GiB  13 GiB
> >> 9.7 TiB 33.21 0.76  51     up osd.70
> >>    71   hdd   14.55269  1.00000   15 TiB 5.2 TiB 5.2 TiB 2.2 GiB  13 GiB
> >> 9.4 TiB 35.74 0.82  57     up osd.71
> >>    73   hdd   14.55269  1.00000   15 TiB 5.0 TiB 5.0 TiB 2.3 GiB  12 GiB
> >> 9.6 TiB 34.24 0.79  55     up osd.73
> >>    88   hdd   14.55269  1.00000   15 TiB 5.0 TiB 5.0 TiB 2.3 GiB  12 GiB
> >> 9.5 TiB 34.61 0.80  51     up osd.88
> >>    96   hdd   14.55269  1.00000   15 TiB 5.3 TiB 5.3 TiB 2.3 GiB  13 GiB
> >> 9.3 TiB 36.28 0.83  56     up osd.96
> >> 106   hdd   14.55269  1.00000   15 TiB 4.9 TiB 4.9 TiB 2.5 GiB  13 GiB
> >> 9.6 TiB 33.96 0.78  53     up osd.106
> >> 107   hdd   14.55269  1.00000   15 TiB 5.3 TiB 5.3 TiB 3.2 GiB  15 GiB
> >> 9.3 TiB 36.28 0.83  54     up osd.107
> >> 108   hdd   14.55269  1.00000   15 TiB 5.0 TiB 5.0 TiB 2.3 GiB  13 GiB
> >> 9.5 TiB 34.70 0.80  53     up osd.108
> >> 109   hdd   14.55269  1.00000   15 TiB 5.1 TiB 5.1 TiB 2.4 GiB  12 GiB
> >> 9.5 TiB 34.82 0.80  52     up osd.109
> >> 110   hdd   14.55269  1.00000   15 TiB 5.5 TiB 5.5 TiB 2.8 GiB  16 GiB
> >> 9.0 TiB 37.91 0.87  55     up osd.110
> >> 111   hdd   14.55269  1.00000   15 TiB 5.3 TiB 5.3 TiB 3.2 GiB  14 GiB
> >> 9.3 TiB 36.35 0.84  55     up osd.111
> >> 112   hdd   14.55269  1.00000   15 TiB 5.0 TiB 5.0 TiB 2.9 GiB  14 GiB
> >> 9.6 TiB 34.18 0.79  55     up osd.112
> >> 113   hdd   14.55269  1.00000   15 TiB 4.6 TiB 4.6 TiB 2.3 GiB  12 GiB
> >> 10 TiB 31.47 0.72  48     up osd.113
> >> 114   hdd   14.55269  1.00000   15 TiB 5.0 TiB 4.9 TiB 3.3 GiB  13 GiB
> >> 9.6 TiB 34.07 0.78  53     up osd.114
> >> 115   hdd   14.55269  1.00000   15 TiB 4.7 TiB 4.7 TiB 2.3 GiB  12 GiB
> >> 9.8 TiB 32.47 0.75  51     up osd.115
> >>                           TOTAL 1019 TiB 443 TiB 441 TiB 258 GiB 1.2 TiB
> >> 576 TiB 43.48
> >> MIN/MAX VAR: 0.72/1.42  STDDEV: 6.69
> >>
> >>
> >> and ceph balancer status
> >> {
> >>       "last_optimize_duration": "0:00:02.223977",
> >>       "plans": [],
> >>       "mode": "upmap",
> >>       "active": true,
> >>       "optimize_result": "Unable to find further optimization, or pool(s)
> >> pg_num is decreasing, or distribution is already perfect",
> >>       "last_optimize_started": "Sun Jan 31 17:07:47 2021"
> >> }
> >>
> >> Can the crush rules for placement be blamed for the unequal distribution?
> >>
> >> F.
> >>
> >> On 29/01/2021 at 23:44, Dan van der Ster wrote:
> >>> Thanks, and thanks for the log file OTR which simply showed:
> >>>
> >>>       2021-01-29 23:17:32.567 7f6155cae700  4 mgr[balancer] prepared 0/10 changes
> >>>
> >>> This indeed means that the balancer believes those pools are all balanced
> >>> according to the config (which you have set to the defaults).
> >>>
> >>> Could you please also share the output of `ceph osd df tree` so we can
> >>> see the distribution and OSD weights?
> >>>
> >>> You might need simply to decrease the upmap_max_deviation from the
> >>> default of 5. On our clusters we do:
> >>>
> >>>       ceph config set mgr mgr/balancer/upmap_max_deviation 1
> >>>
> >>> Cheers, Dan
> >>>
> >>> On Fri, Jan 29, 2021 at 11:25 PM Francois Legrand <fleg@xxxxxxxxxxxxxx> wrote:
> >>>> Hi Dan,
> >>>>
> >>>> Here is the output of ceph balancer status :
> >>>>
> >>>> ceph balancer status
> >>>> {
> >>>>     "last_optimize_duration": "0:00:00.074965",
> >>>>     "plans": [],
> >>>>     "mode": "upmap",
> >>>>     "active": true,
> >>>>     "optimize_result": "Unable to find further optimization, or
> >>>> pool(s) pg_num is decreasing, or distribution is already perfect",
> >>>>     "last_optimize_started": "Fri Jan 29 23:13:31 2021"
> >>>> }
> >>>>
> >>>>
> >>>> F.
> >>>>
> >>>> On 29/01/2021 at 10:57, Dan van der Ster wrote:
> >>>>> Hi Francois,
> >>>>>
> >>>>> What is the output of `ceph balancer status` ?
> >>>>> Also, can you increase the debug_mgr to 4/5 then share the log file of
> >>>>> the active mgr?
> >>>>>
> >>>>> Best,
> >>>>>
> >>>>> Dan
> >>>>>
> >>>>> On Fri, Jan 29, 2021 at 10:54 AM Francois Legrand <fleg@xxxxxxxxxxxxxx> wrote:
> >>>>>> Thanks for your suggestion. I will have a look!
> >>>>>>
> >>>>>> But I am a bit surprised that the "official" balancer seems so inefficient!
> >>>>>>
> >>>>>> F.
> >>>>>>
> >>>>>> On 28/01/2021 at 12:00, Jonas Jelten wrote:
> >>>>>>> Hi!
> >>>>>>>
> >>>>>>> We also suffer heavily from this so I wrote a custom balancer which yields much better results:
> >>>>>>> https://github.com/TheJJ/ceph-balancer
> >>>>>>>
> >>>>>>> After you run it, it echoes the PG movements it suggests. You can then just run those commands and the cluster will be better balanced.
> >>>>>>> It's kinda work in progress, so I'm glad about your feedback.
> >>>>>>>
> >>>>>>> Maybe it helps you :)
> >>>>>>>
> >>>>>>> -- Jonas
> >>>>>>>
> >>>>>>> On 27/01/2021 17.15, Francois Legrand wrote:
> >>>>>>>> Hi all,
> >>>>>>>> I have a cluster with 116 disks (24 new 16 TB disks added in December, the rest 8 TB) running Nautilus 14.2.16.
> >>>>>>>> I moved (8 months ago) from crush_compat to upmap balancing.
> >>>>>>>> But the cluster does not seem well balanced, with the number of PGs on the 8 TB disks varying from 26 to 52, and usage from 35 to 69%.
> >>>>>>>> The recent 16 TB disks are more homogeneous, with 48 to 61 PGs and usage between 30 and 43%.
> >>>>>>>> Last week, I realized that some OSDs were maybe not using upmap, because I ran ceph osd crush weight-set ls and got (compat) as the result.
> >>>>>>>> Thus I ran ceph osd crush weight-set rm-compat, which triggered some rebalancing. There has been no recovery for 2 days now, but the cluster is still unbalanced.
> >>>>>>>> As far as I understand, upmap is supposed to reach an equal number of PGs on all the disks (I guess weighted by their capacity).
> >>>>>>>> Thus I would expect more or less 30 PGs on the 8 TB disks and 60 on the 16 TB ones, and around 50% usage on all. Which is not the case (by far).
> >>>>>>>> The problem is that it impacts the free space available in the pools (264 Ti while there is more than 578 Ti free in the cluster), because free space seems to be based on the space available before the first OSD becomes full!
> >>>>>>>> Is this normal? Did I miss something? What could I do?
> >>>>>>>>
> >>>>>>>> F.
> >>>>>>>> _______________________________________________
> >>>>>>>> ceph-users mailing list -- ceph-users@xxxxxxx
> >>>>>>>> To unsubscribe send an email to ceph-users-leave@xxxxxxx



