Re: New to ceph / Very unbalanced cluster

Copying the ML, because I forgot to reply-all.

Reed

On Apr 15, 2020, at 3:58 PM, Reed Dier <reed.dier@xxxxxxxxxxx> wrote:

The problem almost certainly stems from the unbalanced OSD distribution among your hosts, assuming you are using the default CRUSH rule with 3x replication across hosts.

You are limited by your smallest bin, in this case the smallest host.

In this case you have a single ~750 GB HDD as the only OSD on node1, so when Ceph wants 3 copies across 3 hosts, only ~750 GB of space can satisfy that requirement.
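To double-check the replica count and the failure domain, you can look at the pool and the CRUSH rule; the pool and rule names below are only examples, substitute your own:

# replica count of the data pool (pool name is an example)
ceph osd pool get cephfs_data size
# dump the CRUSH rule to confirm the failure domain is "host" (rule name is an example)
ceph osd crush rule dump replicated_rule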

Having OSDs of widely different sizes, unevenly distributed across your hosts, is going to lead to under- and over-utilization issues.

ID  CLASS WEIGHT   TYPE NAME      STATUS REWEIGHT PRI-AFF
 -1       21.54213 root default
 -3        0.75679     host node1
 -5        5.39328     host node2
-10       15.39206     host node3

You either need to redistribute your OSDs across your hosts, or possibly rethink your disk strategy.
You could move osd.5 to node1 and osd.0 to node2, which would give you roughly 6 TiB of usable HDD space across your three nodes (the smallest HDD host would then hold about 6.1 TiB of weight).
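If you physically move the drives, the OSDs will normally update their CRUSH location on startup (osd_crush_update_on_start defaults to true), but you can also place them by hand. Roughly like this, as a sketch rather than an exact procedure, with the weights taken from your tree:

# after physically moving the drives, put each OSD under its new host bucket
ceph osd crush set osd.5 7.27739 root=default host=node1
ceph osd crush set osd.0 0.75679 root=default host=node2
# verify the new layout
ceph osd tree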

Reed

On Apr 15, 2020, at 10:50 AM, Simon Sutter <ssutter@xxxxxxxxxxx> wrote:

Hello everybody,



I'm very new to Ceph and have installed a test environment (Nautilus).

The current goal of this cluster is to serve as short-term backup storage.

For this we want to use older, mixed hardware, so for testing I deliberately set up very unbalanced nodes (you learn the most from exceptional circumstances, right?).

For my CephFS I created two pools, one for metadata and one for data.
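Roughly, the pools and the filesystem were created with commands along these lines (the pool names and the metadata pg_num here are placeholders, not necessarily the exact values I used):

# two pools, one for metadata and one for data (names and metadata pg_num are placeholders)
ceph osd pool create cephfs_metadata 32
ceph osd pool create cephfs_data 512
# the filesystem on top of them
ceph fs new cephfs cephfs_metadata cephfs_data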



I have three nodes and the ceph osd tree looks like this:
ID  CLASS WEIGHT   TYPE NAME      STATUS REWEIGHT PRI-AFF
 -1       21.54213 root default
 -3        0.75679     host node1
  0   hdd  0.75679         osd.0      up  0.00636 1.00000
 -5        5.39328     host node2
  1   hdd  2.66429         osd.1      up  0.65007 1.00000
  3   hdd  2.72899         osd.3      up  0.65007 1.00000
-10       15.39206     host node3
  5   hdd  7.27739         osd.5      up  1.00000 1.00000
  6   hdd  7.27739         osd.6      up  1.00000 1.00000
  2   ssd  0.38249         osd.2      up  1.00000 1.00000
  4   ssd  0.45479         osd.4      up  1.00000 1.00000


The PGs, and thus the data, are extremely unbalanced, as you can see in the ceph osd df output:
ID CLASS WEIGHT  REWEIGHT SIZE    RAW USE DATA    OMAP    META     AVAIL   %USE  VAR  PGS STATUS
 0   hdd 0.75679  0.00636 775 GiB 651 GiB 650 GiB  88 KiB  1.5 GiB 124 GiB 84.02 7.26 112     up
 1   hdd 2.66429  0.65007 2.7 TiB 497 GiB 496 GiB  88 KiB  1.2 GiB 2.2 TiB 18.22 1.57  81     up
 3   hdd 2.72899  0.65007 2.7 TiB 505 GiB 504 GiB   8 KiB  1.3 GiB 2.2 TiB 18.07 1.56  88     up
 5   hdd 7.27739  1.00000 7.3 TiB 390 GiB 389 GiB   8 KiB  1.2 GiB 6.9 TiB  5.24 0.45  67     up
 6   hdd 7.27739  1.00000 7.3 TiB 467 GiB 465 GiB  64 KiB  1.3 GiB 6.8 TiB  6.26 0.54  78     up
 2   ssd 0.38249  1.00000 392 GiB  14 GiB  13 GiB  11 KiB 1024 MiB 377 GiB  3.68 0.32   2     up
 4   ssd 0.45479  1.00000 466 GiB  28 GiB  27 GiB   4 KiB 1024 MiB 438 GiB  6.03 0.52   4     up
                    TOTAL  22 TiB 2.5 TiB 2.5 TiB 273 KiB  8.4 GiB  19 TiB 11.57
MIN/MAX VAR: 0.32/7.26  STDDEV: 6.87

To counteract this, I tried to turn on the balancer module.

The module keeps decreasing the reweight of osd.0, while ceph pg stat tells me there are more and more misplaced objects:

144 pgs: 144 active+clean+remapped; 853 GiB data, 2.5 TiB used, 19 TiB / 22 TiB avail; 30 MiB/s wr, 7 op/s; 242259/655140 objects misplaced (36.978%)
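For completeness, enabling and checking the balancer is done with commands along these lines (the mode shown here is just one of the available options, not necessarily the one I used):

# turn the balancer on and see what it is doing
ceph balancer on
ceph balancer status
# optionally pick a mode; upmap and crush-compat are the two choices
ceph balancer mode upmap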



So my question is: is Ceph supposed to behave like this?
Why are all those objects misplaced? Is it because of the 112 PGs on osd.0?
And why are there 112 PGs on osd.0 in the first place? I did not change any PG settings except pg_num, which I set to 512.
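If more details help, I can post the output of commands like these (the pool name is a placeholder):

# per-OSD utilization and PG counts, grouped by host
ceph osd df tree
# the PGs currently mapped to osd.0
ceph pg ls-by-osd 0
# pg_num and replica count of the data pool (pool name is a placeholder)
ceph osd pool get cephfs_data pg_num
ceph osd pool get cephfs_data size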



Thank you very much
Simon Sutter



_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
