Dilemma with PG distribution

Hi,
I am evaluating our cluster configuration again, because we had a
very bad incident with laggy OSDs that took down the entire cluster.

We use datacenter SSDs of different sizes (2, 4 and 8 TB), and someone said
that I should not go beyond a certain number of PGs per OSD for a given
device class.

Right now the distribution in our cluster is 100 PGs on the 2 TB disks,
200 PGs on the 4 TB disks and 400 PGs on the 8 TB disks.

According to this (
https://docs.ceph.com/en/latest/rados/operations/placement-groups/#choosing-the-number-of-placement-groups)
I should go with 150 OSDs * 100 / 3 (replication) = 5000 and round UP to the
nearest power of two, which would be 8k.
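Spelled out, this is my back-of-the-envelope version of that formula, with the
~100 PGs-per-OSD target and size 3 taken from above:

import math

num_osds = 150
target_pgs_per_osd = 100   # rule-of-thumb target from the docs
pool_size = 3              # replication factor

raw = num_osds * target_pgs_per_osd / pool_size    # 5000.0
pg_num = 2 ** math.ceil(math.log2(raw))            # next power of two -> 8192
print(raw, pg_num)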

BUT because these are just SSDs (100k/30k read/write IOPS) I think I
should go down to 4k PGs, so as not to overload the 8 TB disks with operations.
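For comparison, here is a rough sketch (my own estimate, not from the docs) of
what halving the rbd pool's pg_num would mean per OSD; the other pools only add
~200 PGs in total, so the per-OSD counts should scale roughly in proportion:

# observed PGs per OSD from above, and the rough effect of halving rbd's pg_num
current = {"2TB": 100, "4TB": 200, "8TB": 400}
for size, pgs in current.items():
    print(f"{size}: ~{pgs} PGs now -> ~{pgs // 2} PGs with pg_num=4096 on rbd")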

What do you think?

These are our cluster stats:
# ceph df
--- RAW STORAGE ---
CLASS  SIZE     AVAIL    USED     RAW USED  %RAW USED
ssd    419 TiB  122 TiB  297 TiB   298 TiB      71.01
TOTAL  419 TiB  122 TiB  297 TiB   298 TiB      71.01

--- POOLS ---
POOL                   ID  PGS   STORED   OBJECTS  USED     %USED  MAX AVAIL
isos                    7    64  431 GiB  111.58k  1.3 TiB   1.57     26 TiB
rbd                     8  8192   98 TiB   34.17M  288 TiB  78.46     26 TiB
archive                 9   128  2.6 TiB  722.10k  7.8 TiB   8.99     26 TiB
device_health_metrics  10     1  203 MiB      156  608 MiB      0     26 TiB

# ceph status
  cluster:
    id:     74313356-3b3d-43f3-bce6-9fb0e4591097
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum ceph-rbd-mon4,ceph-rbd-mon5,ceph-rbd-mon6 (age 44h)
    mgr: ceph-rbd-mon5(active, since 44h), standbys: ceph-rbd-mon6, ceph-rbd-mon4
    osd: 150 osds: 150 up (since 44h), 150 in (since 6d)

  data:
    pools:   4 pools, 8385 pgs
    objects: 35.01M objects, 110 TiB
    usage:   298 TiB used, 122 TiB / 419 TiB avail
    pgs:     8380 active+clean
             3    active+clean+scrubbing+deep
             2    active+clean+scrubbing

  io:
    client:   100 MiB/s rd, 433 MiB/s wr, 1.45k op/s rd, 12.02k op/s wr

-- 
This time, as an exception, the "UTF-8-Probleme" self-help group will meet in
the large hall.