Re: Questions about PG auto-scaling and node addition

Kai Stian Olstad <ceph+list@xxxxxxxxxx> · Thu, 14 Sep 2023 21:44:57 +0200

On Wed, Sep 13, 2023 at 04:33:32PM +0200, Christophe BAILLON wrote:
> We have a cluster with 21 nodes, each having 12 x 18TB, and 2 NVMe for db/wal.
> We need to add more nodes.
> The last time we did this, the PGs remained at 1024, so the number of PGs per OSD decreased.
> Currently, we are at 43 PGs per OSD.

> Does auto-scaling work correctly in Ceph version 17.2.5?

I believe so; it's working as designed. By default the autoscaler sets the
number of PGs based on how much data is stored in the pool.
So when you add OSDs the amount of stored data stays the same, and therefore
no scaling is done.
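To illustrate why adding OSDs alone does not trigger scaling, here is a simplified
model of the autoscaler's target calculation. It assumes all OSDs are the same
size and sit under a single CRUSH root. It uses 100 as the per-OSD target (the
default of mon_target_pg_per_osd) and ignores the real autoscaler's rounding,
thresholds and limits. The pool's share of raw capacity shrinks by exactly as
much as the cluster-wide PG budget grows, so the target stays put while the PGs
are spread over more OSDs. The stored amount and the 5 extra nodes are
hypothetical. Only the 43 PGs per OSD figure comes from the question.

```python
def ideal_pg_num(stored_raw_tb: float, num_osds: int, osd_size_tb: float,
                 pool_size: int, target_pg_per_osd: int = 100) -> float:
    """Simplified autoscaler target (not Ceph's actual code): the pool's share
    of raw capacity times the cluster-wide PG budget, divided by pool size."""
    capacity_ratio = stored_raw_tb / (num_osds * osd_size_tb)
    return capacity_ratio * target_pg_per_osd * num_osds / pool_size


def pgs_per_osd_after_expansion(current: float, old_osds: int, new_osds: int) -> float:
    """PG copies per OSD when pg_num is unchanged but more OSDs share the PGs."""
    return current * old_osds / new_osds


old_osds = 21 * 12             # 252 OSDs of 18 TB today
new_osds = old_osds + 5 * 12   # hypothetical: 5 more nodes with 12 OSDs each

# Hypothetical pool: 2000 TB of raw data stored, 3 copies.
before = ideal_pg_num(2000, old_osds, 18, 3)
after = ideal_pg_num(2000, new_osds, 18, 3)
print(f"{before:.1f} {after:.1f}")                                    # 3703.7 3703.7
print(f"{pgs_per_osd_after_expansion(43, old_osds, new_osds):.1f}")  # 34.7
```

In this model the target only moves when the stored data changes, which matches
what you saw after the last expansion.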

> Should we increase the number of PGs before adding nodes?

Adding nodes/OSDs and changing the number of PGs both involve a lot of data
being copied around.
So if the two could be combined, you would only need to copy the data once
instead of twice.
But whether that is smart, or even possible, I'm not sure.
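For a rough sense of scale for the expansion alone: if all OSDs have equal CRUSH
weight, the new OSDs have to end up holding their proportional share. At least
that share of the data therefore has to be copied onto them. This is a
back-of-the-envelope lower bound, not a figure CRUSH reports. The real amount can
be higher, and it does not include any extra movement caused by changing pg_num.
The 60 new OSDs are hypothetical.

```python
def min_fraction_moved(old_osds: int, added_osds: int) -> float:
    """Lower bound on the fraction of data that must move so that every OSD
    (all with equal CRUSH weight) ends up holding an equal share."""
    return added_osds / (old_osds + added_osds)


# 252 existing OSDs plus a hypothetical 60 new ones (5 nodes x 12 OSDs)
print(f"{min_fraction_moved(252, 60):.1%}")   # 19.2%
```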

> Should we keep PG auto-scaling active?

> If we disable auto-scaling, should we increase the number of PGs to reach 100 PGs per OSD?

If you know how much data is going to be stored in a pool, the best way is to
set the number of PGs up front, because every time the autoscaler changes the
number of PGs, a huge amount of data is copied around to other OSDs.

You can set the target size (target_size_bytes) or target ratio
(target_size_ratio)[1] on the pool, and the autoscaler will then set the
appropriate number of PGs for it.
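Here is a rough sketch of how target ratios can turn into PG counts. It assumes a
single CRUSH root of equal OSDs and a per-OSD target of 100 (the
mon_target_pg_per_osd default). target_size_ratio values are relative to the
other pools that set one, so they are normalised first, and the result is rounded
to a power of two. Pool names, ratios and sizes are hypothetical. The real
autoscaler also accounts for actual usage, target_size_bytes reservations, change
thresholds and limits such as pg_num_min, none of which are modelled here.

```python
import math


def nearest_power_of_two(n: float) -> int:
    """Round n to the closest power of two (ties go up)."""
    if n <= 1:
        return 1
    lower = 2 ** math.floor(math.log2(n))
    upper = lower * 2
    return lower if n - lower < upper - n else upper


def pg_nums_from_ratios(ratios: dict[str, float], pool_sizes: dict[str, int],
                        num_osds: int, target_pg_per_osd: int = 100) -> dict[str, int]:
    """Split the cluster-wide PG budget between pools by normalised target ratio."""
    total_ratio = sum(ratios.values())
    budget = target_pg_per_osd * num_osds    # PG copies the whole cluster should hold
    result = {}
    for pool, ratio in ratios.items():
        share = ratio / total_ratio          # ratios are relative to each other
        result[pool] = nearest_power_of_two(share * budget / pool_sizes[pool])
    return result


# Hypothetical pools on the current 252 OSDs: pool_b is expected to hold
# three times as much data as pool_a, both with 3 copies.
print(pg_nums_from_ratios({"pool_a": 1.0, "pool_b": 3.0},
                          {"pool_a": 3, "pool_b": 3},
                          num_osds=252))
# {'pool_a': 2048, 'pool_b': 8192}
```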

But if you know how much data is going to be stored in a pool, you can also
turn the autoscaler off for that pool and just set the number of PGs manually.

100 PGs per OSD is a rule of thumb, but with such large disks you could, or
maybe should, consider having a higher number of PGs per OSD.
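With 18 TB OSDs, the arithmetic behind "large disks, more PGs" looks roughly
like this. The fewer PGs per OSD, the more data each PG copy holds. Smaller PGs
generally give finer-grained balancing and quicker recovery or backfill of any
single PG. The 70% fill level is an assumption, not something from the thread.

```python
def avg_pg_copy_size_gb(osd_size_tb: float, fill_ratio: float, pgs_per_osd: float) -> float:
    """Average amount of data held by one PG copy on an OSD, in decimal GB."""
    return osd_size_tb * 1000 * fill_ratio / pgs_per_osd


# 18 TB OSDs at a hypothetical 70% fill, for 43 (today), 100 and 200 PGs per OSD.
for per_osd in (43, 100, 200):
    print(per_osd, round(avg_pg_copy_size_gb(18, 0.7, per_osd)))
# 43 293
# 100 126
# 200 63
```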

[1] https://docs.ceph.com/en/quincy/rados/operations/placement-groups/#viewing-pg-scaling-recommendations

--
Kai Stian Olstad
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx