Hi,

You're confused: the `ceph balancer` is not related to pg splitting. The balancer is used to move PGs around to achieve a uniform distribution. What you're doing now by increasing pg_num and pgp_num is splitting --> large PGs are split into smaller ones. This is achieved through backfilling.

BTW, while a cluster is continuously backfilling, it will never trim osdmaps. If these accumulate for many days or weeks it can have a service impact on the mons (e.g. disk filling up). For this reason I suggest letting it get to 2248, making sure the osdmaps have trimmed [1], then increasing pgp_num again. (This kind of stepwise process is really only important for large clusters, where splitting can take many days to finish.)

Cheers, Dan

[1] To see the number of osdmaps, go to any host with OSDs, pick one of its OSDs, e.g. osd.123, and run `ceph daemon osd.123 status`. Then find the difference between newest_map and oldest_map, e.g.:

    "oldest_map": 3970333,
    "newest_map": 3971041,

It should be under 1000 or so. If it's much larger, then your osdmaps are not trimming.
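For reference, here's roughly how that loop could look on the command line. It's only a sketch: it assumes the default /var/run/ceph/ceph-osd.<id>.asok admin socket paths on an OSD host, that jq is installed there, and it uses <pool> as a placeholder for your pool name; the next pgp_num value is just an example of another small step.

    # 1) wait until the current step has finished
    ceph osd pool get <pool> pgp_num              # should reach the pgp_num_target (2248)

    # 2) on a host with OSDs, check that osdmaps are trimming
    for sock in /var/run/ceph/ceph-osd.*.asok; do
        # "ceph daemon" also accepts the admin socket path directly
        echo "$(basename "$sock"): $(ceph daemon "$sock" status | jq '.newest_map - .oldest_map') osdmaps"
    done                                          # each difference should be under ~1000

    # 3) only then take the next step, e.g.
    ceph osd pool set <pool> pgp_num 2448

Checking a single OSD, as in the footnote, is generally enough; the loop just saves typing if you want to glance at all the OSDs on one host.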
> On 02/15/2022 9:08 AM Maarten van Ingen <maarten.vaningen@xxxxxxx> wrote:
>
> Hi Dan,
>
> Thanks for your (very) prompt response.
>
> pg_num 4096 pgp_num 2108 pgp_num_target 2248
>
> Also I see this:
>
> # ceph balancer eval
> current cluster score 0.068634 (lower is better)
>
> # ceph balancer status
> {
>     "last_optimize_duration": "0:00:00.025029",
>     "plans": [],
>     "mode": "upmap",
>     "active": true,
>     "optimize_result": "Too many objects (0.010762 > 0.010000) are misplaced; try again later",
>     "last_optimize_started": "Tue Feb 15 09:05:32 2022"
> }
>
> It seems it is indeed limiting the data movement to the configured 1%.
> So it is safe to assume I can put the number to 4096 and the total amount of misplaced PGs will stay around 1%.
>
> Met vriendelijke groet,
> Kind Regards,
>
> Maarten van Ingen
>
> Specialist | SURF | maarten.vaningen@xxxxxxx <mailto:voornaam.achternaam@xxxxxxx> | T +31 30 88 787 3000 | M +31 6 19 03 90 19 |
> SURF <http://www.surf.nl/> is the collaborative organisation for ICT in Dutch education and research
>
> Op 15-02-2022 09:01 heeft Dan van der Ster <daniel.vanderster@xxxxxxx> geschreven:
>
> Hi Maarten,
>
> With `ceph osd pool ls detail` does it have pgp_num_target set to 2248?
> If so, yes, it's moving gradually to that number.
>
> Cheers, Dan
>
> > On 02/15/2022 8:55 AM Maarten van Ingen <maarten.vaningen@xxxxxxx> wrote:
> >
> > Hi,
> >
> > After enabling the balancer (and setting it to upmap) on our environment, it's time to get the pgp_num on one of the pools on par with the pg_num.
> > This pool has pg_num set to 4096 and pgp_num to 2048 (by our mistake).
> > I just set the pgp_num to 2248 to keep data movement in check.
> >
> > Oddly enough I see it's only increased to 2108; also, it's odd that we now get this health warning: 1 pools have pg_num > pgp_num, which we haven't seen before…
> >
> > # ceph -s
> >   cluster:
> >     id:     <id>
> >     health: HEALTH_WARN
> >             1 pools have pg_num > pgp_num
> >
> >   services:
> >     mon: 5 daemons, quorum mon01,mon02,mon03,mon05,mon04 (age 3d)
> >     mgr: mon01(active, since 3w), standbys: mon05, mon04, mon03, mon02
> >     mds: cephfs:1 {0=mon04=up:active} 4 up:standby
> >     osd: 1278 osds: 1278 up (since 68m), 1278 in (since 22h); 74 remapped pgs
> >
> >   data:
> >     pools:   28 pools, 13824 pgs
> >     objects: 441.41M objects, 1.5 PiB
> >     usage:   4.5 PiB used, 6.9 PiB / 11 PiB avail
> >     pgs:     15652608/1324221126 objects misplaced (1.182%)
> >              13693 active+clean
> >              74    active+remapped+backfilling
> >              56    active+clean+scrubbing+deep
> >              1     active+clean+scrubbing
> >
> >   io:
> >     client:   187 MiB/s rd, 2.2 GiB/s wr, 11.11k op/s rd, 5.63k op/s wr
> >     recovery: 1.8 GiB/s, 533 objects/s
> >
> > # ceph osd pool get <pool> pgp_num
> > pgp_num: 2108
> >
> > Is this default behaviour of Ceph?
> > I get the feeling the balancer might have something to do here as well, as we have set the balancer to only allow for 1% misplaced objects, to limit this too. If that's true, could I just set pgp_num to 4096 directly and Ceph would limit the data movement by itself?
> >
> > We are running a fully updated Nautilus cluster.
> >
> > Met vriendelijke groet,
> > Kind Regards,
> >
> > Maarten van Ingen
> >
> > Specialist | SURF | maarten.vaningen@xxxxxxx <mailto:voornaam.achternaam@xxxxxxx> | T +31 30 88 787 3000 | M +31 6 19 03 90 19 |
> > SURF <http://www.surf.nl/> is the collaborative organisation for ICT in Dutch education and research
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx