Re: Does CEPH limit the pgp_num which it will increase in one go?

Hi again,

target_max_misplaced_ratio is a configuration of the mgr balancer.
What's happening here is you are simultaneously splitting and balancing :-)
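
To check the value currently in effect (assuming a release with the centralized config store, i.e. Mimic or later), something like this should print it:

    ceph config get mgr target_max_misplaced_ratio

IIRC the default is 0.05 (5%).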

Cheers, Dan

> On 02/15/2022 11:47 AM Maarten van Ingen <maarten.vaningen@xxxxxxx> wrote:
> 
>  
> Hi,
> 
> I did a small test to see what would happen if I set the amount of "allowed" misplaced objects, and this indeed changes the number of PGs it will remap simultaneously.
> 
> While this throttle is probably not the balancer itself, it at least shares this setting:
> 
> ceph config set mgr target_max_misplaced_ratio .015
> 
> This results in about 1.5% misplaced objects, where it was about 1% before:
>     pgs:     20299219/1324542018 objects misplaced (1.533%)
> 
> Which is good to know as you can easily limit the impact on the cluster.
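> 
> For reference, a simple way to keep an eye on this is to watch the misplaced line in the status output, e.g.:
> 
>     watch -n 30 'ceph -s | grep misplaced'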
> 
> I think we are good to go for now. Thanks for the help in understanding this part of Ceph a little better.
> 
> 
> Met vriendelijke groet,
> Kind Regards,
> Maarten van Ingen
>  
> Specialist |SURF |maarten.vaningen@xxxxxxx <mailto:voornaam.achternaam@xxxxxxx>| T +31 30 88 787 3000 |M +31 6 19 03 90 19| 
> SURF <http://www.surf.nl/> is the collaborative organisation for ICT in Dutch education and research
> 
> On 15-02-2022 09:56, Maarten van Ingen <maarten.vaningen@xxxxxxx> wrote:
> 
>     Hi,
> 
>     We have had pg_num set to 4096 for quite some time (months), but only now have we increased the pgp_num. So if I understand correctly, the splitting should have been done months ago already. Increasing the pgp_num should only make sure the newly created PGs are actually moved into place.
>     I read this here (for example) http://lists.ceph.com/pipermail/ceph-users-ceph.com/2015-May/001610.html
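> 
>     For reference, the stepwise bump itself is just the pool setting, e.g.:
> 
>         ceph osd pool set <pool> pgp_num 2248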
> 
>     We will keep your tip in mind, given the time this takes. I think 1% is normally about a day's work, and the 200 PGs on this pool are roughly 4%, so that would mean a bit under a week until it's done. We are keeping osd_max_backfills at 1 for now; setting it higher would of course mean it finishes much faster, but it would also mean a bigger impact on cluster performance.
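> 
>     If we ever wanted it to finish faster, raising that limit would be something like:
> 
>         ceph config set osd osd_max_backfills 2
> 
>     at the cost of a bigger impact on client I/O.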
> 
>     Met vriendelijke groet,
>     Kind Regards,
>     Maarten van Ingen
> 
>     Specialist |SURF |maarten.vaningen@xxxxxxx <mailto:voornaam.achternaam@xxxxxxx>| T +31 30 88 787 3000 |M +31 6 19 03 90 19| 
>     SURF <http://www.surf.nl/> is the collaborative organisation for ICT in Dutch education and research
> 
>     On 15-02-2022 09:30, Dan van der Ster <daniel.vanderster@xxxxxxx> wrote:
> 
>         Hi,
> 
>         You're confused: the `ceph balancer` is not related to pg splitting. The balancer is used to move PGs around to achieve a uniform distribution.
> 
>         What you're doing now by increasing pg_num and pgp_num is splitting: large PGs are split into smaller ones. This is achieved through backfilling.
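> 
>         If you want to see which PGs are backfilling at any given moment, something like this should list them:
> 
>             ceph pg ls backfilling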
> 
>         BTW, while a cluster is continuously backfilling, it will never trim osdmaps. If these accumulate for many days or weeks it can have a service impact on the mons (e.g. disk filling up).
>         For this reason I suggest letting it get to 2248, making sure the osdmaps have trimmed [1], and then increasing pgp_num again.
> 
>         (This kind of stepwise process is really only important for large clusters where splitting can take many days to finish).
> 
>         Cheers, Dan
> 
>         [1] To see the number of osdmaps, go to any host with osds, e.g. osd.123, and do `ceph daemon osd.123 status`. Then find the difference between newest_map and oldest_map, e.g.:
> 
>             "oldest_map": 3970333,
>             "newest_map": 3971041,
> 
>         It should be under 1000 or so. If it's much larger, then your osdmaps are not trimming.
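> 
>         Assuming jq is available on the host, you can compute the difference directly from the same output:
> 
>             ceph daemon osd.123 status | jq '.newest_map - .oldest_map'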
> 
> 
>         > On 02/15/2022 9:08 AM Maarten van Ingen <maarten.vaningen@xxxxxxx> wrote:
>         > 
>         >  
>         > Hi Dan,
>         > 
>         > Thanks for your (very) prompt response.
>         > 
>         > pg_num 4096 pgp_num 2108 pgp_num_target 2248
>         > 
>         > Also I see this:
>         > #ceph balancer eval
>         > current cluster score 0.068634 (lower is better)
>         > 
>         > #ceph balancer status
>         > {
>         >     "last_optimize_duration": "0:00:00.025029", 
>         >     "plans": [], 
>         >     "mode": "upmap", 
>         >     "active": true, 
>         >     "optimize_result": "Too many objects (0.010762 > 0.010000) are misplaced; try again later", 
>         >     "last_optimize_started": "Tue Feb 15 09:05:32 2022"
>         > }
>         > 
>         > It seems it is indeed limiting the data movement to the configured 1%.
>         > So it should be safe to assume I can set the number to 4096 and the amount of misplaced objects will stay around 1%.
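>         > 
>         > If so, the final step would then simply be:
>         > 
>         >     ceph osd pool set <pool> pgp_num 4096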
>         > 
>         > Met vriendelijke groet,
>         > Kind Regards,
>         > Maarten van Ingen
>         >  
>         > Specialist |SURF |maarten.vaningen@xxxxxxx <mailto:voornaam.achternaam@xxxxxxx>| T +31 30 88 787 3000 |M +31 6 19 03 90 19| 
>         > SURF <http://www.surf.nl/> is the collaborative organisation for ICT in Dutch education and research
>         > 
>         > On 15-02-2022 09:01, Dan van der Ster <daniel.vanderster@xxxxxxx> wrote:
>         > 
>         >     Hi Maarten,
>         > 
>         >     With `ceph osd pool ls detail` does it have pgp_num_target set to 2248?
>         >     If so, yes it's moving gradually to that number.
>         > 
>         >     Cheers, Dan
>         > 
>         >     > On 02/15/2022 8:55 AM Maarten van Ingen <maarten.vaningen@xxxxxxx> wrote:
>         >     > 
>         >     >  
>         >     > Hi,
>         >     > 
>         >     > After enabling the balancer (in upmap mode) on our environment, it's time to get the pgp_num of one of the pools on par with its pg_num.
>         >     > This pool has pg_num set to 4096 and pgp_num to 2048 (by our mistake).
>         >     > I just set the pgp_num to 2248 to keep data movement in check.
>         >     > 
>         >     > Oddly enough, I see it has only increased to 2108. It's also odd that we now get this health warning: "1 pools have pg_num > pgp_num", which we haven't seen before…
>         >     > 
>         >     > 
>         >     > # ceph -s
>         >     >   cluster:
>         >     >     id:     <id>
>         >     >     health: HEALTH_WARN
>         >     >             1 pools have pg_num > pgp_num
>         >     > 
>         >     >   services:
>         >     >     mon: 5 daemons, quorum mon01,mon02,mon03,mon05,mon04 (age 3d)
>         >     >     mgr: mon01(active, since 3w), standbys: mon05, mon04, mon03, mon02
>         >     >     mds: cephfs:1 {0=mon04=up:active} 4 up:standby
>         >     >     osd: 1278 osds: 1278 up (since 68m), 1278 in (since 22h); 74 remapped pgs
>         >     > 
>         >     >   data:
>         >     >     pools:   28 pools, 13824 pgs
>         >     >     objects: 441.41M objects, 1.5 PiB
>         >     >     usage:   4.5 PiB used, 6.9 PiB / 11 PiB avail
>         >     >     pgs:     15652608/1324221126 objects misplaced (1.182%)
>         >     >              13693 active+clean
>         >     >              74    active+remapped+backfilling
>         >     >              56    active+clean+scrubbing+deep
>         >     >              1     active+clean+scrubbing
>         >     > 
>         >     >   io:
>         >     >     client:   187 MiB/s rd, 2.2 GiB/s wr, 11.11k op/s rd, 5.63k op/s wr
>         >     >     recovery: 1.8 GiB/s, 533 objects/s
>         >     > 
>         >     > 
>         >     > ceph osd pool get <pool> pgp_num
>         >     > pgp_num: 2108
>         >     > 
>         >     > Is this default behaviour of Ceph?
>         >     > I get the feeling the balancer might have something to do with this as well, since we have set it to only allow 1% misplaced objects. If that's true, could I just set pgp_num to 4096 directly and let Ceph limit the data movement by itself?
>         >     > 
>         >     > We are running a fully updated Nautilus cluster.
>         >     > 
>         >     > Met vriendelijke groet,
>         >     > Kind Regards,
>         >     > Maarten van Ingen
>         >     > 
>         >     > Specialist |SURF |maarten.vaningen@xxxxxxx<mailto:voornaam.achternaam@xxxxxxx>| T +31 30 88 787 3000 |M +31 6 19 03 90 19|
>         >     > SURF<http://www.surf.nl/> is the collaborative organisation for ICT in Dutch education and research
>         >     > _______________________________________________
>         >     > ceph-users mailing list -- ceph-users@xxxxxxx
>         >     > To unsubscribe send an email to ceph-users-leave@xxxxxxx
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



