My best guess is that it will take a couple of weeks to a couple of months to
complete on 10TB spinners at ~40% full. The cluster should be usable throughout
the process. Keep in mind that you should disable the pg autoscaler on any pool
whose pg_num you are manually adjusting. Increasing the pg_num is called "pg
splitting"; you can search around for this to see how it works. There are a few
knobs to increase or decrease the aggressiveness of the pg split, primarily
osd_max_backfills and target_max_misplaced_ratio. You can monitor the progress
of the split with "ceph osd pool ls detail" for the pool you are splitting: its
pgp_num will slowly increase until it reaches the pg_num / pg_num_target. IMO
this blog post best covers what you are looking to undertake:
https://ceph.io/en/news/blog/2019/new-in-nautilus-pg-merging-and-autotuning/

Respectfully,

*Wes Dillingham*
wes@xxxxxxxxxxxxxxxxx
LinkedIn <http://www.linkedin.com/in/wesleydillingham>


On Tue, Jan 30, 2024 at 9:38 AM Michel Niyoyita <micou12@xxxxxxxxx> wrote:

> Thanks for your advice Wes. Below is what ceph osd df tree shows. Will the
> increase of pg_num on the production cluster affect the performance or
> crush? How long can it take to finish?
>
> ceph osd df tree
> ID CLASS WEIGHT REWEIGHT SIZE RAW USE DATA OMAP META AVAIL %USE VAR PGS STATUS TYPE NAME
> -1 433.11841 - 433 TiB 151 TiB 67 TiB 364 MiB 210 GiB 282 TiB 34.86 1.00 - root default
> -3 144.37280 - 144 TiB 50 TiB 22 TiB 121 MiB 70 GiB 94 TiB 34.86 1.00 - host ceph-osd1
> 2 hdd 9.02330 1.00000 9.0 TiB 2.7 TiB 1021 GiB 5.4 MiB 3.7 GiB 6.3 TiB 30.40 0.87 19 up osd.2
> 3 hdd 9.02330 1.00000 9.0 TiB 2.7 TiB 931 GiB 4.1 MiB 3.5 GiB 6.4 TiB 29.43 0.84 29 up osd.3
> 6 hdd 9.02330 1.00000 9.0 TiB 3.3 TiB 1.5 TiB 8.1 MiB 4.5 GiB 5.8 TiB 36.09 1.04 20 up osd.6
> 9 hdd 9.02330 1.00000 9.0 TiB 2.8 TiB 1.0 TiB 6.6 MiB 3.8 GiB 6.2 TiB 30.97 0.89 23 up osd.9
> 12 hdd 9.02330 1.00000 9.0 TiB 4.0 TiB 2.3 TiB 13 MiB 6.1 GiB 5.0 TiB 44.68 1.28 30 up osd.12
> 15 hdd 9.02330 1.00000 9.0 TiB 3.5 TiB 1.8 TiB 9.2 MiB 5.2 GiB 5.5 TiB 38.99 1.12 30 up osd.15
> 18 hdd 9.02330 1.00000 9.0 TiB 3.0 TiB 1.2 TiB 6.5 MiB 4.0 GiB 6.1 TiB 32.80 0.94 21 up osd.18
> 22 hdd 9.02330 1.00000 9.0 TiB 3.6 TiB 1.9 TiB 10 MiB 5.4 GiB 5.4 TiB 40.25 1.15 22 up osd.22
> 25 hdd 9.02330 1.00000 9.0 TiB 3.9 TiB 2.1 TiB 12 MiB 5.7 GiB 5.1 TiB 42.94 1.23 22 up osd.25
> 28 hdd 9.02330 1.00000 9.0 TiB 3.1 TiB 1.4 TiB 7.5 MiB 4.1 GiB 5.9 TiB 34.87 1.00 21 up osd.28
> 32 hdd 9.02330 1.00000 9.0 TiB 2.7 TiB 1017 GiB 4.8 MiB 3.7 GiB 6.3 TiB 30.36 0.87 27 up osd.32
> 35 hdd 9.02330 1.00000 9.0 TiB 3.0 TiB 1.3 TiB 7.2 MiB 4.2 GiB 6.0 TiB 33.73 0.97 21 up osd.35
> 38 hdd 9.02330 1.00000 9.0 TiB 3.1 TiB 1.4 TiB 7.3 MiB 4.1 GiB 5.9 TiB 34.57 0.99 24 up osd.38
> 41 hdd 9.02330 1.00000 9.0 TiB 2.9 TiB 1.2 TiB 6.2 MiB 4.0 GiB 6.1 TiB 32.49 0.93 24 up osd.41
> 44 hdd 9.02330 1.00000 9.0 TiB 3.1 TiB 1.4 TiB 7.3 MiB 4.4 GiB 5.9 TiB 34.87 1.00 29 up osd.44
> 47 hdd 9.02330 1.00000 9.0 TiB 2.7 TiB 1016 GiB 5.4 MiB 3.6 GiB 6.3 TiB 30.35 0.87 23 up osd.47
> -7 144.37280 - 144 TiB 50 TiB 22 TiB 122 MiB 70 GiB 94 TiB 34.86 1.00 - host ceph-osd2
> 1 hdd 9.02330 1.00000 9.0 TiB 2.8 TiB 1.1 TiB 5.7 MiB 3.8 GiB 6.2 TiB 31.00 0.89 27 up osd.1
> 5 hdd 9.02330 1.00000 9.0 TiB 3.2 TiB 1.5 TiB 7.3 MiB 4.5 GiB 5.8 TiB 35.45 1.02 27 up osd.5
> 8 hdd 9.02330 1.00000 9.0 TiB 3.3 TiB 1.6 TiB 8.3 MiB 4.7 GiB 5.7 TiB 36.85 1.06 30 up osd.8
> 10 hdd 9.02330 1.00000 9.0 TiB 3.1 TiB 1.4 TiB 7.5 MiB 4.5 GiB 5.9 TiB 34.87 1.00 20 up osd.10
> 13 hdd 9.02330 1.00000 9.0 TiB 3.6 TiB 1.8 TiB 10 MiB 5.3 GiB 5.4 TiB 39.63 1.14 27 up osd.13
> 16 hdd 9.02330 1.00000 9.0 TiB 2.8 TiB 1.1 TiB 6.0 MiB 3.8 GiB 6.2 TiB 31.01 0.89 19 up osd.16
> 19 hdd 9.02330 1.00000 9.0 TiB 3.0 TiB 1.2 TiB 6.4 MiB 4.0 GiB 6.1 TiB 32.77 0.94 21 up osd.19
> 21 hdd 9.02330 1.00000 9.0 TiB 2.8 TiB 1.1 TiB 5.5 MiB 3.7 GiB 6.2 TiB 31.58 0.91 26 up osd.21
> 24 hdd 9.02330 1.00000 9.0 TiB 2.6 TiB 855 GiB 4.7 MiB 3.3 GiB 6.4 TiB 28.61 0.82 19 up osd.24
> 27 hdd 9.02330 1.00000 9.0 TiB 3.7 TiB 1.9 TiB 10 MiB 5.2 GiB 5.3 TiB 40.84 1.17 24 up osd.27
> 30 hdd 9.02330 1.00000 9.0 TiB 3.2 TiB 1.4 TiB 7.5 MiB 4.5 GiB 5.9 TiB 35.16 1.01 22 up osd.30
> 33 hdd 9.02330 1.00000 9.0 TiB 3.1 TiB 1.4 TiB 8.6 MiB 4.3 GiB 5.9 TiB 34.59 0.99 23 up osd.33
> 36 hdd 9.02330 1.00000 9.0 TiB 3.4 TiB 1.7 TiB 10 MiB 5.0 GiB 5.6 TiB 38.17 1.09 25 up osd.36
> 39 hdd 9.02330 1.00000 9.0 TiB 3.4 TiB 1.7 TiB 8.5 MiB 5.1 GiB 5.6 TiB 37.79 1.08 31 up osd.39
> 42 hdd 9.02330 1.00000 9.0 TiB 3.6 TiB 1.8 TiB 10 MiB 5.2 GiB 5.4 TiB 39.68 1.14 23 up osd.42
> 45 hdd 9.02330 1.00000 9.0 TiB 2.7 TiB 964 GiB 5.1 MiB 3.5 GiB 6.3 TiB 29.78 0.85 21 up osd.45
> -5 144.37280 - 144 TiB 50 TiB 22 TiB 121 MiB 70 GiB 94 TiB 34.86 1.00 - host ceph-osd3
> 0 hdd 9.02330 1.00000 9.0 TiB 2.7 TiB 934 GiB 4.9 MiB 3.4 GiB 6.4 TiB 29.47 0.85 21 up osd.0
> 4 hdd 9.02330 1.00000 9.0 TiB 3.0 TiB 1.2 TiB 6.5 MiB 4.1 GiB 6.1 TiB 32.73 0.94 22 up osd.4
> 7 hdd 9.02330 1.00000 9.0 TiB 3.5 TiB 1.8 TiB 9.2 MiB 5.1 GiB 5.5 TiB 39.02 1.12 30 up osd.7
> 11 hdd 9.02330 1.00000 9.0 TiB 3.6 TiB 1.9 TiB 10 MiB 5.1 GiB 5.4 TiB 39.97 1.15 27 up osd.11
> 14 hdd 9.02330 1.00000 9.0 TiB 3.5 TiB 1.7 TiB 10 MiB 5.1 GiB 5.6 TiB 38.24 1.10 27 up osd.14
> 17 hdd 9.02330 1.00000 9.0 TiB 3.0 TiB 1.2 TiB 6.4 MiB 4.1 GiB 6.0 TiB 33.09 0.95 23 up osd.17
> 20 hdd 9.02330 1.00000 9.0 TiB 2.8 TiB 1.1 TiB 5.6 MiB 3.8 GiB 6.2 TiB 31.55 0.90 20 up osd.20
> 23 hdd 9.02330 1.00000 9.0 TiB 2.6 TiB 828 GiB 4.0 MiB 3.3 GiB 6.5 TiB 28.32 0.81 23 up osd.23
> 26 hdd 9.02330 1.00000 9.0 TiB 2.9 TiB 1.2 TiB 5.8 MiB 3.8 GiB 6.1 TiB 32.12 0.92 26 up osd.26
> 29 hdd 9.02330 1.00000 9.0 TiB 3.6 TiB 1.8 TiB 10 MiB 5.1 GiB 5.4 TiB 39.73 1.14 24 up osd.29
> 31 hdd 9.02330 1.00000 9.0 TiB 2.8 TiB 1.1 TiB 5.8 MiB 3.7 GiB 6.2 TiB 31.56 0.91 22 up osd.31
> 34 hdd 9.02330 1.00000 9.0 TiB 3.3 TiB 1.5 TiB 8.2 MiB 4.6 GiB 5.7 TiB 36.29 1.04 23 up osd.34
> 37 hdd 9.02330 1.00000 9.0 TiB 3.2 TiB 1.5 TiB 8.2 MiB 4.5 GiB 5.8 TiB 35.51 1.02 20 up osd.37
> 40 hdd 9.02330 1.00000 9.0 TiB 3.4 TiB 1.7 TiB 9.3 MiB 4.9 GiB 5.6 TiB 38.16 1.09 25 up osd.40
> 43 hdd 9.02330 1.00000 9.0 TiB 3.4 TiB 1.6 TiB 8.5 MiB 4.8 GiB 5.7 TiB 37.19 1.07 29 up osd.43
> 46 hdd 9.02330 1.00000 9.0 TiB 3.1 TiB 1.4 TiB 8.4 MiB 4.4 GiB 5.9 TiB 34.85 1.00 23 up osd.46
> TOTAL 433 TiB 151 TiB 67 TiB 364 MiB 210 GiB 282 TiB 34.86
> MIN/MAX VAR: 0.81/1.28 STDDEV: 3.95
>
>
> Michel
>
>
> On Tue, Jan 30, 2024 at 4:18 PM Wesley Dillingham <wes@xxxxxxxxxxxxxxxxx>
> wrote:
>
>> I now concur you should increase the pg_num as a first step for this
>> cluster. Disable the pg autoscaler for it and increase the volumes pool to
>> pg_num 256. Then likely re-assess and make the next power-of-2 jump to 512,
>> and probably beyond.
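>>
>> (A minimal sketch of those steps, assuming the pool is named 'volumes' as in
>> your "ceph df" output; the commands are standard Ceph CLI, but treat the
>> values as examples to tune for your cluster:)
>>
>>   ceph osd pool set volumes pg_autoscale_mode off   # keep the autoscaler from fighting the manual change
>>   ceph osd pool set volumes pg_num 256              # request the split; the mgr ramps pgp_num up for you
>>   ceph osd pool ls detail | grep "'volumes'"        # watch pgp_num climb toward pg_num
>>   ceph config set osd osd_max_backfills 1           # lower = gentler on client I/O, higher = faster split
>>   ceph config set mgr target_max_misplaced_ratio 0.05   # cap on the fraction of PGs misplaced at once
>>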
>> Keep in mind this is not going to fix your short-term deep-scrub issue; in
>> fact it will increase the number of not-scrubbed-in-time PGs until the
>> pg_num change is complete. This is because OSDs don't scrub while they are
>> backfilling.
>>
>> I would sit on 256 for a couple of weeks and let scrubs happen, then
>> continue past 256,
>>
>> with the ultimate target of around 100-200 PGs per OSD, which "ceph osd df
>> tree" will show you in the PGS column.
>>
>> Respectfully,
>>
>> *Wes Dillingham*
>> wes@xxxxxxxxxxxxxxxxx
>> LinkedIn <http://www.linkedin.com/in/wesleydillingham>
>>
>>
>> On Tue, Jan 30, 2024 at 3:16 AM Michel Niyoyita <micou12@xxxxxxxxx>
>> wrote:
>>
>>> Dear team,
>>>
>>> below is the output of the ceph df command and the ceph version I am running:
>>>
>>> ceph df
>>> --- RAW STORAGE ---
>>> CLASS SIZE AVAIL USED RAW USED %RAW USED
>>> hdd 433 TiB 282 TiB 151 TiB 151 TiB 34.82
>>> TOTAL 433 TiB 282 TiB 151 TiB 151 TiB 34.82
>>>
>>> --- POOLS ---
>>> POOL ID PGS STORED OBJECTS USED %USED MAX AVAIL
>>> device_health_metrics 1 1 1.1 MiB 3 3.2 MiB 0 73 TiB
>>> .rgw.root 2 32 3.7 KiB 8 96 KiB 0 73 TiB
>>> default.rgw.log 3 32 3.6 KiB 209 408 KiB 0 73 TiB
>>> default.rgw.control 4 32 0 B 8 0 B 0 73 TiB
>>> default.rgw.meta 5 32 382 B 2 24 KiB 0 73 TiB
>>> volumes 6 128 21 TiB 5.68M 62 TiB 22.09 73 TiB
>>> images 7 32 878 GiB 112.50k 2.6 TiB 1.17 73 TiB
>>> backups 8 32 0 B 0 0 B 0 73 TiB
>>> vms 9 32 881 GiB 174.30k 2.5 TiB 1.13 73 TiB
>>> testbench 10 32 0 B 0 0 B 0 73 TiB
>>> root@ceph-mon1:~# ceph --version
>>> ceph version 16.2.11 (3cf40e2dca667f68c6ce3ff5cd94f01e711af894) pacific (stable)
>>> root@ceph-mon1:~#
>>>
>>> please advise accordingly
>>>
>>> Michel
>>>
>>> On Mon, Jan 29, 2024 at 9:48 PM Frank Schilder <frans@xxxxxx> wrote:
>>>
>>> > You will have to look at the output of "ceph df" and make a decision to
>>> > balance "objects per PG" and "GB per PG". Increase the PG count most for
>>> > the pools with the worst of these two numbers, such that it balances out
>>> > as much as possible. If you have pools that see significantly more
>>> > user-IO than others, prioritise these.
>>> >
>>> > You will have to find out for your specific cluster; we can only give
>>> > general guidelines. Make changes, run benchmarks, re-evaluate. Take the
>>> > time for it. The better you know your cluster and your users, the better
>>> > the end result will be.
>>> >
>>> > Best regards,
>>> > =================
>>> > Frank Schilder
>>> > AIT Risø Campus
>>> > Bygning 109, rum S14
>>> >
>>> > ________________________________________
>>> > From: Michel Niyoyita <micou12@xxxxxxxxx>
>>> > Sent: Monday, January 29, 2024 2:04 PM
>>> > To: Janne Johansson
>>> > Cc: Frank Schilder; E Taka; ceph-users
>>> > Subject: Re: Re: 6 pgs not deep-scrubbed in time
>>> >
>>> > This is how it is set; if you suggest making some changes, please advise.
>>> >
>>> > Thank you.
>>> >
>>> > ceph osd pool ls detail
>>> > pool 1 'device_health_metrics' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 1 pgp_num 1 autoscale_mode on last_change 1407 flags hashpspool stripe_width 0 pg_num_max 32 pg_num_min 1 application mgr_devicehealth
>>> > pool 2 '.rgw.root' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 1393 flags hashpspool stripe_width 0 application rgw
>>> > pool 3 'default.rgw.log' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 1394 flags hashpspool stripe_width 0 application rgw
>>> > pool 4 'default.rgw.control' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 1395 flags hashpspool stripe_width 0 application rgw
>>> > pool 5 'default.rgw.meta' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 1396 flags hashpspool stripe_width 0 pg_autoscale_bias 4 application rgw
>>> > pool 6 'volumes' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 128 pgp_num 128 autoscale_mode on last_change 108802 lfor 0/0/14812 flags hashpspool,selfmanaged_snaps stripe_width 0 application rbd
>>> > removed_snaps_queue [22d7~3,11561~2,11571~1,11573~1c,11594~6,1159b~f,115b0~1,115b3~1,115c3~1,115f3~1,115f5~e,11613~6,1161f~c,11637~1b,11660~1,11663~2,11673~1,116d1~c,116f5~10,11721~c]
>>> > pool 7 'images' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 94609 flags hashpspool,selfmanaged_snaps stripe_width 0 application rbd
>>> > pool 8 'backups' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 1399 flags hashpspool stripe_width 0 application rbd
>>> > pool 9 'vms' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 108783 lfor 0/561/559 flags hashpspool,selfmanaged_snaps stripe_width 0 application rbd
>>> > removed_snaps_queue [3fa~1,3fc~3,400~1,402~1]
>>> > pool 10 'testbench' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 20931 lfor 0/20931/20929 flags hashpspool stripe_width 0
>>> >
>>> >
>>> > On Mon, Jan 29, 2024 at 2:09 PM Michel Niyoyita <micou12@xxxxxxxxx> wrote:
>>> >
>>> > Thank you Janne,
>>> >
>>> > is there no need to set some flags, like ceph osd set nodeep-scrub?
>>> >
>>> > Thank you
>>> >
>>> > On Mon, Jan 29, 2024 at 2:04 PM Janne Johansson <icepic.dz@xxxxxxxxx> wrote:
>>> >
>>> > On Mon, 29 Jan 2024 at 12:58, Michel Niyoyita <micou12@xxxxxxxxx> wrote:
>>> > >
>>> > > Thank you Frank,
>>> > >
>>> > > All disks are HDDs. I would like to know if I can increase the number
>>> > > of PGs live in production without a negative impact on the cluster,
>>> > > and if yes, which commands to use.
>>> >
>>> > Yes. "ceph osd pool set <poolname> pg_num <number larger than before>",
>>> > where the number usually should be a power of two that leads to a
>>> > number of PGs per OSD between 100-200.
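>>> >
>>> > (To make that concrete, a rough back-of-the-envelope example rather than
>>> > a hard rule: with 48 OSDs, a target of ~150 PGs per OSD and replica size
>>> > 3, the cluster can carry roughly 48 * 150 / 3 = 2400 PGs in total across
>>> > all pools, so a dominant pool like 'volumes' has room to grow in
>>> > power-of-two steps toward 1024 or 2048 while the small pools stay at 32.)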
>>> >
>>> > --
>>> > May the most significant bit of your life be positive.
>>> >
>>> _______________________________________________
>>> ceph-users mailing list -- ceph-users@xxxxxxx
>>> To unsubscribe send an email to ceph-users-leave@xxxxxxx
>>>
>>
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx