Thanks for your advice, Wes. Below is what ceph osd df tree shows. Will increasing pg_num on this production cluster affect performance or cause a crash, and how long can it take to finish?

ceph osd df tree
ID  CLASS  WEIGHT     REWEIGHT  SIZE     RAW USE  DATA      OMAP     META     AVAIL    %USE   VAR   PGS  STATUS  TYPE NAME
-1         433.11841         -  433 TiB  151 TiB    67 TiB  364 MiB  210 GiB  282 TiB  34.86  1.00    -          root default
-3         144.37280         -  144 TiB   50 TiB    22 TiB  121 MiB   70 GiB   94 TiB  34.86  1.00    -          host ceph-osd1
 2  hdd      9.02330   1.00000  9.0 TiB  2.7 TiB  1021 GiB  5.4 MiB  3.7 GiB  6.3 TiB  30.40  0.87   19      up  osd.2
 3  hdd      9.02330   1.00000  9.0 TiB  2.7 TiB   931 GiB  4.1 MiB  3.5 GiB  6.4 TiB  29.43  0.84   29      up  osd.3
 6  hdd      9.02330   1.00000  9.0 TiB  3.3 TiB   1.5 TiB  8.1 MiB  4.5 GiB  5.8 TiB  36.09  1.04   20      up  osd.6
 9  hdd      9.02330   1.00000  9.0 TiB  2.8 TiB   1.0 TiB  6.6 MiB  3.8 GiB  6.2 TiB  30.97  0.89   23      up  osd.9
12  hdd      9.02330   1.00000  9.0 TiB  4.0 TiB   2.3 TiB   13 MiB  6.1 GiB  5.0 TiB  44.68  1.28   30      up  osd.12
15  hdd      9.02330   1.00000  9.0 TiB  3.5 TiB   1.8 TiB  9.2 MiB  5.2 GiB  5.5 TiB  38.99  1.12   30      up  osd.15
18  hdd      9.02330   1.00000  9.0 TiB  3.0 TiB   1.2 TiB  6.5 MiB  4.0 GiB  6.1 TiB  32.80  0.94   21      up  osd.18
22  hdd      9.02330   1.00000  9.0 TiB  3.6 TiB   1.9 TiB   10 MiB  5.4 GiB  5.4 TiB  40.25  1.15   22      up  osd.22
25  hdd      9.02330   1.00000  9.0 TiB  3.9 TiB   2.1 TiB   12 MiB  5.7 GiB  5.1 TiB  42.94  1.23   22      up  osd.25
28  hdd      9.02330   1.00000  9.0 TiB  3.1 TiB   1.4 TiB  7.5 MiB  4.1 GiB  5.9 TiB  34.87  1.00   21      up  osd.28
32  hdd      9.02330   1.00000  9.0 TiB  2.7 TiB  1017 GiB  4.8 MiB  3.7 GiB  6.3 TiB  30.36  0.87   27      up  osd.32
35  hdd      9.02330   1.00000  9.0 TiB  3.0 TiB   1.3 TiB  7.2 MiB  4.2 GiB  6.0 TiB  33.73  0.97   21      up  osd.35
38  hdd      9.02330   1.00000  9.0 TiB  3.1 TiB   1.4 TiB  7.3 MiB  4.1 GiB  5.9 TiB  34.57  0.99   24      up  osd.38
41  hdd      9.02330   1.00000  9.0 TiB  2.9 TiB   1.2 TiB  6.2 MiB  4.0 GiB  6.1 TiB  32.49  0.93   24      up  osd.41
44  hdd      9.02330   1.00000  9.0 TiB  3.1 TiB   1.4 TiB  7.3 MiB  4.4 GiB  5.9 TiB  34.87  1.00   29      up  osd.44
47  hdd      9.02330   1.00000  9.0 TiB  2.7 TiB  1016 GiB  5.4 MiB  3.6 GiB  6.3 TiB  30.35  0.87   23      up  osd.47
-7         144.37280         -  144 TiB   50 TiB    22 TiB  122 MiB   70 GiB   94 TiB  34.86  1.00    -          host ceph-osd2
 1  hdd      9.02330   1.00000  9.0 TiB  2.8 TiB   1.1 TiB  5.7 MiB  3.8 GiB  6.2 TiB  31.00  0.89   27      up  osd.1
 5  hdd      9.02330   1.00000  9.0 TiB  3.2 TiB   1.5 TiB  7.3 MiB  4.5 GiB  5.8 TiB  35.45  1.02   27      up  osd.5
 8  hdd      9.02330   1.00000  9.0 TiB  3.3 TiB   1.6 TiB  8.3 MiB  4.7 GiB  5.7 TiB  36.85  1.06   30      up  osd.8
10  hdd      9.02330   1.00000  9.0 TiB  3.1 TiB   1.4 TiB  7.5 MiB  4.5 GiB  5.9 TiB  34.87  1.00   20      up  osd.10
13  hdd      9.02330   1.00000  9.0 TiB  3.6 TiB   1.8 TiB   10 MiB  5.3 GiB  5.4 TiB  39.63  1.14   27      up  osd.13
16  hdd      9.02330   1.00000  9.0 TiB  2.8 TiB   1.1 TiB  6.0 MiB  3.8 GiB  6.2 TiB  31.01  0.89   19      up  osd.16
19  hdd      9.02330   1.00000  9.0 TiB  3.0 TiB   1.2 TiB  6.4 MiB  4.0 GiB  6.1 TiB  32.77  0.94   21      up  osd.19
21  hdd      9.02330   1.00000  9.0 TiB  2.8 TiB   1.1 TiB  5.5 MiB  3.7 GiB  6.2 TiB  31.58  0.91   26      up  osd.21
24  hdd      9.02330   1.00000  9.0 TiB  2.6 TiB   855 GiB  4.7 MiB  3.3 GiB  6.4 TiB  28.61  0.82   19      up  osd.24
27  hdd      9.02330   1.00000  9.0 TiB  3.7 TiB   1.9 TiB   10 MiB  5.2 GiB  5.3 TiB  40.84  1.17   24      up  osd.27
30  hdd      9.02330   1.00000  9.0 TiB  3.2 TiB   1.4 TiB  7.5 MiB  4.5 GiB  5.9 TiB  35.16  1.01   22      up  osd.30
33  hdd      9.02330   1.00000  9.0 TiB  3.1 TiB   1.4 TiB  8.6 MiB  4.3 GiB  5.9 TiB  34.59  0.99   23      up  osd.33
36  hdd      9.02330   1.00000  9.0 TiB  3.4 TiB   1.7 TiB   10 MiB  5.0 GiB  5.6 TiB  38.17  1.09   25      up  osd.36
39  hdd      9.02330   1.00000  9.0 TiB  3.4 TiB   1.7 TiB  8.5 MiB  5.1 GiB  5.6 TiB  37.79  1.08   31      up  osd.39
42  hdd      9.02330   1.00000  9.0 TiB  3.6 TiB   1.8 TiB   10 MiB  5.2 GiB  5.4 TiB  39.68  1.14   23      up  osd.42
45  hdd      9.02330   1.00000  9.0 TiB  2.7 TiB   964 GiB  5.1 MiB  3.5 GiB  6.3 TiB  29.78  0.85   21      up  osd.45
-5         144.37280         -  144 TiB   50 TiB    22 TiB  121 MiB   70 GiB   94 TiB  34.86  1.00    -          host ceph-osd3
 0  hdd      9.02330   1.00000  9.0 TiB  2.7 TiB   934 GiB  4.9 MiB  3.4 GiB  6.4 TiB  29.47  0.85   21      up  osd.0
 4  hdd      9.02330   1.00000  9.0 TiB  3.0 TiB   1.2 TiB  6.5 MiB  4.1 GiB  6.1 TiB  32.73  0.94   22      up  osd.4
 7  hdd      9.02330   1.00000  9.0 TiB  3.5 TiB   1.8 TiB  9.2 MiB  5.1 GiB  5.5 TiB  39.02  1.12   30      up  osd.7
11  hdd      9.02330   1.00000  9.0 TiB  3.6 TiB   1.9 TiB   10 MiB  5.1 GiB  5.4 TiB  39.97  1.15   27      up  osd.11
14  hdd      9.02330   1.00000  9.0 TiB  3.5 TiB   1.7 TiB   10 MiB  5.1 GiB  5.6 TiB  38.24  1.10   27      up  osd.14
17  hdd      9.02330   1.00000  9.0 TiB  3.0 TiB   1.2 TiB  6.4 MiB  4.1 GiB  6.0 TiB  33.09  0.95   23      up  osd.17
20  hdd      9.02330   1.00000  9.0 TiB  2.8 TiB   1.1 TiB  5.6 MiB  3.8 GiB  6.2 TiB  31.55  0.90   20      up  osd.20
23  hdd      9.02330   1.00000  9.0 TiB  2.6 TiB   828 GiB  4.0 MiB  3.3 GiB  6.5 TiB  28.32  0.81   23      up  osd.23
26  hdd      9.02330   1.00000  9.0 TiB  2.9 TiB   1.2 TiB  5.8 MiB  3.8 GiB  6.1 TiB  32.12  0.92   26      up  osd.26
29  hdd      9.02330   1.00000  9.0 TiB  3.6 TiB   1.8 TiB   10 MiB  5.1 GiB  5.4 TiB  39.73  1.14   24      up  osd.29
31  hdd      9.02330   1.00000  9.0 TiB  2.8 TiB   1.1 TiB  5.8 MiB  3.7 GiB  6.2 TiB  31.56  0.91   22      up  osd.31
34  hdd      9.02330   1.00000  9.0 TiB  3.3 TiB   1.5 TiB  8.2 MiB  4.6 GiB  5.7 TiB  36.29  1.04   23      up  osd.34
37  hdd      9.02330   1.00000  9.0 TiB  3.2 TiB   1.5 TiB  8.2 MiB  4.5 GiB  5.8 TiB  35.51  1.02   20      up  osd.37
40  hdd      9.02330   1.00000  9.0 TiB  3.4 TiB   1.7 TiB  9.3 MiB  4.9 GiB  5.6 TiB  38.16  1.09   25      up  osd.40
43  hdd      9.02330   1.00000  9.0 TiB  3.4 TiB   1.6 TiB  8.5 MiB  4.8 GiB  5.7 TiB  37.19  1.07   29      up  osd.43
46  hdd      9.02330   1.00000  9.0 TiB  3.1 TiB   1.4 TiB  8.4 MiB  4.4 GiB  5.9 TiB  34.85  1.00   23      up  osd.46
                         TOTAL  433 TiB  151 TiB    67 TiB  364 MiB  210 GiB  282 TiB  34.86
MIN/MAX VAR: 0.81/1.28  STDDEV: 3.95

Michel
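For what it's worth, instead of eyeballing the PGS column across 48 rows, a small shell sketch like the one below summarizes it. It assumes the column layout shown above (PGS is the third field from the right on OSD rows) and that all OSDs are class hdd, as they are in this cluster; adjust the filter if your output differs.

  ceph osd df tree | awk '
      # OSD rows only: device class in column 2, "osd.N" in the last column
      $2 == "hdd" && $NF ~ /^osd\./ {
          pgs = $(NF-2)                  # PGS is the 3rd field from the right
          sum += pgs; n++
          if (min == "" || pgs < min) min = pgs
          if (pgs > max) max = pgs
      }
      END { printf "osds=%d  pgs: min=%d max=%d avg=%.1f\n", n, min, max, sum/n }'

On the output above this comes out to roughly 19-31 PGs per OSD with an average around 24, well below the 100-200 per OSD that Wes mentions below.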
On Tue, Jan 30, 2024 at 4:18 PM Wesley Dillingham <wes@xxxxxxxxxxxxxxxxx> wrote:

> I now concur you should increase the pg_num as a first step for this
> cluster. Disable the pg autoscaler for the volumes pool and increase it to
> pg_num 256. Then likely re-assess and make the next power-of-2 jump to 512,
> and probably beyond.
>
> Keep in mind this is not going to fix your short-term deep-scrub issue; in
> fact it will increase the number of "not scrubbed in time" PGs until the
> pg_num change is complete. This is because OSDs don't scrub while they are
> backfilling.
>
> I would sit on 256 for a couple of weeks, let scrubs happen, and then
> continue past 256, with the ultimate target of around 100-200 PGs per OSD,
> which "ceph osd df tree" will show you in the PGS column.
>
> Respectfully,
>
> *Wes Dillingham*
> wes@xxxxxxxxxxxxxxxxx
> LinkedIn <http://www.linkedin.com/in/wesleydillingham>
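For reference, the first step Wes describes maps to roughly the commands below. This is a sketch only; the pool name comes from the ceph df output further down the thread, so double-check it against "ceph osd pool ls" on your own cluster before running anything in production.

  # keep the autoscaler from fighting the manual change on this pool
  ceph osd pool set volumes pg_autoscale_mode off

  # request the split; since Nautilus, pg_num and pgp_num are raised
  # gradually in the background rather than all at once
  ceph osd pool set volumes pg_num 256

  # watch pg_num creep up to the target and the backfill drain off
  ceph osd pool get volumes pg_num
  ceph status

Because the increase is applied gradually (the amount of misplaced data in flight is throttled), the change can take a while to complete on a cluster this size, which is the "how long" part of the question above.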
On Tue, Jan 30, 2024 at 3:16 AM Michel Niyoyita <micou12@xxxxxxxxx> wrote:

>> Dear team,
>>
>> Below is the output of the ceph df command and the ceph version I am running:
>>
>> ceph df
>> --- RAW STORAGE ---
>> CLASS  SIZE     AVAIL    USED     RAW USED  %RAW USED
>> hdd    433 TiB  282 TiB  151 TiB  151 TiB       34.82
>> TOTAL  433 TiB  282 TiB  151 TiB  151 TiB       34.82
>>
>> --- POOLS ---
>> POOL                   ID  PGS  STORED   OBJECTS  USED     %USED  MAX AVAIL
>> device_health_metrics   1    1  1.1 MiB        3  3.2 MiB      0     73 TiB
>> .rgw.root               2   32  3.7 KiB        8   96 KiB      0     73 TiB
>> default.rgw.log         3   32  3.6 KiB      209  408 KiB      0     73 TiB
>> default.rgw.control     4   32      0 B        8      0 B      0     73 TiB
>> default.rgw.meta        5   32    382 B        2   24 KiB      0     73 TiB
>> volumes                 6  128   21 TiB    5.68M   62 TiB  22.09     73 TiB
>> images                  7   32  878 GiB  112.50k  2.6 TiB   1.17     73 TiB
>> backups                 8   32      0 B        0      0 B      0     73 TiB
>> vms                     9   32  881 GiB  174.30k  2.5 TiB   1.13     73 TiB
>> testbench              10   32      0 B        0      0 B      0     73 TiB
>>
>> root@ceph-mon1:~# ceph --version
>> ceph version 16.2.11 (3cf40e2dca667f68c6ce3ff5cd94f01e711af894) pacific (stable)
>> root@ceph-mon1:~#
>>
>> Please advise accordingly.
>>
>> Michel
>>
>> On Mon, Jan 29, 2024 at 9:48 PM Frank Schilder <frans@xxxxxx> wrote:
>>
>> > You will have to look at the output of "ceph df" and make a decision to
>> > balance "objects per PG" and "GB per PG". Increase the PG count most for
>> > the pools with the worst of these two numbers, such that it balances out
>> > as much as possible. If you have pools that see significantly more user-IO
>> > than others, prioritise these.
>> >
>> > You will have to find out for your specific cluster; we can only give
>> > general guidelines. Make changes, run benchmarks, re-evaluate. Take the
>> > time for it. The better you know your cluster and your users, the better
>> > the end result will be.
>> >
>> > Best regards,
>> > =================
>> > Frank Schilder
>> > AIT Risø Campus
>> > Bygning 109, rum S14
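As a rough worked example of Frank's two ratios, using the ceph df figures quoted above (numbers rounded):

  volumes: 5.68M objects / 128 PGs ≈ 44,000 objects per PG
           21 TiB stored / 128 PGs ≈ 170 GiB per PG
  vms:     174.3k objects / 32 PGs ≈ 5,400 objects per PG
           881 GiB stored / 32 PGs ≈ 28 GiB per PG
  images:  112.5k objects / 32 PGs ≈ 3,500 objects per PG
           878 GiB stored / 32 PGs ≈ 27 GiB per PG

By both measures volumes is roughly an order of magnitude heavier per PG than any other pool, which is why it is the one to grow first.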
>> > ________________________________________
>> > From: Michel Niyoyita <micou12@xxxxxxxxx>
>> > Sent: Monday, January 29, 2024 2:04 PM
>> > To: Janne Johansson
>> > Cc: Frank Schilder; E Taka; ceph-users
>> > Subject: Re: Re: 6 pgs not deep-scrubbed in time
>> >
>> > This is how it is set; if you suggest making some changes, please advise.
>> >
>> > Thank you.
>> >
>> > ceph osd pool ls detail
>> > pool 1 'device_health_metrics' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 1 pgp_num 1 autoscale_mode on last_change 1407 flags hashpspool stripe_width 0 pg_num_max 32 pg_num_min 1 application mgr_devicehealth
>> > pool 2 '.rgw.root' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 1393 flags hashpspool stripe_width 0 application rgw
>> > pool 3 'default.rgw.log' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 1394 flags hashpspool stripe_width 0 application rgw
>> > pool 4 'default.rgw.control' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 1395 flags hashpspool stripe_width 0 application rgw
>> > pool 5 'default.rgw.meta' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 1396 flags hashpspool stripe_width 0 pg_autoscale_bias 4 application rgw
>> > pool 6 'volumes' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 128 pgp_num 128 autoscale_mode on last_change 108802 lfor 0/0/14812 flags hashpspool,selfmanaged_snaps stripe_width 0 application rbd
>> >         removed_snaps_queue [22d7~3,11561~2,11571~1,11573~1c,11594~6,1159b~f,115b0~1,115b3~1,115c3~1,115f3~1,115f5~e,11613~6,1161f~c,11637~1b,11660~1,11663~2,11673~1,116d1~c,116f5~10,11721~c]
>> > pool 7 'images' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 94609 flags hashpspool,selfmanaged_snaps stripe_width 0 application rbd
>> > pool 8 'backups' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 1399 flags hashpspool stripe_width 0 application rbd
>> > pool 9 'vms' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 108783 lfor 0/561/559 flags hashpspool,selfmanaged_snaps stripe_width 0 application rbd
>> >         removed_snaps_queue [3fa~1,3fc~3,400~1,402~1]
>> > pool 10 'testbench' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 20931 lfor 0/20931/20929 flags hashpspool stripe_width 0
>> >
>> > On Mon, Jan 29, 2024 at 2:09 PM Michel Niyoyita <micou12@xxxxxxxxx> wrote:
>> > Thank you Janne,
>> >
>> > No need of setting some flags like "ceph osd set nodeep-scrub"?
>> >
>> > Thank you
>> >
>> > On Mon, Jan 29, 2024 at 2:04 PM Janne Johansson <icepic.dz@xxxxxxxxx> wrote:
>> > On Mon, Jan 29, 2024 at 12:58 PM Michel Niyoyita <micou12@xxxxxxxxx> wrote:
>> > >
>> > > Thank you Frank,
>> > >
>> > > All disks are HDDs. Would like to know if I can increase the number of
>> > > PGs live in production without a negative impact on the cluster, and if
>> > > yes, which commands to use.
>> >
>> > Yes. "ceph osd pool set <poolname> pg_num <number larger than before>",
>> > where the number usually should be a power of two that leads to a
>> > number of PGs per OSD between 100-200.
>> >
>> > --
>> > May the most significant bit of your life be positive.
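Putting Janne's rule of thumb against this cluster's numbers, as a rough back-of-the-envelope check only:

  current: 385 PGs across all pools x 3 replicas / 48 OSDs ≈ 24 PGs per OSD
           (which matches the PGS column in the ceph osd df tree output above)
  target:  48 OSDs x ~150 PGs per OSD / 3 replicas ≈ 2400 PGs to distribute

Since volumes holds nearly all of the data, most of that budget would eventually go to it, i.e. power-of-two steps of 256, 512, 1024 and possibly beyond, which lines up with Wes's advice at the top of the thread.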
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx