I found something that I think could be interesting (please remember I'm
new to Ceph :)

There are 3 pools in the cluster:

[xxx@ceph02 ~]$ sudo ceph --cluster xxx osd pool ls
xxx-pool
foo_data
foo_metadata

xxx-pool is empty (it contains no data), but it has the bulk of the PGs:

[xxx@ceph02 ~]$ sudo ceph --cluster xxx osd pool get xxx-pool pg_num
pg_num: 1024

The other two pools, which contain the bulk of the data, have the default
number of PGs:

[xxx@ceph02 ~]$ sudo ceph --cluster xxx osd pool get foo_metadata pg_num
pg_num: 128
[xxx@ceph02 ~]$ sudo ceph --cluster xxx osd pool get foo_data pg_num
pg_num: 128

According to the manual
<https://docs.ceph.com/en/nautilus/rados/operations/placement-groups/>,
with 10-50 OSDs pg_num and pgp_num should be set to 1024, and it's best to
increase in steps: 128 -> 256 -> 512 -> 1024 (a sketch of those steps is at
the bottom of this mail).

- Is this change likely to solve the issue with the stuck PGs and the
  over-utilized OSD?
- What should we expect w.r.t. load on the cluster?
- Do the 1024 PGs in xxx-pool have any influence, given that the pool is
  empty?

On Sat, Sep 3, 2022 at 11:41 AM Oebele Drijfhout
<oebele.drijfhout@xxxxxxxxx> wrote:

> Hello Stefan,
>
> Thank you for your answer.
>
> On Fri, Sep 2, 2022 at 5:27 PM Stefan Kooman <stefan@xxxxxx> wrote:
>
>> On 9/2/22 15:55, Oebele Drijfhout wrote:
>> > Hello,
>> >
>> > I'm new to Ceph and I recently inherited a 4-node cluster with 32 OSDs
>> > and about 116 TB raw space, which shows low available space. I'm
>> > trying to increase that by enabling the balancer and lowering the
>> > reweight of the most-used OSDs. My questions are: is what I did
>> > correct given the current state of the cluster, can I do more to
>> > speed up rebalancing, and will we actually make more space available
>> > this way?
>>
>> Yes. When it's perfectly balanced the average OSD utilization should
>> approach %RAW USED.
>
> The variance in OSD utilization has been going down during the night, but
> I'm worried that we will soon hit 95% full on osd.11; its %USE is steadily
> going up. What can I do to 1. prevent more data being written to this OSD
> and 2. force data off this OSD?
> [truiten@xxxxxxxxxx ~]$ sudo ceph --cluster eun osd df
> ID CLASS WEIGHT  REWEIGHT SIZE    RAW USE DATA    OMAP    META    AVAIL   %USE  VAR  PGS STATUS
>  0   hdd 3.63869  1.00000 3.6 TiB 1.4 TiB 1.4 TiB 907 MiB 3.3 GiB 2.2 TiB 39.56 0.67 127 up
>  1   hdd 3.63869  1.00000 3.6 TiB 1.1 TiB 1.1 TiB 581 MiB 2.5 GiB 2.6 TiB 29.61 0.50 126 up
>  2   hdd 3.63869  1.00000 3.6 TiB 2.2 TiB 2.2 TiB 701 MiB 4.2 GiB 1.5 TiB 59.56 1.01 125 up
>  3   hdd 3.63869  1.00000 3.6 TiB 2.5 TiB 2.5 TiB 672 MiB 5.5 GiB 1.1 TiB 68.95 1.16 131 up
>  4   hdd 3.63869  1.00000 3.6 TiB 2.5 TiB 2.5 TiB 524 MiB 5.7 GiB 1.1 TiB 68.88 1.16 116 up
>  5   hdd 3.63869  0.76984 3.6 TiB 2.7 TiB 2.7 TiB 901 MiB 4.9 GiB 976 GiB 73.81 1.25 105 up
>  6   hdd 3.63869  0.76984 3.6 TiB 2.7 TiB 2.7 TiB 473 MiB 5.0 GiB 972 GiB 73.90 1.25  99 up
>  7   hdd 3.63869  1.00000 3.6 TiB 1.8 TiB 1.8 TiB 647 MiB 3.5 GiB 1.8 TiB 49.27 0.83 125 up
>  8   hdd 3.63869  1.00000 3.6 TiB 1.6 TiB 1.6 TiB 624 MiB 3.1 GiB 2.0 TiB 44.21 0.75 124 up
>  9   hdd 3.63869  1.00000 3.6 TiB 2.4 TiB 2.4 TiB 934 MiB 4.8 GiB 1.3 TiB 64.76 1.09 121 up
> 10   hdd 3.63869  1.00000 3.6 TiB 2.2 TiB 2.1 TiB 525 MiB 4.0 GiB 1.5 TiB 59.12 1.00 127 up
> 11   hdd 3.63869  0.76984 3.6 TiB 3.4 TiB 3.4 TiB 431 MiB 6.2 GiB 239 GiB 93.59 1.58  84 up  <---
> 12   hdd 3.63869  1.00000 3.6 TiB 2.1 TiB 2.1 TiB 777 MiB 4.2 GiB 1.5 TiB 59.02 1.00 124 up
> 13   hdd 3.63869  1.00000 3.6 TiB 1.4 TiB 1.4 TiB 738 MiB 3.2 GiB 2.2 TiB 39.46 0.67 125 up
> 14   hdd 3.63869  1.00000 3.6 TiB 2.2 TiB 2.1 TiB 560 MiB 6.2 GiB 1.5 TiB 59.11 1.00 122 up
> 15   hdd 3.63869  1.00000 3.6 TiB 2.1 TiB 2.1 TiB 575 MiB 4.4 GiB 1.5 TiB 59.06 1.00 123 up
> 16   hdd 3.63869  1.00000 3.6 TiB 1.8 TiB 1.8 TiB 625 MiB 3.4 GiB 1.8 TiB 49.24 0.83 124 up
> 17   hdd 3.63869  0.76984 3.6 TiB 2.7 TiB 2.7 TiB 696 MiB 5.1 GiB 958 GiB 74.28 1.25  93 up
> 18   hdd 3.63869  1.00000 3.6 TiB 2.0 TiB 2.0 TiB 210 MiB 3.6 GiB 1.6 TiB 54.94 0.93 125 up
> 19   hdd 3.63869  1.00000 3.6 TiB 2.1 TiB 2.1 TiB 504 MiB 5.2 GiB 1.5 TiB 59.08 1.00 124 up
> 20   hdd 3.63869  1.00000 3.6 TiB 2.5 TiB 2.5 TiB 796 MiB 5.0 GiB 1.1 TiB 69.14 1.17 121 up
> 21   hdd 3.63869  1.00000 3.6 TiB 1.6 TiB 1.6 TiB 679 MiB 3.6 GiB 2.0 TiB 44.25 0.75 123 up
> 22   hdd 3.63869  1.00000 3.6 TiB 2.5 TiB 2.5 TiB 682 MiB 4.9 GiB 1.1 TiB 68.86 1.16 125 up
> 23   hdd 3.63869  1.00000 3.6 TiB 1.6 TiB 1.6 TiB 575 MiB 3.1 GiB 2.0 TiB 44.83 0.76 124 up
> 24   hdd 3.63869  1.00000 3.6 TiB 2.1 TiB 2.1 TiB 517 MiB 3.9 GiB 1.5 TiB 59.00 1.00 125 up
> 25   hdd 3.63869  1.00000 3.6 TiB 2.2 TiB 2.1 TiB 836 MiB 4.5 GiB 1.5 TiB 59.12 1.00 121 up
> 26   hdd 3.63869  1.00000 3.6 TiB 2.5 TiB 2.5 TiB 520 MiB 5.0 GiB 1.1 TiB 69.31 1.17 109 up
> 27   hdd 3.63869  1.00000 3.6 TiB 2.0 TiB 2.0 TiB 861 MiB 3.8 GiB 1.7 TiB 54.13 0.91 126 up
> 28   hdd 3.63869  1.00000 3.6 TiB 1.8 TiB 1.8 TiB 256 MiB 4.3 GiB 1.8 TiB 49.21 0.83 122 up
> 29   hdd 3.63869  1.00000 3.6 TiB 2.7 TiB 2.7 TiB 998 MiB 5.1 GiB 980 GiB 73.69 1.24 126 up
> 30   hdd 3.63869  1.00000 3.6 TiB 2.2 TiB 2.1 TiB 1.1 GiB 4.1 GiB 1.5 TiB 59.16 1.00 123 up
> 31   hdd 3.63869  1.00000 3.6 TiB 2.3 TiB 2.3 TiB 689 MiB 4.4 GiB 1.3 TiB 64.02 1.08 123 up
>                 TOTAL     116 TiB  69 TiB  69 TiB  21 GiB 140 GiB  48 TiB 59.19
> MIN/MAX VAR: 0.50/1.58  STDDEV: 12.44
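
For reference, here is the stepped pg_num increase I have in mind. This is
only a sketch: it assumes we wait for HEALTH_OK (all PGs active+clean) after
each step before taking the next one, and it sets pgp_num explicitly
alongside pg_num, which I understand to be the manual's advice:

[xxx@ceph02 ~]$ sudo ceph --cluster xxx osd pool set foo_data pg_num 256
[xxx@ceph02 ~]$ sudo ceph --cluster xxx osd pool set foo_data pgp_num 256
# wait for HEALTH_OK, then repeat with 512, then 1024;
# then do the same for foo_metadata if it also needs more PGs

Since Nautilus also supports decreasing pg_num (PG merging), I assume the
empty xxx-pool could be shrunk with the same command if its 1024 PGs turn
out to be wasted overhead.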
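
And for the immediate osd.11 problem, this is what I'm considering in order
to push data off it. The 0.70 value is only an example, chosen to sit a bit
below the 0.76984 override already set on osds 5, 6 and 17; corrections
welcome:

[xxx@ceph02 ~]$ sudo ceph --cluster xxx osd reweight 11 0.70
# if the resulting backfill hurts client I/O, throttle it:
[xxx@ceph02 ~]$ sudo ceph --cluster xxx tell osd.* injectargs '--osd_max_backfills 1'

As I understand it, 'ceph osd reweight-by-utilization' would pick the
most-utilized OSDs and lower their reweight automatically, which may be the
safer alternative to choosing values by hand.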