Hi Oebele,

If your main data pool currently has 128 PGs then yes, increasing the PG count should help a great deal. The suggested target is roughly 100 PGs per OSD, so 128 PGs total for the data pool across 32 OSDs is well below that. Raising the PG count also increases the number of places objects can be stored, which makes balancing across the OSDs much better. It's important not only that the data is well balanced, but that the OMAP data and the metadata are balanced too.

> - Is this change likely to solve the issue with the stuck PGs and over-utilized OSD?

Not instantly, but it should help the overall recovery process: more places for data to land means more concurrent recovery I/O, if your recovery settings allow it.

> - What should we expect w.r.t. load on the cluster?

That depends on your recovery and backfill settings; you can throttle them so clients see little noticeable I/O impact. (There's a rough command sketch below the quoted message.)

> - Do the 1024 PGs in xxx-pool have any influence given they are empty?

Not necessarily, though they are a waste of resources, since Ceph still has to track them, but it should be nothing terrible or cluster-breaking. Once the recovery is done, it would be best to remove that pool or lower its PG count.

>-----Original Message-----
>From: Oebele Drijfhout <oebele.drijfhout@xxxxxxxxx>
>Sent: September 3, 2022 3:33 PM
>To: ceph-users@xxxxxxx
>Subject: Re: low available space due to unbalanced cluster(?)
>
>I found something that I think could be interesting (please remember I'm new to Ceph :)
>
>There are 3 pools in the cluster:
>[xxx@ceph02 ~]$ sudo ceph --cluster xxx osd pool ls
>xxx-pool
>foo_data
>foo_metadata
>
>xxx-pool is empty, contains no data, but has the bulk of the PGs:
>[xxx@ceph02 ~]$ sudo ceph --cluster xxx osd pool get xxx-pool pg_num
>pg_num: 1024
>
>The other two pools, which contain the bulk of the data, have the default number of PGs:
>[xxx@ceph02 ~]$ sudo ceph --cluster xxx osd pool get foo_metadata pg_num
>pg_num: 128
>[xxx@ceph02 ~]$ sudo ceph --cluster xxx osd pool get foo_data pg_num
>pg_num: 128
>
>According to the manual
><https://docs.ceph.com/en/nautilus/rados/operations/placement-groups/>,
>with 10-50 OSDs, pg_num and pgp_num should be set to 1024 and it's best to increase in steps 128 -> 256 -> 512 -> 1024.
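Since you asked about load: here's a rough sketch of the commands involved, assuming a Nautilus-era cluster (as your docs link suggests). The pool names are taken from your output, and the throttle values and step sizes are only example figures, so please double-check everything against your own cluster before running any of it:

# Throttle recovery/backfill so client I/O isn't starved (example values, injected at runtime)
ceph --cluster xxx tell 'osd.*' injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1'

# Grow the data pool in steps (128 -> 256 -> 512 -> 1024), letting the cluster settle after each step
ceph --cluster xxx osd pool set foo_data pg_num 256
ceph --cluster xxx osd pool set foo_data pgp_num 256

# Watch progress and the per-OSD distribution while it rebalances
ceph --cluster xxx -s
ceph --cluster xxx osd df

# Once everything is healthy again, shrink the empty pool's PG count (Nautilus and later can merge PGs)...
ceph --cluster xxx osd pool set xxx-pool pg_num 32
# ...or delete the pool entirely if it really is unused (requires mon_allow_pool_delete to be enabled)
ceph --cluster xxx osd pool delete xxx-pool xxx-pool --yes-i-really-really-mean-it

If I remember right, Nautilus will ramp pgp_num up on its own once pg_num is raised, but setting it explicitly as above shouldn't hurt.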
Regards,
Bailey

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx