Hi,
On 11/4/22 09:45, Adrian Nicolae wrote:
Hi,
We have a Pacific cluster (16.2.4) with 30 servers and 30 osds. We
started increasing the pg_num for the data bucket more than a month
ago. I usually added 64 pgs in every step and didn't have any issues.
The cluster was healthy before increasing the pgs.
Today I added 128 pgs and the cluster is stuck with some pgs unknown
and some others in the peering state. I've restarted a few osds with
slow_ops and even a few hosts, but it didn't change anything. We don't
have any networking issues. Do you have any suggestions? Our service
is completely down ...
*snipsnap*
Do some of the OSDs exceed the PGs-per-OSD limit? If this is the case,
the affected OSDs will not allow peering, and I/O to those OSDs will be
stuck.
You can check the number of PGs per OSD in the 'ceph osd df tree'
output. To solve this problem you can increase the limit, e.g. by
setting 'mon_max_pg_per_osd' via 'ceph config'. The default limit is
200 AFAIK.
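For reference, roughly the commands I mean (the target value of 400 is
just an example, pick something appropriate for your cluster, and
verify the current default on your release):

```shell
# Show per-OSD utilization; the PGS column on the right gives the
# number of PGs placed on each OSD.
ceph osd df tree

# Check the currently effective limit on the monitors.
ceph config get mon mon_max_pg_per_osd

# Raise the limit cluster-wide; 400 here is an arbitrary example value.
ceph config set global mon_max_pg_per_osd 400
```

Once the limit is above the actual per-OSD PG count, the stuck PGs
should be able to peer again.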
Regards,
Burkhard
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx