Hi all,
we have a problem on our production cluster running Nautilus (14.2.22).
The cluster is almost full, and a few months ago we noticed slow peering: when we restart any OSD (or host), the peering process
takes hours to finish instead of minutes.
We noticed that one pool contains about 90k objects in 300 GB per PG, so we decided to increase pg_num on that pool so that each
individual PG peers more quickly. While that change was in progress, PGs got stuck inactive for hours with peering not finishing,
and some OSDs went down with this error:
https://tracker.ceph.com/issues/51168
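
For context, the reasoning behind the pg_num increase is simple arithmetic: splitting PGs should shrink the average amount of data
each PG has to peer. Below is a rough Python sketch of that estimate. It assumes objects and data are spread evenly across PGs, and
the function name and the split factors are hypothetical illustrations, not the values we actually applied.

```python
def per_pg_after_split(objects_per_pg, gb_per_pg, split_factor):
    """Estimate average objects and GB per PG after multiplying pg_num by split_factor.

    Assumes an even distribution of objects across PGs (an idealization).
    """
    if split_factor < 1:
        raise ValueError("split_factor must be >= 1")
    return objects_per_pg / split_factor, gb_per_pg / split_factor


if __name__ == "__main__":
    # Starting point from the post: about 90k objects and 300 GB per PG.
    # The factors below are hypothetical examples only.
    for factor in (1, 2, 4, 8):
        objects, gb = per_pg_after_split(90_000, 300, factor)
        print(f"pg_num x{factor}: ~{objects:.0f} objects, ~{gb:.1f} GB per PG")
```

This is only an average; the split itself moves data and triggers peering, which is the state we got stuck in.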
We decided to restart all OSDs and wait, but the problem with slow peering persists.
Is there any way to get the cluster healthy again? Or can we disable peering for one pool, so that the other pools with RBD images
peer and come online first, and then try to peer the big pool afterwards?
Thank you for your help; this is an urgent situation.
With regards
Jan Pekar
--
============
Ing. Jan Pekař
jan.pekar@xxxxxxxxx
----
Imatic | Jagellonská 14 | Praha 3 | 130 00
https://www.imatic.cz | +420326555326
============