I've just upgraded a test cluster from 18.2.4 to 19.2.0 (package
install on CentOS Stream 9). Very smooth upgrade, with only one problem so far...
The MGR restful API calls work fine, except whenever the balancer kicks
in to look for new plans. During the few seconds that the balancer takes
to run, all REST calls seem to be dropped completely. The MGR log file
normally logs the POST requests, but the ones made during these few
seconds don't appear at all. This causes our monitoring to keep raising
alarms.
The cluster is in a completely stable state, HEALTH_OK, very little
activity, just the occasional scrubs.
We use the restful API for monitoring (using the Ceph for Zabbix Agent 2
plugin, as Zabbix is the over-arching monitoring platform in the data
centre). I haven't yet checked whether the memory leak problems that we
(like many) were having are still present, because I have been chasing
this new problem.
The problem is quite repeatable. To diagnose it, I use the zabbix_get
utility to query once per second. Whenever the MGR log file shows the
balancer kicking in, the REST requests time out after 3 seconds (I'm not
sure whether the utility or the MGR is timing them out; I suspect the
utility). They normally complete in a small fraction of a second.
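For anyone who wants to try reproducing this without Zabbix, the once-per-second check can be approximated with a rough Python sketch like the one below. It is not what zabbix_get or the plugin does internally: it issues a plain GET rather than the plugin's POST requests, and the helper names (make_fetch, probe), the 3-second timeout default, and the disabled certificate verification (assuming a self-signed MGR certificate) are all my own assumptions.

```python
import base64
import ssl
import time
import urllib.request


def make_fetch(url, user, key, timeout=3.0):
    """Build a callable that issues one authenticated GET and raises on failure."""
    # Assumption: the MGR serves a self-signed certificate, so verification is off.
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    token = base64.b64encode(f"{user}:{key}".encode()).decode()
    req = urllib.request.Request(url, headers={"Authorization": "Basic " + token})

    def fetch():
        with urllib.request.urlopen(req, timeout=timeout, context=ctx) as resp:
            resp.read()

    return fetch


def probe(fetch, count, interval=1.0, sleep=time.sleep, clock=time.monotonic):
    """Call fetch() `count` times, roughly once per `interval` seconds.

    Returns a list of (wall_clock_time, ok, elapsed_seconds) tuples.
    """
    results = []
    for _ in range(count):
        start = clock()
        try:
            fetch()
            ok = True
        except Exception:  # timeouts, refused connections, HTTP errors
            ok = False
        elapsed = clock() - start
        stamp = time.strftime("%H:%M:%S")
        results.append((stamp, ok, elapsed))
        print(f"{stamp} {'ok' if ok else 'FAILED'} {elapsed:.3f}s")
        # Sleep only for what is left of the interval after the request.
        sleep(max(0.0, interval - elapsed))
    return results
```

It could be run as something like probe(make_fetch("https://mgr-host.example:8003/server", "monitor", "API-KEY"), count=600), where the host, port, path, user and key are placeholders for whatever the restful module is configured with locally. The FAILED timestamps can then be compared against the balancer entries in the MGR log.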
With the balancer disabled, the REST interface works reliably again.
The problem does not occur on pre-Squid releases.
Does anyone have any ideas, or shall I raise a bug?
Thanks, Chris