Hi,
Laura posted [0][1] two days ago that she has likely found the root cause
of the balancer crashing the MGR. It sounds like what you're
describing could be related to that.
[0]
https://lists.ceph.io/hyperkitty/list/ceph-users@xxxxxxx/thread/STR2UCS2KDZQAXOLH3GPCCWN4GBR3CJG/
[1] https://tracker.ceph.com/issues/68657
Quoting Chris Palmer <chris.palmer@xxxxxxxxx>:
I've just upgraded a test cluster from 18.2.4 to 19.2.0 (package
install on CentOS 9 Stream). Very smooth upgrade. Only one problem so
far...
The MGR restful API calls work fine, except whenever the balancer
kicks in to find new plans. During the few seconds that the
balancer takes to run, all REST calls seem to be completely dropped.
The MGR log file normally logs the POST requests, but the ones made
during those few seconds don't appear at all. This causes our
monitoring to keep raising alarms.
The cluster is in a completely stable state, HEALTH_OK, very little
activity, just the occasional scrubs.
We use the restful API for monitoring (via the Ceph plugin for Zabbix
Agent 2, as Zabbix is the overarching monitoring platform in
the data centre). I haven't yet checked the memory leak problems
that we (like many) were having, because I have been chasing this
new problem.
The problem is quite repeatable. To diagnose it I use the zabbix_get
utility to query every second. Whenever the MGR log file shows the
balancer kicking in, the REST requests time out after 3 seconds (I'm
not sure whether the utility or the MGR is timing them out - I
suspect the utility). They normally complete in a small fraction of a
second. With the balancer disabled, the REST interface works reliably
again.
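For anyone wanting to reproduce this, the procedure above could be sketched roughly as below. This is an untested sketch: the item key "ceph.ping" and the host name "mgr-host" are placeholders (the actual key depends on your Zabbix template), and the -t timeout flag assumes a reasonably recent zabbix_get.

```shell
# Terminal 1: poll the Zabbix item once per second, timestamping each result.
# "mgr-host" and "ceph.ping" are placeholders - substitute your own values.
while true; do
    printf '%s ' "$(date '+%H:%M:%S')"
    zabbix_get -s mgr-host -k 'ceph.ping' -t 3 || echo "TIMED OUT"
    sleep 1
done

# Terminal 2: toggle the balancer and correlate with the timeouts above.
ceph balancer status
ceph balancer off    # REST calls should become reliable again
ceph balancer on     # timeouts should reappear when the balancer computes plans
```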
The problem does not occur pre-Squid.
Does anyone have any ideas, or shall I raise a bug?
Thanks, Chris
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx