> We use the restful API for monitoring (using the Ceph for Zabbix Agent 2
> plugin, as Zabbix is the over-arching monitoring platform in the data
> centre)

Chris, just FYI: the "restful" mgr module was deprecated 4 years ago [1] and will be removed in v20 (Tentacle). [2] Something similar will happen with the "zabbix" mgr module. [3]

[1] https://lists.ceph.io/hyperkitty/list/ceph-users@xxxxxxx/thread/LBKLNXH7UQL7TLFU5G52Y2SYVME4RS6P/
[2] https://github.com/ceph/ceph/pull/57299
[3] https://docs.ceph.com/en/squid/mgr/zabbix/#zabbix-module

Kind Regards,
Ernesto

On Tue, Nov 5, 2024 at 9:19 PM Laura Flores <lflores@xxxxxxxxxx> wrote:

> Thanks for sending this in, Chris.
>
> On Fri, Nov 1, 2024 at 9:47 AM Chris Palmer <chris.palmer@xxxxxxxxx> wrote:
>
> > Hi Laura
> >
> > Logged as https://tracker.ceph.com/issues/68801
> > It is pretty much as I thought, except that the requests are not actually being lost while the balancer is running; they are delayed until it has finished (by which time the client has timed out). If there is anything more I can do, just let me know.
> >
> > Regards, Chris
> >
> > On 31/10/2024 15:12, Laura Flores wrote:
> >
> > Hi Chris,
> >
> > As other users have pointed out, we are fixing an issue tracked in https://tracker.ceph.com/issues/68657 that seems related to what you're experiencing. However, can you raise a new tracker describing your problem so we can confirm?
> >
> > Can you please include:
> > 1. Steps to reproduce (including any commands you are performing to invoke the restful API)
> > 2. MGR logs with `ceph config set mgr.* debug_mgr 20` and `ceph config set mgr mgr/balancer/log_level debug`
> >
> > Thanks,
> > Laura
> >
> > On Wed, Oct 30, 2024 at 7:24 AM Chris Palmer <chris.palmer@xxxxxxxxx> wrote:
> >
> > > I've just upgraded a test cluster from 18.2.4 to 19.2.0 (package install on CentOS Stream 9). Very smooth upgrade, with only one problem so far...
> > >
> > > The MGR restful API calls work fine, EXCEPT whenever the balancer kicks in to find any new plans. During the few seconds that the balancer takes to run, all REST calls seem to be completely dropped. The MGR log file normally logs the POST requests, but the ones during these few seconds don't appear at all. This causes our monitoring to keep raising alarms.
> > >
> > > The cluster is in a completely stable state (HEALTH_OK), with very little activity, just the occasional scrubs.
> > >
> > > We use the restful API for monitoring (using the Ceph for Zabbix Agent 2 plugin, as Zabbix is the over-arching monitoring platform in the data centre). I haven't yet checked the memory leak problems that we (like many) were having, because I have been chasing this new problem.
> > >
> > > The problem is quite repeatable. To diagnose it, I use the zabbix_get utility to query once per second. Whenever the MGR log file shows the balancer kick in, the REST requests time out after 3 seconds (I'm not sure whether the utility or the MGR is timing them out; I suspect the utility). Normally they complete in a small fraction of a second. With the balancer disabled, the REST interface works reliably again.
> > >
> > > The problem does not occur before Squid.
> > >
> > > Does anyone have any ideas, or shall I raise a bug?
> > >
> > > Thanks, Chris
> > > _______________________________________________
> > > ceph-users mailing list -- ceph-users@xxxxxxx
> > > To unsubscribe send an email to ceph-users-leave@xxxxxxx
> >
> > --
> > Laura Flores
> > She/Her/Hers
> > Software Engineer, Ceph Storage <https://ceph.io>
> > Chicago, IL
> > lflores@xxxxxxx | lflores@xxxxxxxxxx <lflores@xxxxxxxxxx>
> > M: +17087388804
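
Chris's diagnosis (query the REST interface once per second and treat anything taking 3 seconds or more as a timeout) can also be reproduced without zabbix_get. Below is a minimal Python sketch using only the standard library. It assumes a hypothetical endpoint URL: the thread gives no restful module URL, port or authentication details, so substitute your own. Certificate verification failures (for example against a self-signed certificate) are also reported as TIMEOUT, so confirm that a single request succeeds before relying on the output.

```python
"""Poll an HTTP endpoint once per second and flag slow or failed requests."""
import time
import urllib.error
import urllib.request
from datetime import datetime
from typing import Optional

# Hypothetical placeholder: substitute the real endpoint (and add any
# authentication it needs). This is not a documented Ceph URL.
URL = "https://mgr.example.invalid:8003/"
TIMEOUT_S = 3.0   # mirrors the 3-second client timeout seen with zabbix_get
INTERVAL_S = 1.0  # one query per second, as in the diagnosis in the thread


def probe(url: str, timeout: float) -> Optional[float]:
    """Return the request latency in seconds, or None if no response arrived."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            resp.read()
    except urllib.error.HTTPError:
        # The server answered with an error status (e.g. 401), so it was
        # responsive; still count the latency.
        pass
    except OSError:
        # Timeouts, refused connections and TLS/certificate failures all
        # end up here (URLError and TimeoutError are OSError subclasses).
        return None
    return time.monotonic() - start


def classify(latency: Optional[float], timeout: float) -> str:
    """Label one probe result as a timeout or a successful response."""
    if latency is None or latency >= timeout:
        return "TIMEOUT"
    return f"ok {latency:.3f}s"


def main(url: str = URL, count: Optional[int] = None) -> None:
    """Probe `url` every INTERVAL_S seconds; run forever unless `count` is set."""
    n = 0
    while count is None or n < count:
        started = time.monotonic()
        latency = probe(url, TIMEOUT_S)
        stamp = datetime.now().isoformat(timespec="seconds")
        print(stamp, classify(latency, TIMEOUT_S), flush=True)
        n += 1
        # Sleep for whatever is left of the interval (nothing if we overran).
        time.sleep(max(0.0, INTERVAL_S - (time.monotonic() - started)))
```

Calling `main(your_url, count=60)` probes for about a minute; comparing the timestamps of TIMEOUT lines with balancer activity in the MGR log would show whether they coincide. HTTP error statuses are deliberately treated as responses, since even an authentication failure shows the server answered promptly.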