Thanks for sending this in, Chris.

On Fri, Nov 1, 2024 at 9:47 AM Chris Palmer <chris.palmer@xxxxxxxxx> wrote:
> Hi Laura
>
> Logged as https://tracker.ceph.com/issues/68801. It is pretty much as I
> thought, except that the requests are not actually being lost while the
> balancer is running; they are delayed until it has finished (by which time
> the client has timed out). If there is anything more I can do, just let me
> know.
>
> Regards, Chris
>
> On 31/10/2024 15:12, Laura Flores wrote:
> > Hi Chris,
> >
> > As other users have pointed out, we are fixing an issue tracked in
> > https://tracker.ceph.com/issues/68657 that seems related to what you're
> > experiencing. However, can you raise a new tracker describing your problem
> > so we can confirm?
> >
> > Can you please include:
> > 1. Steps to reproduce (including any commands you are running to invoke
> >    the restful API)
> > 2. MGR logs with `ceph config set mgr.* debug_mgr 20` and
> >    `ceph config set mgr mgr/balancer/log_level debug`
> >
> > Thanks,
> > Laura
> >
> > On Wed, Oct 30, 2024 at 7:24 AM Chris Palmer <chris.palmer@xxxxxxxxx> wrote:
> > > I've just upgraded a test cluster from 18.2.4 to 19.2.0 (package
> > > install on CentOS 9 Stream). Very smooth upgrade. Only one problem so
> > > far...
> > >
> > > The MGR restful API calls work fine, EXCEPT whenever the balancer kicks
> > > in to look for new plans. During the few seconds that the balancer takes
> > > to run, all REST calls seem to be completely dropped. The MGR log file
> > > normally logs the POST requests, but the ones made during these few
> > > seconds don't appear at all. This causes our monitoring to keep raising
> > > alarms.
> > >
> > > The cluster is in a completely stable state: HEALTH_OK, very little
> > > activity, just the occasional scrub.
> > >
> > > We use the restful API for monitoring (via the Ceph plugin for Zabbix
> > > Agent 2, as Zabbix is the overarching monitoring platform in the data
> > > centre). I haven't yet checked the memory leak problems that we (like
> > > many) were having, because I have been chasing this new problem.
> > >
> > > The problem is quite repeatable. To diagnose it, I use the zabbix_get
> > > utility to query once per second. Whenever the MGR log file shows the
> > > balancer kicking in, the REST requests time out (after 3 seconds; I'm
> > > not sure whether the utility or the MGR is timing them out, but I
> > > suspect the utility). Normally they complete in a small fraction of a
> > > second. With the balancer disabled, the REST interface works reliably
> > > again.
> > >
> > > The problem does not occur on releases before Squid.
> > >
> > > Does anyone have any ideas, or shall I raise a bug?
> > >
> > > Thanks, Chris
> > > _______________________________________________
> > > ceph-users mailing list -- ceph-users@xxxxxxx
> > > To unsubscribe send an email to ceph-users-leave@xxxxxxx

--
Laura Flores
She/Her/Hers
Software Engineer, Ceph Storage <https://ceph.io>
Chicago, IL
lflores@xxxxxxx | lflores@xxxxxxxxxx
M: +17087388804
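Chris's diagnostic (querying the MGR restful API about once per second with zabbix_get and treating anything slower than 3 seconds as a failure) can be approximated without Zabbix. The sketch below is a minimal probe under stated assumptions, not taken from the thread: it assumes the restful module's default port 8003, HTTP basic auth with a key created by `ceph restful create-key <user>`, and a POST of a `status` command to `/request?wait=1`. The host name, user and key are hypothetical placeholders. It records the start time and latency of each request so that slow or failed ones can be lined up against balancer activity in the MGR log.

```python
import base64
import json
import ssl
import time
import urllib.request
from typing import Callable, List, Tuple

# ASSUMPTIONS (not taken from the thread): the restful module listens on its
# default port 8003 and accepts HTTP basic auth with a key created by
# `ceph restful create-key <user>`. Host, user and key are hypothetical.
MGR_URL = "https://mgr-host.example:8003/request?wait=1"
API_USER = "monitor"
API_KEY = "changeme"

Result = Tuple[float, float, bool]  # (wall-clock start, seconds taken, ok)


def post_status(timeout: float) -> None:
    """POST a 'status' command to the restful module; raises on error or timeout."""
    body = json.dumps({"prefix": "status", "format": "json"}).encode()
    token = base64.b64encode(f"{API_USER}:{API_KEY}".encode()).decode()
    req = urllib.request.Request(
        MGR_URL,
        data=body,
        headers={"Content-Type": "application/json",
                 "Authorization": f"Basic {token}"},
        method="POST",
    )
    # The restful module usually serves a self-signed certificate, so this
    # test-cluster sketch disables certificate verification.
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    with urllib.request.urlopen(req, timeout=timeout, context=ctx) as resp:
        resp.read()


def probe(request: Callable[[float], None], count: int,
          interval: float = 1.0, timeout: float = 3.0) -> List[Result]:
    """Send `count` requests roughly `interval` seconds apart.

    A request counts as failed if it raises (including a socket timeout)
    or takes longer than `timeout` seconds in total.
    """
    results: List[Result] = []
    for _ in range(count):
        start = time.time()
        t0 = time.monotonic()
        try:
            request(timeout)
            ok = True
        except Exception:
            ok = False
        elapsed = time.monotonic() - t0
        results.append((start, elapsed, ok and elapsed <= timeout))
        # Keep a roughly fixed cadence, like querying once per second.
        time.sleep(max(0.0, interval - elapsed))
    return results


def report(results: List[Result]) -> List[str]:
    """Format results as 'HH:MM:SS OK|SLOW seconds' lines for comparison with the MGR log."""
    lines = []
    for start, elapsed, ok in results:
        stamp = time.strftime("%H:%M:%S", time.localtime(start))
        lines.append(f"{stamp} {'OK  ' if ok else 'SLOW'} {elapsed:.3f}s")
    return lines
```

On a test cluster, `print("\n".join(report(probe(post_status, count=120))))` polls for about two minutes; the SLOW lines can then be compared with the balancer entries in the MGR log. Since Chris's follow-up reports that requests are delayed rather than lost, raising `timeout` well above the balancer's run time would be expected to turn those failures into slow but successful requests.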