Re: Squid 19.2.0 balancer causes restful requests to be lost


 



Hi Ernesto

OK, well that puts a whole different spin on the problem. I'd not seen that (it slightly predates my involvement with Ceph), and I was just using the plugin that comes with Zabbix, so hadn't had any cause to look more deeply at the restful API. (Note that this plugin is completely different from the zabbix module supplied by Ceph, which I discounted as I knew it was deprecated.)

It sounds as though the right path would be for the zabbix-supplied ceph plugin to be reworked to use the newer ceph dashboard api, if that is possible. I might take a look at the api and plugin to see what would be involved.

Regards, Chris



On 06/11/2024 12:48, Ernesto Puerta wrote:
> We use the restful API for monitoring (using the Ceph for Zabbix Agent 2
> plugin, as Zabbix is the over-arching monitoring platform in the data
> centre)

Chris, just FYI: the "restful" mgr module was deprecated 4 years ago [1] and will be removed in v20 (Tentacle). [2] Something similar will happen with the "zabbix" mgr module. [3]

[1] https://lists.ceph.io/hyperkitty/list/ceph-users@xxxxxxx/thread/LBKLNXH7UQL7TLFU5G52Y2SYVME4RS6P/
[2] https://github.com/ceph/ceph/pull/57299
[3] https://docs.ceph.com/en/squid/mgr/zabbix/#zabbix-module

Kind Regards,
Ernesto


On Tue, Nov 5, 2024 at 9:19 PM Laura Flores <lflores@xxxxxxxxxx> wrote:

    Thanks for sending this in, Chris.

    On Fri, Nov 1, 2024 at 9:47 AM Chris Palmer
    <chris.palmer@xxxxxxxxx> wrote:

    > Hi Laura
    >
    > Logged as https://tracker.ceph.com/issues/68801. It is pretty much as I
    > thought, except that the requests are not actually being lost while the
    > balancer is running, but are delayed until it has finished (by which time
    > the client has timed out). Anything more I can do, just let me know.
    > Regards, Chris
    > On 31/10/2024 15:12, Laura Flores wrote:
    >
    > Hi Chris,
    >
    > As other users have pointed out, we are fixing an issue tracked in
    > https://tracker.ceph.com/issues/68657 that seems related to what you're
    > experiencing. However, can you raise a new tracker describing your problem
    > so we can confirm?
    >
    > Can you please include:
    > 1. Steps to reproduce (including any commands you are performing to invoke
    > the restful api)
    > 2. MGR logs with `ceph config set mgr.* debug_mgr 20` and
    > `ceph config set mgr mgr/balancer/log_level debug`
    >
    > Thanks,
    > Laura
    >
    > On Wed, Oct 30, 2024 at 7:24 AM Chris Palmer <chris.palmer@xxxxxxxxx> wrote:
    >
    >
    > I've just upgraded a test cluster from 18.2.4 to 19.2.0. Package
    > install on centos 9 stream. Very smooth upgrade. Only one problem so far...
    >
    > The MGR restful api calls work fine. EXCEPT whenever the balancer kicks
    > in to find any new plans. During the few seconds that the balancer takes
    > to run, all REST calls seem to be completely dropped. The MGR log file
    > normally logs the POST requests, but the ones during these few seconds
    > don't appear at all. This causes our monitoring to keep raising alarms.
    >
    > The cluster is in a completely stable state, HEALTH_OK, very little
    > activity, just the occasional scrubs.
    >
    > We use the restful API for monitoring (using the Ceph for Zabbix Agent 2
    > plugin, as Zabbix is the over-arching monitoring platform in the data
    > centre). I haven't yet checked the memory leak problems that we (like
    > many) were having, because I have been chasing this new problem.
    >
    > The problem is quite repeatable. To diagnose I use the zabbix_get
    > utility to query every second. Whenever the MGR log file shows the
    > balancer kick in the REST requests time out (after 3 seconds - not sure
    > whether the utility or the MGR is timing them out - I suspect the
    > utility). They normally complete after a small fraction of a second.
    > With the balancer disabled the REST interface works reliably again.
    >
    > The problem does not occur pre-squid.
    >
    > Anyone any ideas, or shall I raise a bug?
    >
    > Thanks, Chris
    > _______________________________________________
    > ceph-users mailing list -- ceph-users@xxxxxxx
    > To unsubscribe send an email to ceph-users-leave@xxxxxxx
    >
    >
    >
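The diagnostic loop described in the original report (one query per second, each giving up after about 3 seconds) could be approximated without zabbix_get along the following lines. This is only a sketch under stated assumptions: `probe`, `poll`, and `failure_windows` are hypothetical helper names, and `fetch` stands in for whatever call actually hits the restful endpoint. Nothing here is taken from the Zabbix plugin or the MGR code.

```python
import time
from typing import Callable, List, Tuple

Result = Tuple[bool, float]  # (request succeeded, elapsed seconds)


def probe(fetch: Callable[[float], object], timeout: float = 3.0) -> Result:
    """Run one request via the caller-supplied fetch(timeout) and time it."""
    start = time.monotonic()
    try:
        fetch(timeout)
        ok = True
    except Exception:
        # Any error (timeout, connection refused, ...) counts as a failed probe.
        ok = False
    return ok, time.monotonic() - start


def poll(fetch: Callable[[float], object], count: int,
         interval: float = 1.0, timeout: float = 3.0,
         sleep: Callable[[float], None] = time.sleep) -> List[Result]:
    """Probe `count` times, aiming for one probe per `interval` seconds."""
    results: List[Result] = []
    for _ in range(count):
        ok, elapsed = probe(fetch, timeout)
        results.append((ok, elapsed))
        # Sleep only for what is left of the interval after the probe itself.
        sleep(max(0.0, interval - elapsed))
    return results


def failure_windows(results: List[Result]) -> List[Tuple[int, int]]:
    """Group consecutive failed probes into (first_index, last_index) runs."""
    windows: List[Tuple[int, int]] = []
    start = None
    for i, (ok, _) in enumerate(results):
        if not ok and start is None:
            start = i
        elif ok and start is not None:
            windows.append((start, i - 1))
            start = None
    if start is not None:
        windows.append((start, len(results) - 1))
    return windows
```

In real use `fetch` might wrap `urllib.request.urlopen(url, timeout=timeout)` against the MGR's restful URL, with whatever TLS settings and credentials the cluster requires. The failure windows it reports could then be lined up against the balancer entries in the MGR log.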

--
    Laura Flores

    She/Her/Hers

    Software Engineer, Ceph Storage <https://ceph.io>

    Chicago, IL

    lflores@xxxxxxx | lflores@xxxxxxxxxx <lflores@xxxxxxxxxx>
    M: +17087388804



