Re: Manager carries wrong information until killing it

Nico Schottelius <nico.schottelius@xxxxxxxxxxx> · Wed, 12 May 2021 23:13:03 +0200

Reed Dier <reed.dier@xxxxxxxxxxx> writes:

> I don't have a solution to offer, but I've seen this for years with no solution.
> Any time a MGR bounces, be it for upgrades, a new daemon coming online, etc., I'll see a scale spike like the one reported below.

Interesting to read that we are not the only ones.

> Just out of curiosity, which MGR plugins are you using?

[22:11:05] black2.place6:~# ceph mgr module ls
{
    "always_on_modules": [
        "balancer",
        "crash",
        "devicehealth",
        "orchestrator_cli",
        "progress",
        "rbd_support",
        "status",
        "volumes"
    ],
    "enabled_modules": [
        "iostat",
        "pg_autoscaler",
        "prometheus",
        "restful"
    ],

> I have historically used the influx plugin for stats exports, and it shows up in those values as well, throwing everything off.

So the problem is unlikely to be related to the prometheus plugin; it
looks more like a statistics error somewhere else.

> I don't see it in my Zabbix stats, albeit those are scraped at a
> longer interval that may not catch this.

For prometheus, we scrape every 10 or 15 seconds. But I wonder whether a
longer interval really just flattens the spike out, or whether the logic
behind those values is actually different.
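
To illustrate the interval question, here is a minimal sketch with
made-up numbers of how a short spike in a gauge-style value shows up at
a 10-second scrape interval but can be missed entirely at a longer one.
The series, the spike position, and the 60-second interval are all
assumptions for illustration, not measurements from our cluster or the
actual Zabbix setting.

```python
def sample(series, interval, offset=0):
    """Return the values a scraper would see when polling every `interval` seconds."""
    return [series[t] for t in range(offset, len(series), interval)]

# Hypothetical gauge: flat at 100 for 5 minutes (one value per second),
# with a 20-second spike to 10000 starting at t=130, e.g. right after
# a MGR restart.
series = [100] * 300
for t in range(130, 150):
    series[t] = 10000

fast = sample(series, 10)   # Prometheus-like interval
slow = sample(series, 60)   # assumed longer interval for comparison

print("max at 10s interval:", max(fast))
print("max at 60s interval:", max(slow))
```

With these numbers the 10-second scraper catches the spike and the
60-second one does not, which would match the "flattening out" theory.
It says nothing about whether the underlying value is computed
differently.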

Out of curiosity from my side: the manager is a binary, but the plugins
are actually Python modules. I had a quick look at
/usr/share/ceph/mgr/prometheus/module.py, which seems to get the data
from a monitor, so I wonder if the problem lies more in the
architecture of Ceph than in the actual data export.

Cheers,

Nico

--
Sustainable and modern Infrastructures by ungleich.ch