On Mon, 10 Jul 2017, Ruben Kerkhof wrote:
> On Mon, Jul 10, 2017 at 7:44 PM, Sage Weil <sweil@xxxxxxxxxx> wrote:
> > On Mon, 10 Jul 2017, Gregory Farnum wrote:
> >> On Mon, Jul 10, 2017 at 12:57 AM Marc Roos <M.Roos@xxxxxxxxxxxxxxxxx> wrote:
> >>
> >> I need a little help with fixing some errors I am having.
> >>
> >> After upgrading from Kraken I'm getting incorrect values reported on
> >> placement groups etc. At first I thought it was because I had been
> >> changing the public cluster IP address range and modifying the monmap
> >> directly, but after deleting and re-adding a monitor this ceph daemon
> >> dump is still incorrect:
> >>
> >> ceph daemon mon.a perf dump cluster
> >> {
> >>     "cluster": {
> >>         "num_mon": 3,
> >>         "num_mon_quorum": 3,
> >>         "num_osd": 6,
> >>         "num_osd_up": 6,
> >>         "num_osd_in": 6,
> >>         "osd_epoch": 3842,
> >>         "osd_bytes": 0,
> >>         "osd_bytes_used": 0,
> >>         "osd_bytes_avail": 0,
> >>         "num_pool": 0,
> >>         "num_pg": 0,
> >>         "num_pg_active_clean": 0,
> >>         "num_pg_active": 0,
> >>         "num_pg_peering": 0,
> >>         "num_object": 0,
> >>         "num_object_degraded": 0,
> >>         "num_object_misplaced": 0,
> >>         "num_object_unfound": 0,
> >>         "num_bytes": 0,
> >>         "num_mds_up": 1,
> >>         "num_mds_in": 1,
> >>         "num_mds_failed": 0,
> >>         "mds_epoch": 816
> >>     }
> >> }
> >>
> >>
> >> Huh, I didn't know that existed.
> >>
> >> So, yep, most of those values aren't updated any more. From a grep, you
> >> can still trust the following (a filtering sketch follows below):
> >> num_mon
> >> num_mon_quorum
> >> num_osd
> >> num_osd_up
> >> num_osd_in
> >> osd_epoch
> >> num_mds_up
> >> num_mds_in
> >> num_mds_failed
> >> mds_epoch
> >>
> >> We might be able to keep updating the others when we get reports from the
> >> manager, but it'd be simpler to just rip them out — I don't think the admin
> >> socket is really the right place to get cluster summary data like this.
> >> Sage, any thoughts?
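
In the meantime, anyone scraping this dump can just filter it down to the
list above. A minimal sketch, assuming the usual CLI and the example mon
name from this thread:

    import json
    import subprocess

    # Fields Greg identified as still maintained in this dump.
    TRUSTED = {
        "num_mon", "num_mon_quorum", "num_osd", "num_osd_up", "num_osd_in",
        "osd_epoch", "num_mds_up", "num_mds_in", "num_mds_failed",
        "mds_epoch",
    }

    # Run the same command as above; 'mon.a' is just the example daemon.
    raw = subprocess.check_output(
        ["ceph", "daemon", "mon.a", "perf", "dump", "cluster"])
    cluster = json.loads(raw)["cluster"]

    # Keep only the values that are still being updated post-Kraken.
    trusted = {k: v for k, v in cluster.items() if k in TRUSTED}
    print(json.dumps(trusted, indent=4))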
> >
> > These were added to fill a gap when operators are collecting everything
> > via collectd or similar.
>
> Indeed, this has been reported as
> https://github.com/collectd/collectd/issues/2345
>
> > Getting the same cluster-level data from
> > multiple mons is redundant but it avoids having to code up a separate
> > collector that polls the CLI or something.
> >
> > I suspect once we're funneling everything through a mgr module this
> > problem will go away and we can remove this.
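
For reference, the asok protocol such a collector speaks is tiny: write a
NUL-terminated JSON command, then read back a 4-byte big-endian length
followed by that many bytes of JSON. A rough sketch, assuming the stock
socket path for mon.a:

    import json
    import socket
    import struct

    def asok_command(path, cmd):
        # Admin socket framing: NUL-terminated JSON command out,
        # 4-byte big-endian length plus JSON payload back.
        s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        s.connect(path)
        s.sendall(json.dumps(cmd).encode() + b"\0")
        length = struct.unpack(">I", s.recv(4))[0]
        buf = b""
        while len(buf) < length:
            buf += s.recv(length - len(buf))
        s.close()
        return json.loads(buf)

    # Default mon socket path on most installs; adjust for your cluster.
    stats = asok_command("/var/run/ceph/ceph-mon.a.asok",
                         {"prefix": "perf dump"})
    print(stats["cluster"]["num_mon"])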
>
> That would be great; having collectd running on each monitor has always
> felt a bit weird. If anyone wants to contribute patches to the collectd
> Ceph plugin to support the mgr, we would really appreciate that.
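
For anyone else hitting the linked issue, the plugin setup that trips over
this is the usual per-daemon block in collectd.conf, something like (socket
path is the stock default):

    LoadPlugin ceph
    <Plugin ceph>
      <Daemon "mon.a">
        SocketPath "/var/run/ceph/ceph-mon.a.asok"
      </Daemon>
    </Plugin>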
To be clear, what we're currently working on right here is a *prometheus*
module/plugin for mgr that will funnel the metrics for *all* ceph daemons
through a single endpoint to prometheus. I suspect we can easily
include the cluster-level stats there.
I'm not sure what the situation looks like with collectd, or whether there
is any interest in (or work toward) making mgr behave as a proxy for all
of the cluster and daemon stats.
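
To make that concrete, the module side is not much code. A bare-bones
sketch (not the actual prometheus module: the metric selection mirrors the
perf dump above, everything else is illustrative):

    from mgr_module import MgrModule

    class Module(MgrModule):
        def serve(self):
            # serve() is the module's long-running entry point; a real
            # exporter would loop here and publish over HTTP.
            osd_map = self.get('osd_map')   # cluster snapshots exposed
            mon_map = self.get('mon_map')   # by the mgr to its modules
            metrics = {
                'num_mon': len(mon_map['mons']),
                'num_osd': len(osd_map['osds']),
                'num_osd_up': sum(o['up'] for o in osd_map['osds']),
                'num_osd_in': sum(o['in'] for o in osd_map['osds']),
                'osd_epoch': osd_map['epoch'],
            }
            self.log.info('cluster metrics: %s', metrics)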
> > Until then, these are easy
> > to fix by populating from PGMapDigest... my vote is we do that!
>
> Yes please :)
I've added a ticket for luminous:
http://tracker.ceph.com/issues/20563
sage