Re: Problems with statistics after upgrade to luminous

On Mon, Jul 10, 2017 at 7:44 PM, Sage Weil <sweil@xxxxxxxxxx> wrote:
> On Mon, 10 Jul 2017, Gregory Farnum wrote:
>> On Mon, Jul 10, 2017 at 12:57 AM Marc Roos <M.Roos@xxxxxxxxxxxxxxxxx> wrote:
>>
>>       I need a little help with fixing some errors I am having.
>>
>>       After upgrading from Kraken I'm getting incorrect values reported
>>       on placement groups etc. At first I thought it was because I was
>>       changing the public cluster IP address range and modifying the
>>       monmap directly, but after deleting and re-adding a monitor this
>>       ceph daemon dump is still incorrect.
>>
>>
>>
>>
>>       ceph daemon mon.a perf dump cluster
>>       {
>>           "cluster": {
>>               "num_mon": 3,
>>               "num_mon_quorum": 3,
>>               "num_osd": 6,
>>               "num_osd_up": 6,
>>               "num_osd_in": 6,
>>               "osd_epoch": 3842,
>>               "osd_bytes": 0,
>>               "osd_bytes_used": 0,
>>               "osd_bytes_avail": 0,
>>               "num_pool": 0,
>>               "num_pg": 0,
>>               "num_pg_active_clean": 0,
>>               "num_pg_active": 0,
>>               "num_pg_peering": 0,
>>               "num_object": 0,
>>               "num_object_degraded": 0,
>>               "num_object_misplaced": 0,
>>               "num_object_unfound": 0,
>>               "num_bytes": 0,
>>               "num_mds_up": 1,
>>               "num_mds_in": 1,
>>               "num_mds_failed": 0,
>>               "mds_epoch": 816
>>           }
>>
>>       }
>>
>>
>> Huh, I didn't know that existed.
>>
>> So, yep, most of those values aren't updated any more. From a grep, you can
>> still trust:
>> num_mon
>> num_mon_quorum
>> num_osd
>> num_osd_up
>> num_osd_in
>> osd_epoch
>> num_mds_up
>> num_mds_in
>> num_mds_failed
>> mds_epoch
>>
>> We might be able to keep updating the others when we get reports from the
>> manager, but it'd be simpler to just rip them out — I don't think the admin
>> socket is really the right place to get cluster summary data like this.
>> Sage, any thoughts?
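
For anyone still scraping the admin socket in the meantime, something
along these lines would keep only the counters listed above and drop the
stale ones. This is just a rough, untested sketch; "mon.a" is only an
example daemon name and the whitelist is copied from the list above:

    #!/usr/bin/env python
    # Keep only the "perf dump cluster" counters that are still maintained
    # in Luminous (the list above); ignore the rest.
    import json
    import subprocess

    TRUSTED = [
        "num_mon", "num_mon_quorum",
        "num_osd", "num_osd_up", "num_osd_in", "osd_epoch",
        "num_mds_up", "num_mds_in", "num_mds_failed", "mds_epoch",
    ]

    def cluster_counters(mon="mon.a"):
        # "ceph daemon ... perf dump cluster" already emits JSON.
        raw = subprocess.check_output(
            ["ceph", "daemon", mon, "perf", "dump", "cluster"])
        cluster = json.loads(raw)["cluster"]
        return dict((k, cluster[k]) for k in TRUSTED if k in cluster)

    if __name__ == "__main__":
        print(json.dumps(cluster_counters(), indent=4))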
>
> These were added to fill a gap for operators who collect everything
> via collectd or similar.

Indeed, this has been reported as
https://github.com/collectd/collectd/issues/2345

> Getting the same cluster-level data from
> multiple mons is redundant but it avoids having to code up a separate
> collector that polls the CLI or something.
>
> I suspect once we're funneling everything through a mgr module this
> problem will go away and we can remove this.

That would be great; having collectd running on each monitor has always
felt a bit weird.
If anyone wants to contribute patches to the collectd Ceph plugin to
support the mgr, we would really appreciate that.
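
As a very rough starting point (not the current plugin, which is written
in C and reads the daemon admin sockets), a cluster-level collector could
look something like the Python plugin sketched below, polling "ceph status"
on a single host; in Luminous that output is populated from the mgr. The
plugin/metric names and the JSON fields under "osdmap"/"pgmap" are
assumptions and can differ between releases, so treat them as illustrative:

    # Sketch of a collectd Python plugin that polls cluster-wide stats once
    # instead of scraping every mon's admin socket. Field names under
    # "osdmap"/"pgmap" are assumptions and may vary between Ceph releases.
    import json
    import subprocess

    import collectd


    def read_ceph_status():
        # Read callback: fetch "ceph status" as JSON and dispatch a few gauges.
        try:
            raw = subprocess.check_output(["ceph", "status", "--format", "json"])
            status = json.loads(raw)
        except Exception as exc:
            collectd.error("ceph status failed: %s" % exc)
            return

        osdmap = status.get("osdmap", {}).get("osdmap", {})
        pgmap = status.get("pgmap", {})

        metrics = {
            "num_osd": osdmap.get("num_osds"),
            "num_osd_up": osdmap.get("num_up_osds"),
            "num_osd_in": osdmap.get("num_in_osds"),
            "num_pg": pgmap.get("num_pgs"),
            "num_bytes": pgmap.get("data_bytes"),
        }
        for name, value in metrics.items():
            if value is None:
                continue
            val = collectd.Values(plugin="ceph_cluster", type="gauge",
                                  type_instance=name, values=[value])
            val.dispatch()


    collectd.register_read(read_ceph_status)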

> Until then, these are easy
> to fix by populating from PGMapDigest... my vote is we do that!

Yes please :)
>
> sage

Kind regards,

Ruben Kerkhof
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



