Re: Monitoring ceph and prometheus

John Spray <jspray@xxxxxxxxxx> · Thu, 18 May 2017 10:03:25 +0100

On Thu, May 18, 2017 at 9:37 AM, Lars Marowsky-Bree <lmb@xxxxxxxx> wrote:
> On 2017-05-15T13:33:29, John Spray <jspray@xxxxxxxxxx> wrote:
>
>> At the risk of being a bit picky, it's only redundant if prometheus is
>> the only thing consuming them.  If the user is also using some mgr
>> modules (including things like handy CLI views) that consume the
>> stats, it's not redundant at all.  I'd like to keep these stats around
>> in the mgr because we're not quite sure yet what kinds of modules
>> we'll end up with.
>
> Fair enough. The point that they may wish to gather information at
> different frequencies still remains though - a ceph-mgr module may do it
> on-demand for certain tasks, event driven, or periodically, prometheus
> (or other trending) would want to poll certain counters at various
> frequencies, etc.

I'm slightly getting the impression that you might not have noticed
the existing functionality here -- the perf counters are sent
continuously from the daemons to the mgr, rather than being polled.
The mgr is in control of how often that is (via the MMgrConfigure
message).
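To make the push model concrete, here is a toy sketch in Python. It is an illustration under stated assumptions, not Ceph code. The names (Mgr, Daemon, ConfigureMsg, stats_period) are hypothetical stand-ins. The only idea taken from the text is that the mgr chooses the reporting period, the daemons push snapshots on that schedule, and mgr-side readers consume the cached copy without a round trip to the daemon.

```python
# Toy model of push-based perf counter reporting (hypothetical names,
# not Ceph's real classes or message fields).
from dataclasses import dataclass, field


@dataclass
class ConfigureMsg:
    # Hypothetical stand-in for the mgr's configure message:
    # seconds between counter reports.
    stats_period: int


@dataclass
class Mgr:
    stats_period: int = 5
    # daemon name -> (timestamp, counters) from the latest report received
    cache: dict = field(default_factory=dict)

    def configure(self) -> ConfigureMsg:
        return ConfigureMsg(stats_period=self.stats_period)

    def handle_report(self, daemon: str, now: int, counters: dict) -> None:
        # Store a copy so later changes on the daemon side do not leak in.
        self.cache[daemon] = (now, dict(counters))

    def get_counter(self, daemon: str, name: str):
        # A local lookup: the data was already pushed by the daemon.
        _, counters = self.cache[daemon]
        return counters[name]


@dataclass
class Daemon:
    name: str
    counters: dict = field(default_factory=dict)
    period: int = 0
    last_sent: int = -1

    def on_configure(self, msg: ConfigureMsg) -> None:
        self.period = msg.stats_period

    def tick(self, now: int, mgr: Mgr) -> bool:
        """Push a snapshot if a full period has elapsed; return True if sent."""
        if self.period <= 0:
            return False  # not configured yet, so nothing is sent
        if self.last_sent >= 0 and now - self.last_sent < self.period:
            return False
        mgr.handle_report(self.name, now, self.counters)
        self.last_sent = now
        return True


mgr = Mgr(stats_period=5)
osd = Daemon("osd.0", counters={"op_w": 0})
osd.on_configure(mgr.configure())

sent_at = []
for t in range(12):
    osd.counters["op_w"] += 10  # simulate write ops accumulating
    if osd.tick(t, mgr):
        sent_at.append(t)

print(sent_at)                           # [0, 5, 10]
print(mgr.get_counter("osd.0", "op_w"))  # 110
```

Changing stats_period on the mgr side and re-sending the configure message is the only knob needed to change the reporting rate of every daemon.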

>
> (e.g., maybe the OSD ones every 10s, SMART every 3h, whatever)

To be clear, when I talk about stats I'm talking about the perf
counters -- if SMART monitoring is added at some stage then I would
imagine sending that using a different mechanism.  As you say, sending
SMART counters at the same frequency as normal perf counters wouldn't
make sense.

>
> Aligning these would be annoying, and it seems to me that it makes more
> sense to allow them to poll independently from the same interfaces.
>
>> Sage's recent change to add the importance thresholds to perf counters
>> could be interesting here: we might end up sending everything that's
>> "reasonably important" and higher to the mgr for exposing in CLI tools
>> etc (I'm thinking of things like the OSD throughput, the MDS number of
>> each op per second, etc), while perhaps the really obscure stuff would
>> only get collected (into prometheus?) if someone actively chose that.
>
> That's actually somewhat related to how SMART classifies attributes: value,
> threshold, type (old-age, pre-fail; we could add a "perf" one).
>
> I take the point - there's also a need for an event-driven channel that
> is push-based by default. (From simple operation completion
> notification to "OMFG the disk caught fire.")

Again, that would be something separate from the existing perf counter
functionality.

> I could see those going to ceph-mgr for handling/relaying.

Yep.

>
>> > So, perhaps exposing this - the dynamic service/target discovery via
>> > ceph-mgr to Prometheus, and then having Prometheus pull directly - is a
>> > synthesis of both positions?
>> It would certainly be ++good to build in the service discovery so that
>> the user only needs to point prometheus at one place to discover
>> everything.  Anything that avoids the need for extra external tools to
>> set things up makes me happy.
>
> Yes, I think that'd be great to have. And at least in my head the idea
> of where information goes becomes clearer.
>
> Notifications/events go to and through ceph-mgr. ceph-mgr keeps track of
> Ceph services. Trending/metrics should IMNSHO be polled directly as
> needed.
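As a rough sketch of the discovery half of that split, the mgr could publish scrape targets in Prometheus' file-based service discovery format (a JSON list of target groups, each with "targets" and "labels"), and Prometheus would then scrape the daemons directly. The inventory, host names, ports and the label names below are all hypothetical; this assumes each daemon exposes its own metrics endpoint, which does not exist today.

```python
# Sketch: ceph-mgr emitting Prometheus file_sd target groups.
# Inventory, ports and label names are hypothetical.
import json
from collections import defaultdict


def build_file_sd(daemons, cluster="ceph"):
    """Group daemon endpoints by daemon type into file_sd target groups."""
    groups = defaultdict(list)
    for d in daemons:
        groups[d["type"]].append(f'{d["host"]}:{d["port"]}')
    return [
        {
            "targets": sorted(targets),
            "labels": {"cluster": cluster, "ceph_daemon_type": dtype},
        }
        for dtype, targets in sorted(groups.items())
    ]


# Hypothetical inventory, as the mgr might derive it from the cluster maps.
inventory = [
    {"type": "osd", "host": "node1", "port": 9101},
    {"type": "osd", "host": "node2", "port": 9101},
    {"type": "mds", "host": "node1", "port": 9102},
]

print(json.dumps(build_file_sd(inventory), indent=2))
```

Prometheus would read a file written this way through a file_sd_configs entry in its scrape configuration, so the user only points it at one place.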

I'm not opposed to having a polling interface there if you want to add
it -- it could be useful for anyone who chooses to turn off the
existing stats transmission.  However, we should be mindful that it
will complicate the lives of plugin authors if they are uncertain
about whether they're running on a polling-configured (reading stats
is a network op) or a streaming-configured system (reading stats super
fast).
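A small sketch of that concern: the same module-facing call can hide two very different costs. Everything here is hypothetical (CounterSource, StreamingSource, PollingSource and total_writes are illustrative names, and the "network" is just a callable). A module written against the interface cannot tell which mode it is in, yet in polling mode every read is a round trip.

```python
# One read API, two costs (all names hypothetical).
from abc import ABC, abstractmethod


class CounterSource(ABC):
    @abstractmethod
    def get_counter(self, daemon: str, name: str) -> int: ...


class StreamingSource(CounterSource):
    def __init__(self, pushed: dict):
        # daemon -> counters, kept up to date by daemon reports
        self.pushed = pushed

    def get_counter(self, daemon, name):
        return self.pushed[daemon][name]  # cheap local lookup


class PollingSource(CounterSource):
    def __init__(self, query_daemon):
        # callable standing in for a network request to the daemon
        self.query_daemon = query_daemon
        self.round_trips = 0

    def get_counter(self, daemon, name):
        self.round_trips += 1  # each read costs a request
        return self.query_daemon(daemon)[name]


def total_writes(source: CounterSource, daemons):
    # Plugin code: identical in both modes.
    return sum(source.get_counter(d, "op_w") for d in daemons)


data = {"osd.0": {"op_w": 3}, "osd.1": {"op_w": 4}}
stream = StreamingSource(data)
poll = PollingSource(lambda d: data[d])
print(total_writes(stream, data), total_writes(poll, data), poll.round_trips)
```

With thousands of OSDs, the polling variant turns one innocent-looking loop into thousands of requests, which is exactly the uncertainty a plugin author would have to design around.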

John

>
>
> Regards,
>     Lars
>
> --
> SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
> "Experience is the name everyone gives to their mistakes." -- Oscar Wilde
>
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html