Re: Monitoring ceph cluster

"Anthony D'Atri" <anthony.datri@xxxxxxxxx> · Wed, 26 Jan 2022 17:05:23 -0800

What David said!

A couple of additional thoughts:

o Nagios and derivatives such as Icinga and check_mk have been popular for years.  Note that they are monitoring solutions rather than metrics solutions; it's good to have both.  One issue I've seen multiple times with Nagios-family monitoring is that as the checks and the fleet grow over time, the server tends to bog down, and the litany of active checks starts taking longer to run than the check interval.  Prometheus with Alertmanager is more scalable, and with some thought most active checks can be recast in terms of metrics.

o Prometheus (with a forked node_exporter) was INVALUABLE to me when characterizing and engaging two separate SSD firmware design flaw issues.  It includes a data query interface for ad-hoc queries and expression development.

o Grafana pairs well with Prometheus for dashboard-style visualization and trending across many clusters / nodes.
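
To illustrate what recasting an active check in terms of metrics can look like, here is a minimal sketch in Python. It assumes the metrics have already been scraped as Prometheus text-format lines; the SAMPLE text, the helper names, and the use of ceph_health_status / ceph_osd_up as metric names are assumptions for illustration, so verify them against what your exporter actually emits. The parser is deliberately simplistic (no timestamps, no escaped label values).

```python
# Hypothetical scrape output; in practice this would come from an exporter endpoint.
SAMPLE = """\
# HELP ceph_health_status Cluster health status
# TYPE ceph_health_status untyped
ceph_health_status 1.0
ceph_osd_up{ceph_daemon="osd.0"} 1.0
ceph_osd_up{ceph_daemon="osd.1"} 0.0
"""


def parse_metrics(text):
    """Parse 'name{labels} value' lines into (name, labels, value) tuples.

    Simplified: assumes no trailing timestamp on the sample lines.
    """
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        series, _, value = line.rpartition(" ")
        name, _, labels = series.partition("{")
        samples.append((name, labels.rstrip("}"), float(value)))
    return samples


def check_health(samples):
    """Return (exit_code, message) using Nagios exit conventions 0/1/2/3."""
    status = [v for n, _, v in samples if n == "ceph_health_status"]
    if not status:
        return 3, "UNKNOWN: ceph_health_status not found"
    osds_down = [l for n, l, v in samples if n == "ceph_osd_up" and v == 0]
    code = int(status[0])
    label = {0: "OK", 1: "WARNING", 2: "CRITICAL"}.get(code)
    if label is None:
        code, label = 3, "UNKNOWN"
    return code, f"{label}: health={code}, osds_down={len(osds_down)}"


if __name__ == "__main__":
    code, message = check_health(parse_metrics(SAMPLE))
    print(message)
```

The point of the design is that the check no longer probes the cluster itself; it only evaluates values that are already being collected, so adding checks does not add load on the monitored hosts.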

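For ad-hoc queries outside the web UI, Prometheus also exposes an HTTP API with an instant-query endpoint at /api/v1/query. Below is a rough Python sketch of calling it with only the standard library. The server address prometheus.example:9090 and the helper names are hypothetical, and the parser only handles instant-vector results.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_query_url(base_url, promql):
    """Build the instant-query URL for a PromQL expression."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def parse_instant_result(body):
    """Turn an instant-vector JSON response into (labels, value) pairs."""
    doc = json.loads(body)
    if doc.get("status") != "success":
        raise RuntimeError(doc.get("error", "query failed"))
    # Each sample looks like {"metric": {...}, "value": [timestamp, "string"]}.
    return [(s["metric"], float(s["value"][1])) for s in doc["data"]["result"]]


def instant_query(base_url, promql, timeout=10):
    """Run one ad-hoc query against a Prometheus server (needs network access)."""
    with urlopen(build_query_url(base_url, promql), timeout=timeout) as resp:
        return parse_instant_result(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Hypothetical server address; replace with your own Prometheus instance.
    for labels, value in instant_query("http://prometheus.example:9090", "up == 0"):
        print(labels.get("instance"), value)
```

The same expressions developed this way can later be reused in Grafana panels or alerting rules.
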
> On Jan 26, 2022, at 1:22 PM, David Orman <ormandj@xxxxxxxxxxxx> wrote:
> 
> What version of Ceph are you using? Newer versions deploy a dashboard and
> prometheus module, which has some of this built in. It's a great start to
> seeing what can be done using Prometheus and the built in exporter. Once
> you learn this, if you decide you want something more robust, you can do an
> external deployment of Prometheus (clusters), Alertmanager, Grafana, and
> all the other tooling that might interest you for a more scalable solution
> when dealing with more clusters. It's the perfect way to get your feet wet
> and it showcases a lot of the interesting things you can do with this
> solution!
> 
> https://docs.ceph.com/en/latest/mgr/dashboard/
> https://docs.ceph.com/en/latest/mgr/prometheus/
> 
> David
> 
> On Wed, Jan 26, 2022 at 1:42 AM Michel Niyoyita <micou12@xxxxxxxxx> wrote:
> 
>> Thank you for your email, Szabo. These could be helpful; can you provide
>> links so that I can start working on it?
>> 
>> Michel.
>> 
>> On Tue, 25 Jan 2022, 18:51 Szabo, Istvan (Agoda), <Istvan.Szabo@xxxxxxxxx>
>> wrote:
>> 
>>> Which monitoring tool? Like prometheus or nagios style thing?
>>> We use sensu for keepalive and ceph health reporting + prometheus with
>>> grafana for metrics collection.
>>> 
>>> Istvan Szabo
>>> Senior Infrastructure Engineer
>>> ---------------------------------------------------
>>> Agoda Services Co., Ltd.
>>> e: istvan.szabo@xxxxxxxxx
>>> ---------------------------------------------------
>>> 
>>> On 2022. Jan 25., at 22:38, Michel Niyoyita <micou12@xxxxxxxxx> wrote:
>>> 
>>> Hello team,
>>> 
>>> I would like to monitor my Ceph cluster using a monitoring tool. Can
>>> someone help with that?
>>> 
>>> Michel
>>> _______________________________________________
>>> ceph-users mailing list -- ceph-users@xxxxxxx
>>> To unsubscribe send an email to ceph-users-leave@xxxxxxx
>>> 
>>> 
>> 
