Re: RFI: Prometheus, Etc, Services - Optimum Number To Run

The "right" way to do this is to not run your metrics system on the cluster you want to monitor. Use the provided metrics via the exporter and ingest them using your own system (ours is Mimir/Loki/Grafana + related alerting), so if you have failures of nodes/etc you still have access to, at a minimum, your metrics/log data and alerting. Using the built-in services is a great stop-gap, but in my opinion, should not be relied on for production operation of Ceph clusters (or any software, for that matter.) Spin up some VMs if that's what you have available to you and manage your LGTM (or other choice) externally.

Cheers,
David
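
P.S. On your count question: with cephadm the instance count is just the service placement spec, so something along these lines bumps it (a sketch; check the docs for your Reef release to see which of the monitoring services support more than one running instance):

    ceph orch apply prometheus --placement="count:2"
    ceph orch apply alertmanager --placement="count:2"

Just remember that however many you run, they still live on the cluster they're monitoring.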

On Fri, Jan 19, 2024, at 23:42, duluxoz wrote:
> Hi All,
>
> In regards to the monitoring services on a Ceph Cluster (i.e. Prometheus,
> Grafana, Alertmanager, Loki, Node-Exporter, Promtail, etc.) how many
> instances should/can we run for fault tolerance purposes? I can't seem 
> to recall that advice being in the doco anywhere (but of course, I 
> probably missed it).
>
> I'm concerned about HA on those services - will they continue to run if 
> the Ceph Node they're on fails?
>
> At the moment we're running only 1 instance of each in the cluster, but 
> several Ceph Nodes are capable of running each - e.g. 3 nodes
> configured but only count:1.
>
> This is on the latest version of Reef using cephadm (if it makes a
> huge difference :-) ).
>
> So any advice would be greatly appreciated, including whether we should
> be running any services not mentioned (not Mgr, Mon, OSD, or iSCSI, 
> obviously :-) )
>
> Cheers
>
> Dulux-Oz
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


