Re: How to ... alertmanager and prometheus


 



Welcome Eugen,

There are some ongoing efforts to make the whole Prometheus stack
configuration more dynamic by using the http_sd_config mechanism [1]. In
fact, part of those changes is already in main, but they will not be
available until the next official Ceph release.

[1] https://prometheus.io/docs/prometheus/latest/configuration/configuration/#http_sd_config
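
To give an idea of what that looks like on the Prometheus side, here is a
minimal sketch of a scrape job using HTTP SD (the URL is only a placeholder;
the actual discovery endpoint exposed by the mgr/cephadm may differ once the
feature is released):

    scrape_configs:
      - job_name: 'ceph'
        honor_labels: true
        http_sd_configs:
          # placeholder endpoint, adjust to whatever cephadm actually exposes
          - url: http://mgr-host.example:8765/sd/prometheus/sd-config?service=mgr-prometheus
            refresh_interval: 1m

Prometheus re-queries that URL at the configured interval and picks up added
or removed targets without a config reload or restart.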


On Tue, Nov 8, 2022 at 4:47 PM Eugen Block <eblock@xxxxxx> wrote:

> I somehow missed the HA part in [1], thanks for pointing that out.
>
>
> Quoting Redouane Kachach Elhichou <rkachach@xxxxxxxxxx>:
>
> > If you are running quincy and using cephadm then you can have more
> > instances of prometheus (and other monitoring daemons) running in HA mode
> > by increasing the number of daemons as in [1]:
> >
> > From a cephadm shell (to run 2 instances of prometheus and alertmanager):
> >> ceph orch apply prometheus --placement 'count:2'
> >> ceph orch apply alertmanager --placement 'count:2'
> >
> > You can have as many instances as you need. You can choose which nodes to
> > place them on by using cephadm's daemon placement specification [2], e.g.
> > by using a specific label for the monitoring hosts (see the sketch after
> > the links below). In case of mgr failover, cephadm should reconfigure the
> > daemons accordingly.
> >
> > [1] https://docs.ceph.com/en/quincy/cephadm/services/monitoring/#deploying-monitoring-with-cephadm
> > [2] https://docs.ceph.com/en/quincy/cephadm/services/#daemon-placement
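> >
> > A minimal spec sketch for the label-based approach could look like this
> > (the "monitoring" label name is just an example):
> >
> >     service_type: prometheus
> >     placement:
> >       label: monitoring
> >       count: 2
> >
> > saved to a file and applied with "ceph orch apply -i prometheus.yaml",
> > after labeling the hosts with "ceph orch host label add <host> monitoring".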
> >
> > Hope it helps,
> > Redouane.
> >
> >
> >
> >
> > On Tue, Nov 8, 2022 at 3:58 PM Eugen Block <eblock@xxxxxx> wrote:
> >
> >> Hi,
> >>
> >> The only information I found so far was this statement from the Red Hat
> >> docs [1]:
> >>
> >> > When multiple services of the same type are deployed, a
> >> > highly-available setup is deployed.
> >>
> >> I tried to do that in a virtual test environment (16.2.7) and it seems
> >> to work as expected.
> >>
> >> ses7-host1:~ # ceph orch ps --daemon_type prometheus
> >> NAME                   HOST        PORTS   STATUS           REFRESHED  AGE  MEM USE  MEM LIM  VERSION  IMAGE ID      CONTAINER ID
> >> prometheus.ses7-host1  ses7-host1          running (6h)     12s ago    12M  165M     -        2.18.0   8eb9f2694232  04a0b33e2474
> >> prometheus.ses7-host2  ses7-host2  *:9095  host is offline  89s ago    6h   236M     -                 8eb9f2694232  0cb070cea4eb
> >>
> >> host2 was the active mgr before I shut it down, but I still have
> >> access to prometheus metrics as well as active alerts from
> >> alertmanager; there's also one spare instance running. The same
> >> applies to grafana:
> >>
> >> ses7-host1:~ # ceph orch ps --daemon_type alertmanager
> >> NAME                     HOST        PORTS        STATUS          REFRESHED  AGE  MEM USE  MEM LIM  VERSION  IMAGE ID      CONTAINER ID
> >> alertmanager.ses7-host1  ses7-host1               running (6h)    42s ago    12M  33.7M    -        0.16.2   903e9b49157e  5a4ffc9a79da
> >> alertmanager.ses7-host2  ses7-host2  *:9093,9094  running (102s)  44s ago    6h   35.5M    -                 903e9b49157e  71ac3c636a6b
> >>
> >> ses7-host1:~ # ceph orch ps --daemon_type prometheus
> >> NAME                   HOST        PORTS   STATUS          REFRESHED  AGE  MEM USE  MEM LIM  VERSION  IMAGE ID      CONTAINER ID
> >> prometheus.ses7-host1  ses7-host1          running (6h)    44s ago    12M  156M     -        2.18.0   8eb9f2694232  04a0b33e2474
> >> prometheus.ses7-host2  ses7-host2  *:9095  running (104s)  47s ago    6h   250M     -                 8eb9f2694232  87a5a8349f05
> >>
> >> ses7-host1:~ # ceph orch ps --daemon_type grafana
> >> NAME                HOST        PORTS   STATUS          REFRESHED  AGE  MEM USE  MEM LIM  VERSION  IMAGE ID      CONTAINER ID
> >> grafana.ses7-host1  ses7-host1          running (6h)    47s ago    12M  99.6M    -        7.1.5    31b52dc794e2  7935ecf47b38
> >> grafana.ses7-host2  ses7-host2  *:3000  running (107s)  49s ago    6h   108M     -        7.1.5    31b52dc794e2  17dea034bb33
> >>
> >> I just specified two hosts in the placement section of each service
> >> and deployed them (see the sketch below). I think this should be
> >> mentioned in the Ceph docs (not only the Red Hat ones).
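> >>
> >> For reference, a placement section with two explicit hosts could look
> >> roughly like this (hostnames taken from this test setup):
> >>
> >>     service_type: prometheus
> >>     placement:
> >>       hosts:
> >>         - ses7-host1
> >>         - ses7-host2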
> >>
> >> [1]
> >> https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/5/html/operations_guide/management-of-monitoring-stack-using-the-ceph-orchestrator
> >>
> >> Quoting Michael Lipp <mnl@xxxxxx>:
> >>
> >> > Hi,
> >> >
> >> > I've just set up a test cluster with cephadm using quincy. Things
> >> > work nicely. However, I'm not sure how to "handle" alertmanager and
> >> > prometheus.
> >> >
> >> > Both services obviously aren't crucial to the working of the
> >> > storage, fine. But there seems to be no built-in failover concept.
> >> >
> >> > By default, the active mgr accesses the services using
> >> > host.containers.local, thus assuming that they run on the same
> >> > machine as the active manager. This assumption is true after the
> >> > initial installation. Turning off the host with the active manager
> >> > activates the stand-by on another machine, but alertmanager and
> >> > prometheus are gone (i.e. not "moved along"). So the active manager
> >> > produces lots of error messages when you log into it. Turning the
> >> > turned-off machine back on doesn't help, because alertmanager and
> >> > prometheus are back, but on the wrong machine.
> >> >
> >> > I couldn't find anything in the documentation. Are alertmanager and
> >> > prometheus supposed to run in some HA-VM? Then I could add the HA-VM
> >> > to the cluster with (only) these two services running on it and make
> >> > the URIs point to this HA-VM (ceph dashboard
> >> > set-alertmanager-api-host ..., ceph dashboard set-grafana-api-url
> >> > ...,  ceph dashboard set-prometheus-api-host...).
> >> >
> >> > How is this supposed to be configured?
> >> >
> >> >  - Michael
> >> >
> >> >
> >>
> >>
> >>
> >>
>
>
>
>
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


