Welcome Eugen,

There are some ongoing efforts to make the whole prometheus stack
config more dynamic by using the HTTP SD configuration [1]. In fact,
part of the changes are already in main, but they will not be
available until the next official Ceph release.

[1]
https://prometheus.io/docs/prometheus/latest/configuration/configuration/#http_sd_config

On Tue, Nov 8, 2022 at 4:47 PM Eugen Block <eblock@xxxxxx> wrote:
> I somehow missed the HA part in [1], thanks for pointing that out.
>
>
> Zitat von Redouane Kachach Elhichou <rkachach@xxxxxxxxxx>:
>
> > If you are running quincy and using cephadm then you can have more
> > instances of prometheus (and other monitoring daemons) running in
> > HA mode by increasing the number of daemons as in [1]:
> >
> > from a cephadm shell (to run 2 instances of prometheus and
> > alertmanager):
> >> ceph orch apply prometheus --placement 'count:2'
> >> ceph orch apply alertmanager --placement 'count:2'
> >
> > You can have as many instances as you need. You can choose on which
> > nodes to place them by using the daemon placement specification of
> > cephadm [2], for example by using a specific label for monitoring.
> > In case of mgr failover, cephadm should reconfigure the daemons
> > accordingly.
> >
> > [1]
> > https://docs.ceph.com/en/quincy/cephadm/services/monitoring/#deploying-monitoring-with-cephadm
> > [2] https://docs.ceph.com/en/quincy/cephadm/services/#daemon-placement
> >
> > Hope it helps,
> > Redouane.
> >
> >
> > On Tue, Nov 8, 2022 at 3:58 PM Eugen Block <eblock@xxxxxx> wrote:
> >
> >> Hi,
> >>
> >> the only information I found so far was this statement from the
> >> Red Hat docs [1]:
> >>
> >> > When multiple services of the same type are deployed, a
> >> > highly-available setup is deployed.
> >>
> >> I tried to do that in a virtual test environment (16.2.7) and it
> >> seems to work as expected.
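> >>
> >> For illustration, the kind of spec I mean (a minimal sketch, applied
> >> via "ceph orch apply -i <spec-file>"; the host names are just the two
> >> nodes of my test cluster):
> >>
> >> service_type: prometheus
> >> placement:
> >>   hosts:          # example hosts from my test cluster
> >>     - ses7-host1
> >>     - ses7-host2
> >> ---
> >> service_type: alertmanager
> >> placement:
> >>   hosts:          # example hosts from my test cluster
> >>     - ses7-host1
> >>     - ses7-host2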
> >>
> >> ses7-host1:~ # ceph orch ps --daemon_type prometheus
> >> NAME                   HOST        PORTS   STATUS           REFRESHED  AGE  MEM USE  MEM LIM  VERSION  IMAGE ID      CONTAINER ID
> >> prometheus.ses7-host1  ses7-host1          running (6h)     12s ago    12M  165M     -        2.18.0   8eb9f2694232  04a0b33e2474
> >> prometheus.ses7-host2  ses7-host2  *:9095  host is offline  89s ago    6h   236M     -                 8eb9f2694232  0cb070cea4eb
> >>
> >> host2 was the active mgr before I shut it down, but I still have
> >> access to prometheus metrics as well as active alerts from
> >> alertmanager; there's also one spare instance running, and the same
> >> applies to grafana:
> >>
> >> ses7-host1:~ # ceph orch ps --daemon_type alertmanager
> >> NAME                     HOST        PORTS        STATUS          REFRESHED  AGE  MEM USE  MEM LIM  VERSION  IMAGE ID      CONTAINER ID
> >> alertmanager.ses7-host1  ses7-host1               running (6h)    42s ago    12M  33.7M    -        0.16.2   903e9b49157e  5a4ffc9a79da
> >> alertmanager.ses7-host2  ses7-host2  *:9093,9094  running (102s)  44s ago    6h   35.5M    -                 903e9b49157e  71ac3c636a6b
> >>
> >> ses7-host1:~ # ceph orch ps --daemon_type prometheus
> >> NAME                   HOST        PORTS   STATUS          REFRESHED  AGE  MEM USE  MEM LIM  VERSION  IMAGE ID      CONTAINER ID
> >> prometheus.ses7-host1  ses7-host1          running (6h)    44s ago    12M  156M     -        2.18.0   8eb9f2694232  04a0b33e2474
> >> prometheus.ses7-host2  ses7-host2  *:9095  running (104s)  47s ago    6h   250M     -                 8eb9f2694232  87a5a8349f05
> >>
> >> ses7-host1:~ # ceph orch ps --daemon_type grafana
> >> NAME                HOST        PORTS   STATUS          REFRESHED  AGE  MEM USE  MEM LIM  VERSION  IMAGE ID      CONTAINER ID
> >> grafana.ses7-host1  ses7-host1          running (6h)    47s ago    12M  99.6M    -        7.1.5    31b52dc794e2  7935ecf47b38
> >> grafana.ses7-host2  ses7-host2  *:3000  running (107s)  49s ago    6h   108M     -        7.1.5    31b52dc794e2  17dea034bb33
> >>
> >> I just specified two hosts in the placement section of each service
> >> and deployed them. I think this should be mentioned in the ceph docs
> >> (not only the Red Hat docs).
> >>
> >> [1]
> >> https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/5/html/operations_guide/management-of-monitoring-stack-using-the-ceph-orchestrator
> >>
> >> Zitat von Michael Lipp <mnl@xxxxxx>:
> >>
> >> > Hi,
> >> >
> >> > I've just set up a test cluster with cephadm using quincy. Things
> >> > work nicely. However, I'm not sure how to "handle" alertmanager and
> >> > prometheus.
> >> >
> >> > Both services obviously aren't crucial to the working of the
> >> > storage, fine. But there seems to be no built-in failover concept.
> >> >
> >> > By default, the active mgr accesses the services using
> >> > host.containers.local, thus assuming that they run on the same
> >> > machine as the active manager. This assumption is true after the
> >> > initial installation. Turning off the host with the active manager
> >> > activates the stand-by on another machine, but alertmanager and
> >> > prometheus are gone (i.e. not "moved along"). So the active manager
> >> > produces lots of error messages when logging into it. Turning the
> >> > turned-off machine on again doesn't help, because alertmanager and
> >> > prometheus are back, but on the wrong machine.
> >> >
> >> > I couldn't find anything in the documentation. Are alertmanager and
> >> > prometheus supposed to run in some HA-VM? Then I could add the HA-VM
> >> > to the cluster with (only) these two services running on it and make
> >> > the URIs point to this HA-VM (ceph dashboard
> >> > set-alertmanager-api-host ..., ceph dashboard set-grafana-api-url
> >> > ..., ceph dashboard set-prometheus-api-host ...).
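> >> >
> >> > For illustration, with a hypothetical HA-VM address such as
> >> > 192.0.2.10 and the default ports, that would be something along
> >> > the lines of:
> >> >
> >> > # 192.0.2.10 is a placeholder for the HA-VM address
> >> > ceph dashboard set-prometheus-api-host 'http://192.0.2.10:9095'
> >> > ceph dashboard set-alertmanager-api-host 'http://192.0.2.10:9093'
> >> > ceph dashboard set-grafana-api-url 'https://192.0.2.10:3000'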
> >> >
> >> > How is this supposed to be configured?
> >> >
> >> >  - Michael

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx