Hi, I noticed that cephadm would update the grafana-frontend-api-url with
version 17.2.3, but it looks broken with version 17.2.5. It isn't a big deal
to update the url by myself, but it's quite irritating to do if in the past
it corrected itself.
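
(For reference, "updating the url by myself" just means running something
along the lines of

   ceph dashboard set-grafana-api-url https://<host-running-grafana>:3000

after a failover; the exact host and port of course depend on where grafana
is currently running in your setup.)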

Best regards,
Sake

________________________________
From: Eugen Block <eblock@xxxxxx>
Sent: Wednesday, November 9, 2022 9:26:28 AM
To: ceph-users@xxxxxxx <ceph-users@xxxxxxx>
Subject: Re:  How to ... alertmanager and prometheus

The only thing I noticed was that I had to change the grafana-api-url for the
dashboard when I stopped one of the two grafana instances. I wasn't able to
test the dashboard before because I had to wait for new certificates so my
browser wouldn't complain about the cephadm cert. So it seems as if the
failover doesn't work entirely automatically, but it's not too much work to
switch the api url. :-)

Zitat von Michael Lipp <mnl@xxxxxx>:

> Thank you both very much! I have understood things better now.
>
> I'm not sure, though, whether all URIs are adjusted properly when
> changing the placement of the services. Still testing...
>
> Am 08.11.22 um 17:13 schrieb Redouane Kachach Elhichou:
>> Welcome Eugen,
>>
>> There are some ongoing efforts to make the whole prometheus stack config
>> more dynamic by using the http sd configuration [1]. In fact, part of the
>> changes are already in main, but they will not be available until the next
>> official Ceph release.
>>
>> [1] https://prometheus.io/docs/prometheus/latest/configuration/configuration/#http_sd_config
>>
>> On Tue, Nov 8, 2022 at 4:47 PM Eugen Block <eblock@xxxxxx> wrote:
>>
>>> I somehow missed the HA part in [1], thanks for pointing that out.
>>>
>>> Zitat von Redouane Kachach Elhichou <rkachach@xxxxxxxxxx>:
>>>
>>>> If you are running quincy and using cephadm then you can have more
>>>> instances of prometheus (and other monitoring daemons) running in HA
>>>> mode by increasing the number of daemons as in [1].
>>>>
>>>> From a cephadm shell (to run 2 instances of prometheus and alertmanager):
>>>>
>>>>> ceph orch apply prometheus --placement 'count:2'
>>>>> ceph orch apply alertmanager --placement 'count:2'
>>>>
>>>> You can have as many instances as you need. You can choose on which
>>>> nodes to place them by using the daemon placement specification of
>>>> cephadm [2], e.g. by using a specific label for monitoring. In case of
>>>> mgr failover cephadm should reconfigure the daemons accordingly.
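>>>>
>>>> Just as an illustration (the label name and the file name below are
>>>> arbitrary, adjust them to your environment), an equivalent service spec
>>>> applied with "ceph orch apply -i monitoring.yaml" could look like this:
>>>>
>>>> service_type: prometheus
>>>> placement:
>>>>   count: 2
>>>>   label: monitoring
>>>> ---
>>>> service_type: alertmanager
>>>> placement:
>>>>   count: 2
>>>>   label: monitoring
>>>>
>>>> After labeling the target hosts (ceph orch host label add <host>
>>>> monitoring), cephadm places the daemons only on the labeled nodes.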
>>>>
>>>> [1] https://docs.ceph.com/en/quincy/cephadm/services/monitoring/#deploying-monitoring-with-cephadm
>>>> [2] https://docs.ceph.com/en/quincy/cephadm/services/#daemon-placement
>>>>
>>>> Hope it helps,
>>>> Redouane.
>>>>
>>>> On Tue, Nov 8, 2022 at 3:58 PM Eugen Block <eblock@xxxxxx> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> the only information I found so far was this statement from the redhat
>>>>> docs [1]:
>>>>>
>>>>>> When multiple services of the same type are deployed, a
>>>>>> highly-available setup is deployed.
>>>>>
>>>>> I tried to do that in a virtual test environment (16.2.7) and it seems
>>>>> to work as expected.
>>>>>
>>>>> ses7-host1:~ # ceph orch ps --daemon_type prometheus
>>>>> NAME                   HOST        PORTS   STATUS           REFRESHED  AGE  MEM USE  MEM LIM  VERSION  IMAGE ID      CONTAINER ID
>>>>> prometheus.ses7-host1  ses7-host1          running (6h)     12s ago    12M     165M        -  2.18.0   8eb9f2694232  04a0b33e2474
>>>>> prometheus.ses7-host2  ses7-host2  *:9095  host is offline  89s ago    6h      236M        -           8eb9f2694232  0cb070cea4eb
>>>>>
>>>>> host2 was the active mgr before I shut it down, but I still have
>>>>> access to prometheus metrics as well as active alerts from
>>>>> alertmanager; there's also one spare instance running, and the same
>>>>> applies for grafana:
>>>>>
>>>>> ses7-host1:~ # ceph orch ps --daemon_type alertmanager
>>>>> NAME                     HOST        PORTS        STATUS          REFRESHED  AGE  MEM USE  MEM LIM  VERSION  IMAGE ID      CONTAINER ID
>>>>> alertmanager.ses7-host1  ses7-host1               running (6h)    42s ago    12M    33.7M        -  0.16.2   903e9b49157e  5a4ffc9a79da
>>>>> alertmanager.ses7-host2  ses7-host2  *:9093,9094  running (102s)  44s ago    6h     35.5M        -           903e9b49157e  71ac3c636a6b
>>>>>
>>>>> ses7-host1:~ # ceph orch ps --daemon_type prometheus
>>>>> NAME                   HOST        PORTS   STATUS          REFRESHED  AGE  MEM USE  MEM LIM  VERSION  IMAGE ID      CONTAINER ID
>>>>> prometheus.ses7-host1  ses7-host1          running (6h)    44s ago    12M     156M        -  2.18.0   8eb9f2694232  04a0b33e2474
>>>>> prometheus.ses7-host2  ses7-host2  *:9095  running (104s)  47s ago    6h      250M        -           8eb9f2694232  87a5a8349f05
>>>>>
>>>>> ses7-host1:~ # ceph orch ps --daemon_type grafana
>>>>> NAME                HOST        PORTS   STATUS          REFRESHED  AGE  MEM USE  MEM LIM  VERSION  IMAGE ID      CONTAINER ID
>>>>> grafana.ses7-host1  ses7-host1          running (6h)    47s ago    12M    99.6M        -  7.1.5    31b52dc794e2  7935ecf47b38
>>>>> grafana.ses7-host2  ses7-host2  *:3000  running (107s)  49s ago    6h      108M        -  7.1.5    31b52dc794e2  17dea034bb33
>>>>>
>>>>> I just specified two hosts in the placement section of each service
>>>>> and deployed them. I think this should be mentioned in the ceph docs
>>>>> (not only the redhat docs).
>>>>>
>>>>> [1] https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/5/html/operations_guide/management-of-monitoring-stack-using-the-ceph-orchestrator
>>>>>
>>>>> Zitat von Michael Lipp <mnl@xxxxxx>:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I've just set up a test cluster with cephadm using quincy. Things
>>>>>> work nicely. However, I'm not sure how to "handle" alertmanager and
>>>>>> prometheus.
>>>>>>
>>>>>> Both services obviously aren't crucial to the working of the
>>>>>> storage, fine. But there seems to be no built-in failover concept.
>>>>>>
>>>>>> By default, the active mgr accesses the services using
>>>>>> host.containers.local, thus assuming that they run on the same
>>>>>> machine as the active manager. This assumption is true after the
>>>>>> initial installation. Turning off the host with the active manager
>>>>>> activates the stand-by on another machine, but alertmanager and
>>>>>> prometheus are gone (i.e. not "moved along"). So the active manager
>>>>>> produces lots of error messages when logging into it. Turning the
>>>>>> turned-off machine on again doesn't help, because alertmanager and
>>>>>> prometheus are back, but on the wrong machine.
>>>>>>
>>>>>> I couldn't find anything in the documentation. Are alertmanager and
>>>>>> prometheus supposed to run in some HA-VM? Then I could add the HA-VM
>>>>>> to the cluster with (only) these two services running on it and make
>>>>>> the URIs point to this HA-VM (ceph dashboard
>>>>>> set-alertmanager-api-host ..., ceph dashboard set-grafana-api-url
>>>>>> ..., ceph dashboard set-prometheus-api-host ...).
>>>>>>
>>>>>> How is this supposed to be configured?
>>>>>>
>>>>>> - Michael
>>>>>>
>>>>>> _______________________________________________
>>>>>> ceph-users mailing list -- ceph-users@xxxxxxx
>>>>>> To unsubscribe send an email to ceph-users-leave@xxxxxxx
>>>>>
>>>>> _______________________________________________
>>>>> ceph-users mailing list -- ceph-users@xxxxxxx
>>>>> To unsubscribe send an email to ceph-users-leave@xxxxxxx
>>>
>>
>> _______________________________________________
>> ceph-users mailing list -- ceph-users@xxxxxxx
>> To unsubscribe send an email to ceph-users-leave@xxxxxxx
>
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx