Re: Discovery (port 8765) service not starting

Redouane Kachach <rkachach@xxxxxxxxxx> · Fri, 6 Sep 2024 09:08:18 +0200

Hi Matthew,

That makes sense. The IPv6 bug can lead to the issue you described. In the
current implementation, whenever a mgr failover takes place, the Prometheus
configuration (when using the monitoring stack deployed by Ceph) is updated
automatically to point to the new active mgr. Unfortunately, it's not easy
to have active services running in the standby mgr. At most, we can do some
redirection, as we do in the dashboard. So far we haven't needed to do
that. Upcoming releases will include the new mgmt-gateway service introduced
in [1] and [2], which will make it easy to have a single entry point to the
cluster that handles HA transparently in the backend. This is still WIP, but
you can try it out using the latest code from main. Support for
OIDC based on oauth2-proxy is also being introduced as part of this effort
in [3].
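
For anyone running an external Prometheus who needs to know which mgr is
currently serving service discovery, here is a rough sketch (not taken from
the Ceph code). It assumes that only the active mgr accepts TCP connections
on the service discovery port, as described in this thread. The host names
are hypothetical, and the function name is made up for illustration.

```python
import socket


def find_active_mgr(hosts, port=8765, timeout=2.0):
    """Return the first host that accepts a TCP connection on `port`.

    Assumption: only the active mgr listens on the service discovery
    port, so the first host that answers is treated as the active one.
    Returns None if no host accepts the connection.
    """
    for host in hosts:
        try:
            # create_connection raises OSError on refusal or timeout.
            with socket.create_connection((host, port), timeout=timeout):
                return host
        except OSError:
            continue
    return None


if __name__ == "__main__":
    # Hypothetical mgr host names; replace with the ones in your cluster.
    print(find_active_mgr(["mgr-a.example", "mgr-b.example"]))
```

This only checks that something is listening on the port. It does not
confirm that the listener is the cephadm service discovery server.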

@Tim Holloway, as I said, the support [4] for service discovery has been
there for a while (about 2 years, I'd say). Unless you are using an old Ceph
version (where the Prometheus config was static), you should see traffic on
port 8765.
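
A quick way to check whether the endpoint answers is to fetch the target
list yourself. The sketch below relies on the response format documented
for Prometheus HTTP service discovery: a JSON list of groups, each with a
"targets" list. The URL in the comment is an assumption based on the
cephadm documentation, so verify it against your release. The sketch also
assumes a self-signed certificate and no authentication on the endpoint.

```python
import json
import ssl
import urllib.request


def parse_sd_targets(payload):
    """Flatten a Prometheus HTTP SD response into a list of targets.

    The payload is expected to be a JSON list of groups, each of the
    form {"targets": ["host:port", ...], "labels": {...}}.
    """
    groups = json.loads(payload)
    return [target for group in groups for target in group.get("targets", [])]


def fetch_sd_targets(url, timeout=5.0):
    """Fetch `url` and return the flattened list of targets."""
    # Assumption: the endpoint uses a self-signed certificate, so
    # verification is disabled here. Use this for diagnostics only.
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    with urllib.request.urlopen(url, timeout=timeout, context=ctx) as resp:
        return parse_sd_targets(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Assumed URL layout; the host name is hypothetical:
    # https://<active-mgr>:8765/sd/prometheus/sd-config?service=mgr-prometheus
    url = "https://mgr-a.example:8765/sd/prometheus/sd-config?service=mgr-prometheus"
    print(fetch_sd_targets(url))
```

If this returns targets but Prometheus shows none, the problem is more
likely in the Prometheus configuration than in the mgr.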

[1] https://github.com/ceph/ceph/pull/57535
[2] https://github.com/ceph/ceph/pull/58402
[3] https://github.com/ceph/ceph/pull/58460
[4] https://github.com/ceph/ceph/pull/46400

On Thu, Sep 5, 2024 at 7:00 PM Tim Holloway <timh@xxxxxxxxxxxxx> wrote:

> Now you've got me worried. As I said, there is absolutely no traffic
> using port 8765 on my LAN.
>
> Am I missing a service? Since my distro is based on stock Prometheus,
> I'd have to assume that the port 8765 server would be part of the Ceph
> generic container image and isn't being switched on for some reason.
>
>    Tim
>
> On Thu, 2024-09-05 at 15:05 +0100, Matthew Vernon wrote:
> > On 05/09/2024 15:03, Matthew Vernon wrote:
> > > Hi,
> > >
> > > On 05/09/2024 12:49, Redouane Kachach wrote:
> > >
> > > > The port 8765 is the "service discovery" (an internal server that
> > > > runs in the mgr... you can change the port by changing the
> > > > variable service_discovery_port of cephadm). Normally it is
> > > > opened in the active mgr and the service is used by prometheus
> > > > (server) to get the targets by using the http service discovery
> > > > feature [1]. This feature has been there for a long time now and
> > > > it's the default configuration used by Ceph monitoring stack. It
> > > > should start automatically without any external intervention (or
> > > > manual configuration).
> > >
> > > Right; it wasn't running because I have an IPv6 deployment (that
> > > bug's fixed in 18.2.4 - https://tracker.ceph.com/issues/63448).
> >
> > ...though I'm not sure that having only the active mgr run this
> > endpoint is correct? Isn't it more useful to be able to e.g. point
> > my Prometheus at any of the mgrs and have service discovery work,
> > rather than needing Prometheus to know which mgr is active to know
> > which sd to talk to, which seems to rather defeat the point?
> >
> > Thanks,
> >
> > Matthew
> > _______________________________________________
> > ceph-users mailing list -- ceph-users@xxxxxxx
> > To unsubscribe send an email to ceph-users-leave@xxxxxxx