On Fri, Jun 29, 2018 at 2:11 PM Mykola Golub <to.my.trociny@xxxxxxxxx> wrote:
>
> Hi,
>
> One feature users ask for is the ability to collect per-RBD-image IO
> stats.
>
> A natural way would be to send these to ceph-mgr, so they could then
> be processed/displayed by the dashboard or forwarded to (or polled
> by) an external system. For this, the librbd client would just need
> to register a service in the mgr.
>
> The main concern, though, is that the mgr service would not currently
> scale to (tens of) thousands of registered RBD image services.
>
> So the questions are:
>
> Can we treat this approach as the right thing to do, with the
> potential ceph-mgr scalability issue to be fixed eventually anyway
> (e.g. when we start seeing clusters with tens of thousands of OSDs)?
> Then we could implement this on the librbd side, disabled by default,
> and users could already start using it, at least for small setups or
> by enabling it only for a subset of images.

My preferred approach is to gather stats on the server side, and to do
it in a way that is flexible enough to work for CephFS as well. There's
a sketch of a design here:
https://tracker.ceph.com/projects/ceph/wiki/Live_Performance_Probes
(written some time ago but never coded). The server-side approach is
more complex than client-side instrumentation, but it gives us a richer
set of functionality (the ability to break stats down however we want
at runtime, not only per-client).

> Or should we think that ceph-mgr is not designed for this sort of
> thing and consider other solutions?

The current use of ceph-mgr for gathering stats from server daemons is
opportunistic: because we already have a connection for command and
control, it's efficient to piggy-back some stats on there too. It's
probably not worthwhile to try to use ceph-mgr for things like clients,
where statistics are all we want.

For the simpler client-side approach, where we're just gathering a
fixed set of stats from a large number of endpoints, my view is that
prometheus is the way to go. I think some people may even already be
doing this with their clients.

Cheers,
John

> As another solution, we could provide built-in/plug-in
> prometheus/influx/etc. exporters within librados, so that it could be
> configured to export its perf counters directly to a time series
> database.
>
> --
> Mykola Golub
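
For concreteness, a minimal sketch of the client-side prometheus
approach discussed above. It assumes the client runs with an admin
socket configured (so "ceph --admin-daemon <path> perf dump" works)
and that librbd's per-image counter sections are named "librbd-..."
with keys such as rd, rd_bytes, wr and wr_bytes; the socket path,
scrape port and counter names below are illustrative assumptions, not
a settled interface:

#!/usr/bin/env python
# Sketch: poll librbd perf counters over a client admin socket and
# expose them to prometheus. Paths, port and counter names are
# assumptions for illustration.
import json
import subprocess
import time

from prometheus_client import Gauge, start_http_server

ASOK = '/var/run/ceph/client.admin.asok'   # hypothetical socket path
COUNTERS = ('rd', 'rd_bytes', 'wr', 'wr_bytes')

gauges = {
    name: Gauge('librbd_' + name,
                'librbd per-image perf counter: ' + name, ['image'])
    for name in COUNTERS
}

def poll():
    # "perf dump" is the admin socket command that emits perf counters
    out = subprocess.check_output(
        ['ceph', '--admin-daemon', ASOK, 'perf', 'dump'])
    for section, counters in json.loads(out).items():
        # one counter section per open image; use the whole section
        # name as the label rather than splitting out pool/image,
        # since those names may themselves contain '-'
        if not section.startswith('librbd-'):
            continue
        for name in COUNTERS:
            if name in counters:
                gauges[name].labels(image=section).set(counters[name])

if __name__ == '__main__':
    start_http_server(9123)   # arbitrary scrape port
    while True:
        poll()
        time.sleep(10)

A prometheus server then scrapes one such exporter per client node
like any other target, so the fan-in lands on the monitoring system
rather than on ceph-mgr.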
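
The same polling loop also covers the push-style librados exporter
idea at the end of the quoted mail. A sketch against InfluxDB's 1.x
HTTP /write endpoint and line protocol, with the URL, database name
and counter keys again being assumptions:

# Sketch: push the same per-image counters to InfluxDB (1.x line
# protocol over HTTP). URL, database and counter keys are assumptions.
import json
import subprocess
import urllib.request

ASOK = '/var/run/ceph/client.admin.asok'   # hypothetical socket path

def push(url='http://localhost:8086/write?db=rbd'):
    out = subprocess.check_output(
        ['ceph', '--admin-daemon', ASOK, 'perf', 'dump'])
    lines = []
    for section, counters in json.loads(out).items():
        if not section.startswith('librbd-'):
            continue
        fields = ','.join(
            '%s=%di' % (k, counters[k])        # 'i' marks integer fields
            for k in ('rd', 'rd_bytes', 'wr', 'wr_bytes')
            if k in counters)
        if fields:
            # line protocol: measurement,tag=value field=value,...
            lines.append('rbd_image,image=%s %s' % (section, fields))
    if lines:
        urllib.request.urlopen(urllib.request.Request(
            url, '\n'.join(lines).encode(),
            {'Content-Type': 'text/plain'}))

Called from cron or a small daemon loop, this keeps librados itself
unchanged; a real built-in exporter would sit behind a config option
instead.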