On Fri, Jun 29, 2018 at 2:11 PM Mykola Golub <to.my.trociny@xxxxxxxxx> wrote:
>
> Hi,
>
> One feature users ask for is the ability to collect per-RBD-image IO
> stats.
>
> A natural way would be to send these to ceph-mgr, so they could then
> be processed/displayed by the dashboard or forwarded to (or polled
> by) an external system. For this, the librbd client would just need
> to register a service in the mgr.
>
> The main concern, though, is that the mgr service would not currently
> scale to (tens of) thousands of registered RBD image services.
>
> So the questions are:
>
> Can we treat this approach as the right thing to do, with the
> potential ceph-mgr scalability issue to be fixed eventually anyway
> (e.g. when we start seeing clusters with tens of thousands of OSDs)?
> Then we could implement this on the librbd side, disabled by default,
> and users could already start using it, at least for small setups or
> by enabling it only for a subset of images.

My preferred approach is to gather stats on the server side, and to do
it in a way that is flexible enough to work for CephFS as well. There's
a sketch of a design here:
https://tracker.ceph.com/projects/ceph/wiki/Live_Performance_Probes
(written some time ago but never coded). The server-side approach is
more complex than client-side instrumentation, but it gives us a richer
set of functionality (the ability to break stats down however we want
at runtime, not only per-client).

> Or should we think that ceph-mgr is not designed for this sort of
> thing and consider other solutions?

The current use of ceph-mgr for gathering stats from server daemons is
opportunistic: because we already have a connection for command and
control, it's efficient to piggy-back some stats on there too. It's
probably not worthwhile to try to use ceph-mgr for things like clients,
where statistics are all we want.

For the simpler client-side approach, where we're just gathering a
fixed set of stats from a large number of endpoints, my view is that
prometheus is the way to go. I think some people may even already be
doing this with their clients.

Cheers,
John

> As another solution, we could provide built-in/plug-in
> prometheus/influx/etc. exporters within librados, so that it could be
> configured to export its perf counters directly to a time series
> database.
>
> --
> Mykola Golub
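
For concreteness, a minimal sketch of the client-side prometheus
approach discussed above. It assumes the client runs with an admin
socket configured (so "ceph --admin-daemon <path> perf dump" works)
and that librbd's per-image counter sections are named "librbd-..."
with keys such as rd, rd_bytes, wr and wr_bytes; the socket path,
scrape port and counter names below are illustrative assumptions, not
a settled interface:

#!/usr/bin/env python
# Sketch: poll librbd perf counters over a client admin socket and
# expose them to prometheus. Paths, port and counter names are
# assumptions for illustration.
import json
import subprocess
import time

from prometheus_client import Gauge, start_http_server

ASOK = '/var/run/ceph/client.admin.asok'   # hypothetical socket path
COUNTERS = ('rd', 'rd_bytes', 'wr', 'wr_bytes')

gauges = {
    name: Gauge('librbd_' + name,
                'librbd per-image perf counter: ' + name, ['image'])
    for name in COUNTERS
}

def poll():
    # "perf dump" is the admin socket command that emits perf counters
    out = subprocess.check_output(
        ['ceph', '--admin-daemon', ASOK, 'perf', 'dump'])
    for section, counters in json.loads(out).items():
        # one counter section per open image; use the whole section
        # name as the label rather than splitting out pool/image,
        # since those names may themselves contain '-'
        if not section.startswith('librbd-'):
            continue
        for name in COUNTERS:
            if name in counters:
                gauges[name].labels(image=section).set(counters[name])

if __name__ == '__main__':
    start_http_server(9123)   # arbitrary scrape port
    while True:
        poll()
        time.sleep(10)

A prometheus server then scrapes one such exporter per client node
like any other target, so the fan-in lands on the monitoring system
rather than on ceph-mgr.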
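
The same polling loop also covers the push-style librados exporter
idea at the end of the quoted mail. A sketch against InfluxDB's 1.x
HTTP /write endpoint and line protocol, with the URL, database name
and counter keys again being assumptions:

# Sketch: push the same per-image counters to InfluxDB (1.x line
# protocol over HTTP). URL, database and counter keys are assumptions.
import json
import subprocess
import urllib.request

ASOK = '/var/run/ceph/client.admin.asok'   # hypothetical socket path

def push(url='http://localhost:8086/write?db=rbd'):
    out = subprocess.check_output(
        ['ceph', '--admin-daemon', ASOK, 'perf', 'dump'])
    lines = []
    for section, counters in json.loads(out).items():
        if not section.startswith('librbd-'):
            continue
        fields = ','.join(
            '%s=%di' % (k, counters[k])        # 'i' marks integer fields
            for k in ('rd', 'rd_bytes', 'wr', 'wr_bytes')
            if k in counters)
        if fields:
            # line protocol: measurement,tag=value field=value,...
            lines.append('rbd_image,image=%s %s' % (section, fields))
    if lines:
        urllib.request.urlopen(urllib.request.Request(
            url, '\n'.join(lines).encode(),
            {'Content-Type': 'text/plain'}))

Called from cron or a small daemon loop, this keeps librados itself
unchanged; a real built-in exporter would sit behind a config option
instead.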