On 24/02/2015 08:40, Alyona Kiselyova wrote:
There was a similar message from Sage Weil on the ceph-devel mailing list a few
weeks ago. It was about the perf-watch.py script, which is available in the
ceph repository, but it also only works per node (and it targets a vstart
cluster, so some changes would be needed to use it on a production system).
There is now a modernized version of perf-watch in a PR:
https://github.com/ceph/ceph/pull/3615
I posted about it to the list a little while ago but there wasn't any
interest, so it's still hanging around in a PR (subject was "Performance
watching (dstat-like) CLI mode")
We are now working on a tool with similar capabilities, but it can collect
counters either from a single node or from all ceph nodes. The tool can also
report system resource usage by the ceph processes. It currently uses ssh, so
it does not work well unless you have password-less access to all the nodes.
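For illustration, the per-node collection is essentially this (a simplified
sketch, not the real code; the host and daemon names are just examples):

    import json
    import subprocess

    def perf_dump(host, daemon):
        # run "ceph daemon <name> perf dump" on the remote node over ssh
        # and parse the JSON it prints
        out = subprocess.check_output(
            ["ssh", host, "ceph", "daemon", daemon, "perf", "dump"])
        return json.loads(out.decode("utf-8"))

    counters = perf_dump("node1", "osd.0")

This is why the password-less ssh requirement exists: every poll means one
remote command per daemon.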
Cool! You may also be interested in the calamari branch of diamond:
https://github.com/ceph/Diamond/tree/calamari
This will grab all the perf counters and send them back to a graphite
server that you can run whatever queries you wish to on.
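Once diamond is feeding graphite, getting a counter back out is just a render
API call, e.g. something like this (the metric path is only a guess at
diamond's naming, adjust it to whatever your collector actually emits):

    import json
    import urllib2

    url = ("http://graphite.example.com/render"
           "?target=servers.node1.ceph.osd-0.osd.op_latency"
           "&from=-10min&format=json")
    series = json.load(urllib2.urlopen(url))
    print(series[0]["datapoints"][-1])   # [value, timestamp] pairs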
The first version of this tool is available on github
(https://github.com/Ved-vampir/ceph-perf-tool). Maybe, after some
improvements, it will be useful to other people and could find its way into
ceph in some form. It would be nice if such a utility were available in ceph
"out of the box". Maybe we can merge it?
There has been discussion in the past about allowing users to run
arbitrary admin socket operations via the mon, which would at least
remove the need for a program like yours to do its own SSHing. However,
regularly polling the perf stats of 1000s of OSDs via this mechanism could
quickly have a measurable impact on things.
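To make the scale concern concrete: each poll is one admin socket round trip
per daemon. Roughly what that exchange looks like (a sketch, not the library
code: send the JSON command plus a NUL terminator, read a 4-byte big-endian
length, then the payload):

    import json
    import socket
    import struct

    def admin_socket(asok_path, prefix):
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.connect(asok_path)
        sock.sendall(json.dumps({"prefix": prefix}).encode("utf-8") + b"\0")
        length = struct.unpack(">I", sock.recv(4))[0]
        buf = b""
        while len(buf) < length:
            buf += sock.recv(length - len(buf))
        sock.close()
        return json.loads(buf.decode("utf-8"))

    # e.g. admin_socket("/var/run/ceph/ceph-osd.0.asok", "perf dump")

Doing that for thousands of OSDs every few seconds, proxied through the mon,
is where the cost adds up.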
The other thing that would be very nice to add into the main ceph .py
code is the general service discovery part where we enumerate which
services are running on a node and get their admin socket paths:
currently this is done in both the diamond collector module and in the
calamari salt module.
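The discovery step itself is small, which is why it's silly to keep
duplicating it; it's essentially (assuming the default
/var/run/ceph/<cluster>-<type>.<id>.asok socket naming):

    import glob
    import os
    import re

    def local_ceph_services(run_dir="/var/run/ceph"):
        # enumerate daemons on this node from their admin sockets
        services = []
        for asok in glob.glob(os.path.join(run_dir, "*.asok")):
            m = re.match(r"(.+)-(mon|osd|mds)\.(.+)\.asok$",
                         os.path.basename(asok))
            if m:
                cluster, svc_type, svc_id = m.groups()
                services.append((cluster, svc_type, svc_id, asok))
        return services

A shared version would also need to handle non-default socket paths (the
admin_socket config option) and sockets left behind by dead daemons.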
It would be great if there were a built-in way to collect information about
the whole cluster from a single node. Maybe something like an extension of the
"tell" command that could reach any node directly and replace the external
network connections, or an improved version of the "ceph osd perf" command
that exposes more information.
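For comparison, "ceph osd perf" today only gives two latency figures per OSD,
e.g. (the JSON key names below are how I read its output and may differ
between versions):

    import json
    import subprocess

    out = subprocess.check_output(["ceph", "osd", "perf", "--format=json"])
    for osd in json.loads(out.decode("utf-8"))["osd_perf_infos"]:
        stats = osd["perf_stats"]
        print(osd["id"], stats["commit_latency_ms"], stats["apply_latency_ms"])

so there is no way to reach the rest of the perf counters through it.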
This pretty much already exists if someone chooses to deploy
diamond+graphite. Perhaps we need to talk about what's wrong with that
solution as it stands? I'm guessing the main problem is that it's less
highly available than ceph mons, and comparatively heavyweight,
especially if one is only interested in the latest values.
Cheers,
John