Re: Tool for ceph performance analysis

On 24/02/2015 08:40, Alyona Kiselyova wrote:
There was a similar message from Sage Weil on the ceph-devel mailing
list some weeks ago. It was about the perf-watch.py script, which is
available in the ceph repository, but it too only provides per-node
data (and it works on a vstart cluster, so some changes are needed to
use it on a production system).
There is now a modernized version of perf-watch in a PR: https://github.com/ceph/ceph/pull/3615

I posted about it to the list a little while ago but there wasn't any interest, so it's still hanging around in a PR (subject was "Performance watching (dstat-like) CLI mode").
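
For anyone who wants to poke at these counters by hand: everything discussed here ultimately boils down to running "perf dump" against each daemon's admin socket. A rough sketch of that core operation in Python (the socket path is an assumption -- it depends on your cluster name and daemon IDs):

    #!/usr/bin/env python
    # Minimal sketch: fetch perf counters from one local Ceph daemon.
    # Assumes a default admin socket path; adjust for your cluster/IDs.
    import json
    import subprocess

    def perf_dump(socket_path):
        # "ceph --admin-daemon <sock> perf dump" prints all counters as JSON
        out = subprocess.check_output(
            ["ceph", "--admin-daemon", socket_path, "perf", "dump"])
        return json.loads(out)

    counters = perf_dump("/var/run/ceph/ceph-osd.0.asok")
    print(json.dumps(counters, indent=2))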

We are now working on a tool with similar capabilities, but it can
collect counters either from one node or from all ceph nodes. The tool
can also report system resource usage by the ceph processes. For now it
uses ssh, so it doesn't work well unless you have password-less access
to all nodes.
Cool!  You may also be interested in the calamari branch of diamond:
https://github.com/ceph/Diamond/tree/calamari

This will grab all the perf counters and send them back to a graphite server that you can run whatever queries you wish to on.
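
For reference, the graphite side of that is nothing exotic: carbon accepts metrics over its plaintext protocol, one "<path> <value> <timestamp>" line per datapoint, by default on TCP port 2003. A sketch (the host name and metric path here are made up):

    # Sketch: push one counter value into graphite via carbon's
    # plaintext protocol (default TCP port 2003).
    import socket
    import time

    def send_metric(carbon_host, path, value, port=2003):
        # plaintext protocol: "<metric path> <value> <unix timestamp>\n"
        line = "%s %s %d\n" % (path, value, int(time.time()))
        sock = socket.create_connection((carbon_host, port))
        try:
            sock.sendall(line.encode("ascii"))
        finally:
            sock.close()

    # hypothetical metric path; real paths depend on the collector
    send_metric("graphite.example.com", "ceph.osd.0.op_latency", 12.5)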
The first version of this tool is available on github
(https://github.com/Ved-vampir/ceph-perf-tool). Maybe, after some
improvements, this tool will be useful to other people and could end
up in ceph in some form. It would be cool if such a utility were
available in ceph "out of the box". Maybe we can merge it?
There has been discussion in the past about allowing users to run arbitrary admin socket operations via the mon; that would at least remove the need for a program like yours to do its own SSHing. However, regularly polling the perf stats of 1000s of OSDs via this mechanism could quickly have a measurable impact on things.
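
To make the distinction concrete: the existing "tell" path already reaches individual daemons through the cluster without SSH; the hypothetical part is routing admin socket operations the same way. Roughly (the perf dump variant is the proposal, not a current interface):

    # Sketch of the existing vs. proposed interface, via subprocess.
    import subprocess

    # Works today: a command routed to a daemon through the cluster, no SSH.
    print(subprocess.check_output(["ceph", "tell", "osd.0", "version"]))

    # Hypothetical extension discussed above: proxying an arbitrary admin
    # socket operation (e.g. perf dump) through the mon -- not an
    # interface that exists today.
    # subprocess.check_output(["ceph", "tell", "osd.0", "perf", "dump"])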

The other thing that would be very nice to add to the main ceph .py code is the general service discovery part, where we enumerate which services are running on a node and get their admin socket paths: currently this is done both in the diamond collector module and in the calamari salt module.
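
That discovery step is small enough to sketch: with the default admin socket setting, each local daemon leaves a "$cluster-$type.$id.asok" file under /var/run/ceph, so enumeration is mostly a glob plus filename parsing (assuming default paths; a real implementation should read the socket path from the config instead):

    # Sketch: enumerate local Ceph services from their admin sockets.
    # Assumes the default "admin socket" path under /var/run/ceph.
    import glob
    import os
    import re

    def discover_services(run_dir="/var/run/ceph"):
        services = []
        for path in glob.glob(os.path.join(run_dir, "*.asok")):
            # filenames look like "ceph-osd.0.asok" or "ceph-mon.myhost.asok"
            m = re.match(r"(.+)-(mon|osd|mds)\.(.+)\.asok$",
                         os.path.basename(path))
            if m:
                cluster, svc_type, svc_id = m.groups()
                services.append((cluster, svc_type, svc_id, path))
        return services

    for cluster, svc_type, svc_id, path in discover_services():
        print("%s %s.%s -> %s" % (cluster, svc_type, svc_id, path))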

It would be great if there were a built-in way to collect info about
the whole cluster from one node. Maybe something like an extension of
the "tell" command, which could reach any node directly and replace
the external network connections. Or an improved version of the "ceph
osd perf" command that would return more info.

This pretty much already exists if someone chooses to deploy diamond+graphite. Perhaps we need to talk about what's wrong with that solution as it stands? I'm guessing the main problem is that it's less highly available than ceph mons, and comparatively heavyweight, especially if one is only interested in the latest values.
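
To illustrate the "heavyweight" point: with diamond+graphite in place, fetching the latest values means going through graphite's render API, e.g. something like the sketch below, rather than asking a mon directly. The host and metric path are assumptions -- they depend on how your collector names things.

    # Sketch: pull recent datapoints for some OSD counters out of
    # graphite via the render API's JSON output.
    import json
    try:
        from urllib.request import urlopen  # Python 3
    except ImportError:
        from urllib2 import urlopen         # Python 2

    url = ("http://graphite.example.com/render"
           "?target=servers.*.ceph.osd.*.op_latency"
           "&from=-2min&format=json")
    for series in json.loads(urlopen(url).read().decode("utf-8")):
        # each entry: {"target": ..., "datapoints": [[value, ts], ...]}
        points = [p for p in series["datapoints"] if p[0] is not None]
        if points:
            print("%s %s" % (series["target"], points[-1][0]))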

Cheers,
John

