Re: Tool for ceph performance analysis

On 02/24/2015 07:45 PM, Brad Hubbard wrote:
On 02/25/2015 12:14 AM, Mark Nelson wrote:
On 02/24/2015 06:16 AM, John Spray wrote:

On 24/02/2015 11:57, John Spray wrote:
It would be great if there were an internal way to collect
info about the whole cluster from one node. Maybe something like an
extension of the "tell" command that could call any node directly,
replacing external network connections. Or an improved version of the
"ceph osd perf" command that would return more info.

This pretty much already exists if someone chooses to deploy
diamond+graphite.  Perhaps we need to talk about what's wrong with
that solution as it stands?  I'm guessing the main problem is that
it's less highly available than ceph mons, and comparatively
heavyweight, especially if one is only interested in the latest values.
Ah, I also forgot to mention: it is not very hard to make a cut-down
version of calamari that doesn't require lots of heavyweight
dependencies.  I started building this a while back before switching
tasks, but there's an old branch here:
https://github.com/ceph/calamari/commits/wip-lite

The key things there are that it doesn't require a postgres database,
and that remote execution is abstracted into a "Remote" interface so that
you can implement alternatives to salt (e.g. SSH, or running locally on a
mon).  It's all free software so borrow what you wish ;-)  The point is
that it isn't necessary to start from scratch in order to get something
lightweight.

My personal vote is to try to get ourselves well integrated into a
good cross section of the existing tools that already do this kind of
thing (Zabbix, collectd, collectl, etc.)

...and PCP (Performance Co-Pilot) which I have begun work on.

Indeed! I think this just goes to show that there's not going to be one set way that people do this. We need to appeal to a broad coalition of folks.


I'm slightly guilty of rolling my own too, since in cbt I gather up
some of our daemon socket output from all the hosts via ssh and just
dump it in the output directory.  There are tons of other systems out
there that do this kind of thing way better, though.  I don't want to
discourage anyone from making a new tool if that's their preference,
but I think a lot of folks would benefit if they could just keep using
their existing monitoring tools.
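The cbt-style gathering described above can be sketched roughly as follows. This is a minimal illustration, not cbt's actual code: the JSON here is made-up sample data, whereas in practice it would come from running "ceph daemon osd.N perf dump" on each host (e.g. over ssh) and the counter names would match whatever the daemons expose.

```python
import json

# Hypothetical sample of "perf dump" output from two OSDs; real output
# would be fetched from each daemon's admin socket over ssh.
SAMPLE_DUMPS = {
    "osd.0": '{"osd": {"op_latency": {"avgcount": 120, "sum": 3.6}}}',
    "osd.1": '{"osd": {"op_latency": {"avgcount": 80, "sum": 4.0}}}',
}

def average_latency(dumps):
    """Reduce each daemon's op_latency counter to average seconds per op.

    Ceph latency counters are exposed as a running sum plus a count,
    so the average is sum / avgcount.
    """
    result = {}
    for name, raw in dumps.items():
        lat = json.loads(raw)["osd"]["op_latency"]
        result[name] = lat["sum"] / lat["avgcount"]
    return result

print(average_latency(SAMPLE_DUMPS))
```

The same pattern scales to any counter in the dump; the only per-tool difference is where the JSON comes from (ssh, salt, or a local admin socket).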

Perhaps part of this might be to just try to get a better idea of
which tools folks are using to do performance monitoring on their
existing clusters (ceph or otherwise).  I've heard zabbix come up
quite a bit recently.

Mark


Cheers,
John
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html