Re: Monitoring Overhead

Hi Ashley,

On Monday, October 24, 2016, Ashley Merrick <ashley@xxxxxxxxxxxxxx> wrote:
Hello,

Thanks both for your responses. I'm definitely looking at collectd + graphite; I just wanted to see what the overheads were like. I'm far from a situation that would choke the cluster, but wanted to check first.


I run ceph -s with JSON output, parse that (with e.g. Perl, or you can use Python etc.) and store it in a MySQL database. This provides periodic snapshots and simple at-a-glance analysis. The overhead is practically none.
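
For illustration, here is a minimal Python sketch of that approach (the pgmap key paths and the ceph_status table schema are assumptions; key names vary between Ceph releases, so check your own "ceph -s --format json" output first):

import json
import subprocess

import pymysql  # one MySQL driver option among several

# One status snapshot from a mon; this is answered from the mon's
# local state, so the OSDs are not touched.
raw = subprocess.check_output(["ceph", "-s", "--format", "json"])
status = json.loads(raw)

# Key paths are assumptions; adjust to what your release emits.
pgmap = status.get("pgmap", {})
row = (
    status.get("health", {}).get("overall_status", "UNKNOWN"),
    pgmap.get("bytes_used", 0),
    pgmap.get("bytes_total", 0),
    pgmap.get("read_bytes_sec", 0),
    pgmap.get("write_bytes_sec", 0),
)

# Hypothetical table:
#   ceph_status(health, bytes_used, bytes_total, read_bps, write_bps)
conn = pymysql.connect(host="localhost", user="ceph",
                       password="secret", db="cephstats")
with conn.cursor() as cur:
    cur.execute(
        "INSERT INTO ceph_status"
        " (health, bytes_used, bytes_total, read_bps, write_bps)"
        " VALUES (%s, %s, %s, %s, %s)", row)
conn.commit()
conn.close()

Cron that every minute or so and the snapshots accumulate on their own.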

For OSDs things are trickier, but for simplicity's sake we run iostat for a few cycles and parse that output, then aggregate. 
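
Roughly like this (a sketch; the extended-stats column layout varies with the sysstat version, so the assumption that %util is the last column needs checking against your own iostat output):

import subprocess

# Extended device stats: 1-second interval, 5 reports. The first
# report is averages since boot, so we skip it and aggregate the rest.
out = subprocess.check_output(["iostat", "-dxk", "1", "5"], text=True)

chunks = out.split("Device")   # chunks[0] is the banner line
samples = {}                   # device name -> list of %util samples
for report in chunks[2:]:      # chunks[1] is the since-boot report
    for line in report.splitlines()[1:]:   # [0] is the header remainder
        fields = line.split()
        if not fields:
            continue
        # %util is the last column in most sysstat versions; verify.
        samples.setdefault(fields[0], []).append(float(fields[-1]))

for dev, utils in sorted(samples.items()):
    print("%-10s avg %%util: %5.1f" % (dev, sum(utils) / len(utils)))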

Collectd and graphite look really nice. 

Regards,
Alex
 
Thanks,
Ashley

-----Original Message-----
From: Christian Balzer [mailto:chibi@xxxxxxx]
Sent: 24 October 2016 11:04
To: ceph-users@xxxxxxxxxxxxxx
Cc: John Spray <jspray@xxxxxxxxxx>; Ashley Merrick <ashley@xxxxxxxxxxxxxx>
Subject: Re: Monitoring Overhead


Hello,

On Mon, 24 Oct 2016 10:46:31 +0100 John Spray wrote:

> On Mon, Oct 24, 2016 at 4:21 AM, Ashley Merrick <ashley@xxxxxxxxxxxxxx> wrote:
> > Hello,
> >
> >
> >
> > This may come across as a simple question, but I just wanted to check.
> >
> >
> >
> > I am looking at importing live data from my cluster (via ceph -s
> > etc.) into a graphical graphing interface so I can monitor
> > performance / IOPS / etc. over time.
> >
> >
> >
> > I am looking to pull this data from one or more monitor nodes. When
> > the ceph -s output is retrieved, is this information the monitor
> > already has locally, or is there an overhead applied to the whole
> > cluster to retrieve this data every time the command is executed?
>
> It's all from the local state on the mons, the OSDs aren't involved at
> all in responding to the status command.
>
That said, as mentioned before on this ML, the output of "ceph -s" is a sample from a short window, and only approaches reality if sampled repeatedly and averaged over a long period.
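
For example (a rough sketch; the pgmap rate keys are assumptions and differ between releases), averaging a number of samples gives a far more honest figure than a single invocation:

import json
import subprocess
import time

# Average N instantaneous samples; any single "ceph -s" reading can
# be far off from the longer-term rate.
N, INTERVAL = 30, 2
reads = writes = 0.0
for _ in range(N):
    status = json.loads(
        subprocess.check_output(["ceph", "-s", "--format", "json"]))
    pgmap = status.get("pgmap", {})
    reads += pgmap.get("read_bytes_sec", 0)    # key names vary by release
    writes += pgmap.get("write_bytes_sec", 0)
    time.sleep(INTERVAL)

print("avg read : %.1f MB/s" % (reads / N / 1e6))
print("avg write: %.1f MB/s" % (writes / N / 1e6))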

If you need something that involves "what happened on OSD x at time y", collectd and graphite (or derivatives of them) are your friends, but they do cost you a CPU cycle or two.
OTOH, if your OSDs or MONs were to choke from that kind of monitoring, you're walking on very thin ice already.
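
For per-OSD figures the admin socket is the natural source (it is what collectd's ceph plugin reads); a quick sketch, with counter names that are assumptions to check against a real "perf dump" on your release:

import json
import subprocess

# Run on the host carrying osd.0; the admin socket query touches
# only that one daemon, not the rest of the cluster.
raw = subprocess.check_output(["ceph", "daemon", "osd.0", "perf", "dump"])
perf = json.loads(raw)

osd = perf.get("osd", {})
# Counter names are assumptions; dump the full JSON once and pick
# whichever counters your release actually exposes.
print("reads completed :", osd.get("op_r", "n/a"))
print("writes completed:", osd.get("op_w", "n/a"))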

Christian

> Cheers,
> John
>
> >
> >
> >
> > The reason I ask is that I want to make sure I am not applying
> > unnecessary overhead and load onto all OSD nodes to retrieve this
> > data in a near-live view. I fully understand it will apply a small
> > amount of load / CPU on the local MON to process the command; I am
> > more interested in the overall cluster.
> >
> >
> >
> > Thanks,
> >
> > Ashley
> >
> >


--
Christian Balzer        Network/Systems Engineer
chibi@xxxxxxx        Global OnLine Japan/Rakuten Communications
http://www.gol.com/


--
Alex Gorbachev
Storcium

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
