Hello,

On Mon, 14 Mar 2016 09:16:13 -0700 Blade Doyle wrote:

> Hi Ceph Community,
>
> I am trying to use "ceph -w" output to monitor my ceph cluster. The
> basic setup is:
>
> A python script runs ceph -w and processes each line of output. It
> finds the data it wants and reports it to InfluxDB. I view the data
> using Grafana and the Ceph Dashboard.

A much richer and more precise source of information would be the
various performance counters, using collectd to feed them into
Graphite and friends:

http://lists.ceph.com/pipermail/ceph-users-ceph.com/2014-May/039953.html

I'm using the DWM one, YMMV.

> For the most part it's working well. But I'm not clear on exactly how
> to interpret the output of "ceph -w".
>
> Take the read statistics in the following snippet as an example:
>
> 1) 2016-03-14 09:00:00.783429 mon.0 [INF] HEALTH_OK
> 2) 2016-03-14 09:00:01.004309 mon.0 [INF] pgmap v4110206: 920 pgs: 920
> active+clean; 427 GB data, 917 GB used, 1413 GB / 2456 GB avail; 69153
> B/s wr, 10 op/s
> 3) 2016-03-14 *09:00:02.087584* mon.0 [INF] pgmap v4110207: 920 pgs: 920
> active+clean; 427 GB data, 917 GB used, 1413 GB / 2456 GB avail; *0 B/s
> rd*, 96928 B/s wr, 17 op/s
> 4) 2016-03-14 *09:00:03.435291* mon.0 [INF] pgmap v4110208: 920 pgs: 920
> active+clean; 427 GB data, 917 GB used, 1413 GB / 2456 GB avail; *2028
> B/s rd*, 83404 B/s wr, 8 op/s
> 5) 2016-03-14 *09:00:04.499252* mon.0 [INF] pgmap v4110209: 920 pgs: 920
> active+clean; 427 GB data, 917 GB used, 1413 GB / 2456 GB avail; *3368
> B/s rd*, 57677 B/s wr, 29 op/s
>
> At *09:00:02*: *0 B/s rd*
> At *09:00:03*: *2028 B/s rd*
> At *09:00:04*: *3368 B/s rd*
>
> So can I interpret this as "no data was read between 09:00:02 and
> 09:00:03", and "2028 bytes were read between 09:00:03 and 09:00:04"?

No, both "ceph -w" and a "watch ceph -s" (which I like as an instant
ceph monitor in a terminal window) have a very imprecise output.

Ah, I see John's mail pop up just now, so I'll finish here.
^o^

Christian

> 2016-03-14 09:00:05.572509 mon.0 [INF] pgmap v4110210: 920 pgs: 920
> active+clean; 427 GB data, 917 GB used, 1413 GB / 2456 GB avail; 140 kB/s
> wr, 33 op/s
> 2016-03-14 09:00:06.715286 mon.0 [INF] pgmap v4110211: 920 pgs: 920
> active+clean; 427 GB data, 917 GB used, 1413 GB / 2456 GB avail; 146 kB/s
> wr, 5 op/s
> 2016-03-14 09:00:07.855350 mon.0 [INF] pgmap v4110212: 920 pgs: 920
> active+clean; 427 GB data, 917 GB used, 1413 GB / 2456 GB avail; 73206
> B/s wr, 4 op/s
> 2016-03-14 09:00:09.111931 mon.0 [INF] pgmap v4110213: 920 pgs: 920
> active+clean; 427 GB data, 917 GB used, 1413 GB / 2456 GB avail; 19729
> B/s wr, 9 op/s
> 2016-03-14 09:00:10.269301 mon.0 [INF] pgmap v4110214: 920 pgs: 920
> active+clean; 427 GB data, 917 GB used, 1413 GB / 2456 GB avail; 63001
> B/s wr, 9 op/s
> 2016-03-14 09:00:12.589068 mon.0 [INF] pgmap v4110215: 920 pgs: 920
> active+clean; 427 GB data, 917 GB used, 1413 GB / 2456 GB avail; 62601
> B/s wr, 2 op/s
>
> OK, so at this point the last read stat I got was *09:00:04 3368 B/s
> rd*. Because I got no new read statistic, should I interpret that as
> "*3368 B/s were read each second since 09:00:04*"? Or as "starting at
> *09:00:05* no read stat was reported, so between 09:00:05 and 09:00:12
> 0 bytes were read"?
>
> 2016-03-14 09:00:13.677077 mon.0 [INF] pgmap v4110216: 920 pgs: 920
> active+clean; 427 GB data, 917 GB used, 1413 GB / 2456 GB avail; 4832 B/s
> rd, 33827 B/s wr, 2 op/s
> 2016-03-14 09:00:14.825715 mon.0 [INF] pgmap v4110217: 920 pgs: 920
> active+clean; 427 GB data, 917 GB used, 1413 GB / 2456 GB avail; 7151 B/s
> rd, 111 kB/s wr, 22 op/s
>
> Thanks much for any light you can shed.
> Blade.

--
Christian Balzer        Network/Systems Engineer
chibi@xxxxxxx           Global OnLine Japan/Rakuten Communications
http://www.gol.com/
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
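[For reference: the pgmap rate fields quoted throughout this thread can be pulled out with a small regex-based parser. The sketch below is an illustration only, not Blade's actual script; the function name, the unit table, and the choice to treat a missing "rd" field as 0 B/s are all assumptions.]

```python
import re

# Matches rate fields like "3368 B/s rd", "140 kB/s wr" at the end of a
# "ceph -w" pgmap line.  The avail/used sizes ("1413 GB / 2456 GB") have
# a space before the slash, so they do not match "/s" and are ignored.
RATE_RE = re.compile(r'(\d+(?:\.\d+)?)\s*(B|kB|MB|GB)/s\s*(rd|wr)')
OPS_RE = re.compile(r'(\d+)\s*op/s')

UNITS = {'B': 1, 'kB': 1024, 'MB': 1024 ** 2, 'GB': 1024 ** 3}

def parse_pgmap_rates(line):
    """Return (rd_bytes_per_s, wr_bytes_per_s, ops_per_s) for one pgmap line.

    A rate the monitor did not print is assumed to be 0 -- this is an
    interpretation choice, not something the log format guarantees.
    """
    rates = {'rd': 0.0, 'wr': 0.0}
    for value, unit, kind in RATE_RE.findall(line):
        rates[kind] = float(value) * UNITS[unit]
    m = OPS_RE.search(line)
    ops = int(m.group(1)) if m else 0
    return rates['rd'], rates['wr'], ops

# Example against line 5) from the quoted snippet:
line = ("2016-03-14 09:00:04.499252 mon.0 [INF] pgmap v4110209: 920 pgs: 920 "
        "active+clean; 427 GB data, 917 GB used, 1413 GB / 2456 GB avail; "
        "3368 B/s rd, 57677 B/s wr, 29 op/s")
print(parse_pgmap_rates(line))  # -> (3368.0, 57677.0, 29)
```

The parsed tuple could then be written to InfluxDB as three fields of one measurement; as Christian notes above, the perf counters via collectd remain the more precise source.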