Re: Understanding "ceph -w" output - cluster monitoring

On Mon, Mar 14, 2016 at 4:16 PM, Blade Doyle <blade.doyle@xxxxxxxxx> wrote:
> Hi Ceph Community,
>
> I am trying to use "ceph -w" output to monitor my ceph cluster.  The basic
> setup is:
>
> A python script runs ceph -w and processes each line of output.  It finds
> the data it wants and reports it to InfluxDB.  I view the data using
> Grafana, and Ceph Dashboard.
>
> For the most part it's working well, but I'm not clear on exactly how to
> interpret the output of "ceph -w".
>
> Take the read statistics in the following snippet as an example:
>
> 1) 2016-03-14 09:00:00.783429 mon.0 [INF] HEALTH_OK
> 2) 2016-03-14 09:00:01.004309 mon.0 [INF] pgmap v4110206: 920 pgs: 920
> active+clean; 427 GB data, 917 GB used, 1413 GB / 2456 GB avail; 69153 B/s
> wr, 10 op/s
> 3) 2016-03-14 09:00:02.087584 mon.0 [INF] pgmap v4110207: 920 pgs: 920
> active+clean; 427 GB data, 917 GB used, 1413 GB / 2456 GB avail; 0 B/s rd,
> 96928 B/s wr, 17 op/s
> 4) 2016-03-14 09:00:03.435291 mon.0 [INF] pgmap v4110208: 920 pgs: 920
> active+clean; 427 GB data, 917 GB used, 1413 GB / 2456 GB avail; 2028 B/s
> rd, 83404 B/s wr, 8 op/s
> 5) 2016-03-14 09:00:04.499252 mon.0 [INF] pgmap v4110209: 920 pgs: 920
> active+clean; 427 GB data, 917 GB used, 1413 GB / 2456 GB avail; 3368 B/s
> rd, 57677 B/s wr, 29 op/s
>
> At 09:00:02 0 B/s rd
> At 09:00:03 2028 B/s rd
> At 09:00:04 3368 B/s rd
>
> So can I interpret this as "no data was read between 09:00:02 and 09:00:03",
> and "2028 bytes were read between 09:00:03 and 09:00:04"?

Nope.  The number that comes out here is for human consumption; you
can't accurately interpret it like that.  Some things to note:
 * It's already smoothed across the last 2 PGMap updates.
 * It isn't going to come out at particularly regular intervals; it
comes out at 1s (configurable) plus the time it took to save an update
to the PGMap.
 * OSDs only send their stats in to the mon at
osd_mon_report_interval_min (default 5s), so if you're trying to
extract something at higher resolution here, it isn't really going to
make sense.
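
To see why the printed figure can't be inverted back to "bytes read in
this interval", here is a rough sketch of how such a rate is derived.
The exact smoothing the mon applies is an assumption here; the point is
just that a delta-over-delta-time rate computed at irregular intervals
loses the per-second detail:

```python
def pgmap_rate(prev, cur):
    """Approximate the printed B/s figure from two PGMap samples.

    prev/cur are (timestamp_seconds, cumulative_bytes) tuples.
    This is a sketch, not the mon's actual code: the real value is
    additionally smoothed across the last two PGMap updates.
    """
    dt = cur[0] - prev[0]
    return (cur[1] - prev[1]) / dt if dt > 0 else 0.0

# The same 2028 bytes produce different "rates" depending on how long
# the (irregular) interval between PGMap updates happened to be:
print(pgmap_rate((0.0, 0), (1.0, 2028)))   # 1s gap
print(pgmap_rate((0.0, 0), (1.35, 2028)))  # 1.35s gap, same bytes
```

So two log lines showing different B/s values can reflect the same
amount of data read, just reported over differently sized windows.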

You should do your own sampling in a way that makes sense for you.
There is an existing piece of code for collecting pool stats from the
mon here: https://github.com/ceph/Diamond/blob/calamari/src/collectors/ceph/ceph.py#L386
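
Along the same lines, a minimal sketch of doing your own sampling: poll
the mon for cumulative per-pool counters at a fixed interval and compute
the rates yourself. The field names (rd_bytes / wr_bytes under each
pool's stats in `ceph df detail --format json`) are an assumption about
your Ceph version's JSON layout, not something from this thread; verify
them against your own output before wiring this into InfluxDB.

```python
import json
import subprocess
import time


def pool_counters():
    # Ask the mon for cumulative per-pool stats as JSON.
    # Assumption: this Ceph version exposes rd_bytes/wr_bytes per pool.
    out = subprocess.check_output(
        ["ceph", "df", "detail", "--format", "json"])
    df = json.loads(out)
    return {p["name"]: p["stats"] for p in df["pools"]}


def rates(prev, cur, dt):
    # Turn two cumulative samples taken dt seconds apart into B/s.
    return {name: {
        "rd_Bps": (cur[name]["rd_bytes"] - prev[name]["rd_bytes"]) / dt,
        "wr_Bps": (cur[name]["wr_bytes"] - prev[name]["wr_bytes"]) / dt,
    } for name in cur if name in prev}


def poll(interval=10):
    # Sample well above osd_mon_report_interval_min (default 5s),
    # since the mon has nothing fresher than that anyway.
    prev = pool_counters()
    while True:
        time.sleep(interval)
        cur = pool_counters()
        print(rates(prev, cur, interval))  # report to InfluxDB here
        prev = cur
```

Call poll() from your collector's main loop; because you control the
sample timestamps, the resulting rates are well-defined, unlike the
smoothed figures in the "ceph -w" stream.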

Cheers,
John



> 2016-03-14 09:00:05.572509 mon.0 [INF] pgmap v4110210: 920 pgs: 920
> active+clean; 427 GB data, 917 GB used, 1413 GB / 2456 GB avail; 140 kB/s
> wr, 33 op/s
> 2016-03-14 09:00:06.715286 mon.0 [INF] pgmap v4110211: 920 pgs: 920
> active+clean; 427 GB data, 917 GB used, 1413 GB / 2456 GB avail; 146 kB/s
> wr, 5 op/s
> 2016-03-14 09:00:07.855350 mon.0 [INF] pgmap v4110212: 920 pgs: 920
> active+clean; 427 GB data, 917 GB used, 1413 GB / 2456 GB avail; 73206 B/s
> wr, 4 op/s
> 2016-03-14 09:00:09.111931 mon.0 [INF] pgmap v4110213: 920 pgs: 920
> active+clean; 427 GB data, 917 GB used, 1413 GB / 2456 GB avail; 19729 B/s
> wr, 9 op/s
> 2016-03-14 09:00:10.269301 mon.0 [INF] pgmap v4110214: 920 pgs: 920
> active+clean; 427 GB data, 917 GB used, 1413 GB / 2456 GB avail; 63001 B/s
> wr, 9 op/s
> 2016-03-14 09:00:12.589068 mon.0 [INF] pgmap v4110215: 920 pgs: 920
> active+clean; 427 GB data, 917 GB used, 1413 GB / 2456 GB avail; 62601 B/s
> wr, 2 op/s
>
> Ok, so at this point the last read stat I got was at 09:00:04: 3368 B/s rd.
> Because I got no new read statistic, should I interpret that as "3368 B/s
> were read each second since 09:00:04"?  Or as "starting at 09:00:05 no read
> stat was reported, so between 09:00:05 and 09:00:12 0 bytes were read"?
>
> 2016-03-14 09:00:13.677077 mon.0 [INF] pgmap v4110216: 920 pgs: 920
> active+clean; 427 GB data, 917 GB used, 1413 GB / 2456 GB avail; 4832 B/s
> rd, 33827 B/s wr, 2 op/s
> 2016-03-14 09:00:14.825715 mon.0 [INF] pgmap v4110217: 920 pgs: 920
> active+clean; 427 GB data, 917 GB used, 1413 GB / 2456 GB avail; 7151 B/s
> rd, 111 kB/s wr, 22 op/s
>
>
> Thanks much for any light you can shed.
> Blade.
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>

