Monitoring Ceph statistics using the rados Python module

clewis@xxxxxxxxxxxxxxxxxx (Craig Lewis) · Tue, 13 May 2014 16:46:42 -0700

On 5/13/14 09:33 , Adrian Banasiak wrote:
> Thanks for the suggestion about the admin daemon, but it looks like it is
> oriented toward a single OSD. I ran perf dump on the mon socket, and it
> outputs some interesting data for monitoring the whole cluster:
> { "cluster": { "num_mon": 4,
>       "num_mon_quorum": 4,
>       "num_osd": 29,
>       "num_osd_up": 29,
>       "num_osd_in": 29,
>       "osd_epoch": 1872,
>       "osd_kb": 20218112516,
>       "osd_kb_used": 5022202696,
>       "osd_kb_avail": 15195909820,
>       "num_pool": 4,
>       "num_pg": 3500,
>       "num_pg_active_clean": 3500,
>       "num_pg_active": 3500,
>       "num_pg_peering": 0,
>       "num_object": 400746,
>       "num_object_degraded": 0,
>       "num_object_unfound": 0,
>       "num_bytes": 1678788329609,
>       "num_mds_up": 0,
>       "num_mds_in": 0,
>       "num_mds_failed": 0,
>       "mds_epoch": 1},
>
> Unfortunately, cluster-wide I/O statistics are still missing.
>
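The "cluster" section above already carries capacity and up/in counts, so a monitoring script can reduce it to a handful of alertable numbers. Here is a minimal Python sketch. It assumes the perf dump has already been parsed into a dict (for example with json.loads) and that the "cluster" section has the keys shown above. The function name summarize_cluster is hypothetical.

```python
# Hypothetical helper (not part of Ceph): summarize the "cluster" section
# of a mon perf dump that has already been parsed into a dict.
def summarize_cluster(perf: dict) -> dict:
    c = perf["cluster"]
    return {
        # Share of raw OSD capacity in use, as a percentage.
        "pct_used": 100.0 * c["osd_kb_used"] / c["osd_kb"],
        # OSDs that exist but are not up, or not in.
        "osds_down": c["num_osd"] - c["num_osd_up"],
        "osds_out": c["num_osd"] - c["num_osd_in"],
        # Monitors outside the quorum.
        "mons_out_of_quorum": c["num_mon"] - c["num_mon_quorum"],
        # PGs that are not active+clean.
        "pgs_not_clean": c["num_pg"] - c["num_pg_active_clean"],
    }


# A subset of the values from the perf dump quoted above.
sample = {"cluster": {
    "num_mon": 4, "num_mon_quorum": 4,
    "num_osd": 29, "num_osd_up": 29, "num_osd_in": 29,
    "osd_kb": 20218112516, "osd_kb_used": 5022202696,
    "num_pg": 3500, "num_pg_active_clean": 3500,
}}
print(summarize_cluster(sample))
```

With the quoted numbers, this reports roughly 24.8% of raw capacity used and zero for every other field. As the original poster notes, none of these fields are I/O rates.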

I'm getting cluster-wide ops and bandwidth from ceph pg stat -f json.
I'm using this section:
{
    "pg_stats_delta": {
        "stat_sum": {
            "num_bytes": 0,
            "num_objects": 31851793,
            "num_object_clones": 0,
            "num_object_copies": 100208267,
            "num_objects_missing_on_primary": 0,
            "num_objects_degraded": 4687903,
            "num_objects_unfound": 0,
            "num_read": 315072058,
            "num_read_kb": 55549447422,
            "num_write": 223701235,
            "num_write_kb": 20457441876,
            "num_scrub_errors": 0,
            "num_shallow_scrub_errors": 0,
            "num_deep_scrub_errors": 0,
            "num_objects_recovered": 74138172,
            "num_bytes_recovered": 62776621391330,
            "num_keys_recovered": 1129447173},
        "stat_cat_sum": {},
        "log_size": 7191821,
        "ondisk_log_size": 7191821},

I'm tracking num_write, num_write_kb, num_read, and num_read_kb,
although I see some other fields that I should probably be tracking too.
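The subject line mentions the rados Python module, so the same JSON can also be fetched from a script without shelling out to the ceph CLI. This is a sketch with several assumptions:

* Rados.mon_command() accepts the JSON form of the "pg stat" command.
* The command's output has the pg_stats_delta.stat_sum layout of the excerpt above. The layout varies between Ceph releases, so check it against your own output.
* The helper names fetch_pg_stat and extract_counters are hypothetical.

```python
import json

# The four counters tracked above, read from pg_stats_delta.stat_sum.
TRACKED = ("num_read", "num_read_kb", "num_write", "num_write_kb")


def extract_counters(pg_stat: dict) -> dict:
    """Pick the tracked counters out of parsed pg stat JSON.

    Assumes the pg_stats_delta.stat_sum layout shown in the excerpt above.
    """
    stat_sum = pg_stat["pg_stats_delta"]["stat_sum"]
    return {name: stat_sum[name] for name in TRACKED}


def fetch_pg_stat(conffile: str = "/etc/ceph/ceph.conf") -> dict:
    """Rough equivalent of `ceph pg stat -f json` via the rados binding."""
    import rados  # imported lazily so extract_counters works without a cluster

    cluster = rados.Rados(conffile=conffile)
    cluster.connect()
    try:
        cmd = json.dumps({"prefix": "pg stat", "format": "json"})
        ret, outbuf, outs = cluster.mon_command(cmd, b"")
        if ret != 0:
            raise RuntimeError(f"pg stat failed ({ret}): {outs}")
        return json.loads(outbuf)
    finally:
        cluster.shutdown()
```

On a host with a working client keyring, the usage is extract_counters(fetch_pg_stat()). Keeping the parsing separate from the fetch lets you test it against saved JSON.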

Those values appear to be cumulative counters, so you probably want to
track the change since the previous sample, divided by the sampling
interval to get a rate, rather than the absolute value.
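Turning two samples into per-second rates takes only a few lines. One edge case is worth handling: a counter that goes backwards, for instance after statistics are reset. This is a minimal sketch that assumes each sample is a dict mapping counter names (such as the four listed above) to cumulative values. The function name rates is hypothetical, and clamping a negative delta to zero is this sketch's own choice, not Ceph behavior.

```python
# Hypothetical helper: per-second rates from two samples of cumulative counters.
def rates(prev: dict, curr: dict, interval_s: float) -> dict:
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    out = {}
    for name, value in curr.items():
        # A counter missing from the previous sample yields a delta of 0.
        delta = value - prev.get(name, value)
        # A negative delta means the counter went backwards (e.g. a reset);
        # report 0 for this interval rather than a meaningless negative rate.
        out[name] = max(delta, 0) / interval_s
    return out


# First sample: values from the excerpt above. Second sample: made up.
prev = {"num_write": 223701235, "num_write_kb": 20457441876}
curr = {"num_write": 223707235, "num_write_kb": 20457741876}
print(rates(prev, curr, 60.0))  # {'num_write': 100.0, 'num_write_kb': 5000.0}
```

Here num_write becomes write ops per second and num_write_kb becomes KB written per second. The read counters work the same way.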

-- 

*Craig Lewis*
Senior Systems Engineer
Office +1.714.602.1309
Email clewis at centraldesktop.com

