Adrian,

Yes, it is single OSD oriented. Like Haomai, we monitor perf dumps from
individual OSD admin sockets. On new enough versions of Ceph, you can run
'ceph daemon osd.x perf dump', which is a shorter way to ask for the same
output as 'ceph --admin-daemon /var/run/ceph/ceph-osd.x.asok perf dump'.
Keep in mind that either version has to be run locally on the host where
osd.x is running.

We use Sensu to take samples and push them to Graphite, and from there we
build dashboards showing the whole cluster, units in our CRUSH tree,
hosts, or individual OSDs.

I have found that monitoring each OSD's admin daemon is critical. Often a
single OSD can affect performance of the entire cluster, and without
per-OSD data these types of issues can be quite difficult to pinpoint.
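To make that concrete, here is a rough, untested sketch of that kind of
per-OSD sampler (not our actual Sensu check; the OSD id and the Graphite
host are placeholders you would have to fill in). It runs the perf dump
locally, flattens the nested JSON, and writes every numeric counter to
carbon's plaintext port:

import json
import socket
import subprocess
import time

OSD_ID = 0                      # placeholder: id of the OSD on this host
CARBON_HOST = "graphite.local"  # placeholder: your carbon line receiver
CARBON_PORT = 2003

def flatten(tree, prefix=""):
    # Turn the nested perf dump into flat, dot-separated metric names.
    flat = {}
    for key, value in tree.items():
        path = prefix + "." + key if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        elif isinstance(value, (int, float)):
            flat[path] = value
    return flat

# Must run on the host where osd.<OSD_ID> lives; on older releases use
# the long 'ceph --admin-daemon /var/run/ceph/ceph-osd.N.asok perf dump'.
dump = subprocess.check_output(
    ["ceph", "daemon", "osd.%d" % OSD_ID, "perf", "dump"])
metrics = flatten(json.loads(dump.decode()), "ceph.osd.%d" % OSD_ID)

# Carbon's plaintext protocol: "<metric path> <value> <timestamp>\n"
now = int(time.time())
sock = socket.create_connection((CARBON_HOST, CARBON_PORT))
for name, value in sorted(metrics.items()):
    sock.sendall(("%s %s %d\n" % (name, value, now)).encode())
sock.close()

Sensu just schedules something along those lines on each OSD host; once
the per-OSD series are in Graphite, the cluster/CRUSH/host/OSD dashboards
are straightforward to build on top.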
Also, note that Inktank has developed Calamari. There are rumors that it
may be open sourced at some point in the future.

Cheers,
Mike Dawson

On 5/13/2014 12:33 PM, Adrian Banasiak wrote:
> Thanks for the suggestion about the admin daemon, but it looks single
> OSD oriented. I have used perf dump on a mon socket and it outputs some
> interesting data for monitoring the whole cluster:
>
> { "cluster": { "num_mon": 4,
>       "num_mon_quorum": 4,
>       "num_osd": 29,
>       "num_osd_up": 29,
>       "num_osd_in": 29,
>       "osd_epoch": 1872,
>       "osd_kb": 20218112516,
>       "osd_kb_used": 5022202696,
>       "osd_kb_avail": 15195909820,
>       "num_pool": 4,
>       "num_pg": 3500,
>       "num_pg_active_clean": 3500,
>       "num_pg_active": 3500,
>       "num_pg_peering": 0,
>       "num_object": 400746,
>       "num_object_degraded": 0,
>       "num_object_unfound": 0,
>       "num_bytes": 1678788329609,
>       "num_mds_up": 0,
>       "num_mds_in": 0,
>       "num_mds_failed": 0,
>       "mds_epoch": 1},
>
> Unfortunately, cluster-wide IO statistics are still missing.
>
>
> 2014-05-13 17:17 GMT+02:00 Haomai Wang <haomaiwang at gmail.com>:
>
> Not sure about your demand.
>
> I use "ceph --admin-daemon /var/run/ceph/ceph-osd.x.asok perf dump" to
> get the monitoring info. The result can be parsed easily with simplejson
> in Python.
>
> On Tue, May 13, 2014 at 10:56 PM, Adrian Banasiak
> <adrian at banasiak.it> wrote:
> > Hi, I am working with a test Ceph cluster and now I want to implement
> > Zabbix monitoring with items such as:
> >
> > - whole cluster IO (for example ceph -s -> recovery io 143 MB/s,
> >   35 objects/s)
> > - pg statistics
> >
> > I would like to create a single Python script to retrieve values using
> > the rados Python module, but there is only a little information in the
> > documentation about module usage. I've created a single function which
> > calculates current read/write statistics for all pools, but I can't
> > find out how to add recovery IO usage and pg statistics:
> >
> > read = 0
> > write = 0
> > for pool in conn.list_pools():
> >     io = conn.open_ioctx(pool)
> >     stats[pool] = io.get_stats()
> >     read += int(stats[pool]['num_rd'])
> >     write += int(stats[pool]['num_wr'])
> >
> > Could someone share their knowledge about the rados module for
> > retrieving Ceph statistics?
> >
> > BTW Ceph is awesome!
> >
> > --
> > Best regards, Adrian Banasiak
> > email: adrian at banasiak.it
>
> --
> Best Regards,
>
> Wheat
>
>
> --
> Best regards, Adrian Banasiak
> email: adrian at banasiak.it
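P.S. On the original question about cluster-wide IO and PG statistics
from Python: the pool loop above only sums per-pool counters, but
librados can also run monitor commands for you, which gets you the same
data 'ceph -s' shows. A rough, untested sketch, assuming a python-rados
new enough to expose mon_command() and a readable /etc/ceph/ceph.conf:

import json
import rados

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()

# Ask the monitors for the data 'ceph -s' prints, but as JSON.
cmd = json.dumps({"prefix": "status", "format": "json"})
ret, out, err = cluster.mon_command(cmd, b'')
if ret == 0:
    status = json.loads(out)
    # e.g. the 'pgmap' section has the PG counts and, when there is
    # client or recovery traffic, the corresponding IO rates.
    print(status.get('pgmap'))

cluster.shutdown()

The same call should accept other mon commands as well (a "pg stat"
prefix, for example), so a single Zabbix item script could cover both
sets of numbers.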