On Sat, 8 Mar 2014 13:43:37 +0800, Indra Pramana <indra@xxxxxxxx> wrote:
> Hi Mariusz,
>
> Good day to you, and thank you for your email.
>
> >You should probably start by hooking up all servers into some kind of statistics
> >gathering software (we use collectd + graphite) and monitor at least disk stats
> >(latency + iops + octets) and network.
>
> Thank you for your recommendation on collectd + graphite. I have checked
> and they just do the collection of the data and graph it, but what is the
> tools to gather the data, especially the disk stats latency and iops? What
> tools are recommended? I used iostat but it doesn't seem to give much
> information. What parameters I need to lookout to check the latency and
> iops?

I use collectd mostly because it has tons of plugins, including a disk plugin that gathers IOPS, transfer speed, latency and queue size on all block devices.

It has a few different output modules. You can use it to generate "classical" RRDTool files, but we use graphite because it is a bit better at analyzing data. For example, I've plotted the "top 6 slowest disks in server" ( http://imgur.com/Q6W7lDr ) based on I/O latency.

In general, when you "run out of IOPS" on a disk, its latency spikes up as more requests sit in the queue. But if the difference between disks of the same type is huge and the load on them is similar, it might mean a disk is dying. We had disks that returned perfectly good data, but whose latency occasionally spiked up to seconds, making the whole server lag (it was RAID6 on a backup server).

-- 
Mariusz Gronczewski, Administrator

efigence S. A.
ul. Wołoska 9a, 02-583 Warszawa
T: [+48] 22 380 13 13
F: [+48] 22 380 13 14
E: mariusz.gronczewski@xxxxxxxxxxxx
<mailto:mariusz.gronczewski@xxxxxxxxxxxx>
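To illustrate which parameters to look at for latency and IOPS, here is a rough sketch of how both can be derived by hand from two samples of /proc/diskstats on Linux. It is not collectd's implementation, only an assumption-laden illustration: the function names (parse_diskstats, disk_rates, slowest) are made up for this example, and the field positions follow the kernel's documented diskstats layout (reads completed, ms spent reading, writes completed, ms spent writing).

```python
def parse_diskstats(text):
    """Parse the text of /proc/diskstats into {device: counters}."""
    stats = {}
    for line in text.splitlines():
        f = line.split()
        if len(f) < 14:
            continue  # skip blank or malformed lines
        stats[f[2]] = {
            "reads": int(f[3]),      # reads completed
            "read_ms": int(f[6]),    # milliseconds spent reading
            "writes": int(f[7]),     # writes completed
            "write_ms": int(f[10]),  # milliseconds spent writing
        }
    return stats


def disk_rates(prev, cur, interval_s):
    """Compute IOPS and average per-request latency between two samples."""
    rates = {}
    for dev, c in cur.items():
        p = prev.get(dev)
        if p is None:
            continue  # device appeared between samples
        ios = (c["reads"] - p["reads"]) + (c["writes"] - p["writes"])
        busy_ms = (c["read_ms"] - p["read_ms"]) + (c["write_ms"] - p["write_ms"])
        rates[dev] = {
            "iops": ios / interval_s,
            # average time per completed request, in ms (0 if the disk was idle)
            "latency_ms": busy_ms / ios if ios else 0.0,
        }
    return rates


def slowest(rates, n=6):
    """Return the n device names with the highest average latency."""
    return sorted(rates, key=lambda d: rates[d]["latency_ms"], reverse=True)[:n]
```

In practice you would read /proc/diskstats, wait a few seconds, read it again, and pass both parsed samples plus the interval to disk_rates(). A disk whose latency_ms is far above that of its peers under similar iops is the kind of outlier described above.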
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com