Re: Fluctuating I/O speed degrading over time

On 03/07/2014 11:43 PM, Indra Pramana wrote:
Hi Mariusz,

Good day to you, and thank you for your email.

 >You should probably start by hooking up all servers into some kind of
 >statistics-gathering software (we use collectd + graphite) and monitor
 >at least disk stats (latency + iops + octets) and network.

Thank you for your recommendation of collectd + graphite. I have checked,
and they only collect the data and graph it, but which tools should
gather the data in the first place, especially disk latency and IOPS?
What tools are recommended? I used iostat, but it doesn't seem to give
much information. Which parameters should I look at to check latency
and IOPS?
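(Editor's note: iostat -x from the sysstat package does report these
numbers; the columns to watch are r/s and w/s for IOPS, await for mean
latency in ms, and %util for saturation. A quick sketch of pulling await
out of a captured line follows; the sample device line, the column
position -- it varies between sysstat versions -- and the 20 ms
threshold are illustrative assumptions, not values from this thread:)

```shell
# With classic 12-column "iostat -x" output, field 10 is await (ms).
# Flag any device whose average latency exceeds an example 20 ms cutoff.
sample='sda 0.00 1.50 12.00 34.00 512.00 1024.00 66.78 0.45 35.20 2.10 9.66'
echo "$sample" | awk '{ if ($10 > 20) print $1 " await=" $10 "ms" }'
```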

If you are just looking for a simple command line tool, collectl is pretty useful. I use it for all of my automated testing scripts.

apt-get install collectl
collectl -sD -oT

You can get quite a few other statistics from it as well including per process statistics, slab memory, cpu, network, etc.

Mark


Looking forward to your reply, thank you.

Cheers.



On Sat, Mar 8, 2014 at 1:04 AM, Mariusz Gronczewski
<mariusz.gronczewski@xxxxxxxxxxxxx> wrote:

    On Fri, 7 Mar 2014 17:50:44 +0800, Indra Pramana <indra@xxxxxxxx> wrote:
     >
     > Any advice on how I can start to troubleshoot what might have caused
     > the degradation of the I/O speed? Does utilisation contribute to it
     > (since we now have more users than when we started)? Is there any
     > optimisation we can do to improve the I/O performance?

    You should probably start by hooking up all servers into some kind of
    statistics-gathering software (we use collectd + graphite) and monitor
    at least disk stats (latency + iops + octets) and network.
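    (Editor's note: as a minimal sketch, a collectd setup along these
    lines would cover the disk and network stats mentioned above and
    ship them to Graphite's line receiver. The interval and the Graphite
    hostname are placeholders, not values from this thread:)

```
# /etc/collectd/collectd.conf (fragment)
Interval 10

LoadPlugin disk        # per-device ops, octets, and I/O time
LoadPlugin interface   # network octets/packets per NIC
LoadPlugin write_graphite

<Plugin write_graphite>
  <Node "graphite">
    Host "graphite.example.com"
    Port "2003"
    Protocol "tcp"
  </Node>
</Plugin>
```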

    Then it is much easier to see potential problems. For example, we found
    failing-but-not-yet-dead disks that still sort of worked, but their
    latency was 10x higher than all the other disks in the machine.
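    (Editor's note: once per-disk latencies are being collected, that
    kind of outlier check can be scripted. A rough awk sketch follows,
    flagging any device whose average await is more than 5x the mean of
    the other disks; the device names, latency figures, and the 5x
    threshold are made up for illustration:)

```shell
# Input: one "device latency-in-ms" pair per line.
printf '%s\n' 'sda 4.1' 'sdb 3.8' 'sdc 41.0' 'sdd 4.5' |
awk '{ dev[NR] = $1; ms[NR] = $2; sum += $2 }
     END { for (i = 1; i <= NR; i++) {
             rest = (sum - ms[i]) / (NR - 1)   # mean of the other disks
             if (ms[i] > 5 * rest)
               printf "%s is %.1fx slower\n", dev[i], ms[i] / rest
           } }'
```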


    Mariusz Gronczewski, Administrator

    efigence S. A.
    ul. Wołoska 9a, 02-583 Warszawa
    T: [+48] 22 380 13 13
    F: [+48] 22 380 13 14
    E: mariusz.gronczewski@xxxxxxxxxxxx




_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com





