RE: How to improve latencies and per-VM performance and latencies


Hi!

1. Use it at your own risk. I'm not responsible for any damage you may get by running this script.

2. What is it for.
 A Ceph osd daemon has a so-called 'admin socket' - a unix socket, local to the osd host, that we can
use to issue commands to that osd. The script connects over ssh to a list of osd hosts (currently it is
hardcoded in the source code, but it's easily changeable), lists all admin sockets in /var/run/ceph, greps
the socket names for osd numbers, and issues the 'perf dump' command to all osds. The JSON output is parsed
with standard python libs and some latency parameters are extracted from it. In the JSON they are coded as
tuples, containing the total amount of time in milliseconds and the count of events, so by dividing time
by count we get the average latency of one or more ceph operations. Min/max/avg are computed for every host
and for the whole cluster, and the latency of every osd is compared to the minimal value of the cluster
(or host) and colorized, to easily spot too-high values.
You can check a usage example in the comments at the top of the script and change the hardcoded values,
which are also gathered at the top.
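
To illustrate the flow, here is a minimal sketch of the same idea (it is not the attached script itself:
the host list, socket path and the counters printed at the end are placeholder assumptions). It connects
to each host over ssh, lists the admin sockets, runs 'perf dump' on every osd and divides each counter's
total time by its event count:

#!/usr/bin/env python
# Minimal sketch, not the attached getosdstat.py: host list, socket path and the
# printed counters are assumptions for illustration only.
import json
import re
import subprocess

OSD_HOSTS = ["ceph-node1", "ceph-node2"]   # hardcoded, like in the real script
SOCK_DIR = "/var/run/ceph"

def ssh(host, cmd):
    # Run a command on a remote host over ssh and return its stdout as text.
    return subprocess.check_output(["ssh", host, cmd]).decode()

def collect(host):
    # Return {osd_id: {counter_name: avg_latency}} for every osd on one host.
    result = {}
    for sock in ssh(host, "ls %s/ceph-osd.*.asok" % SOCK_DIR).split():
        osd_id = re.search(r"ceph-osd\.(\d+)\.asok", sock).group(1)
        perf = json.loads(ssh(host, "ceph --admin-daemon %s perf dump" % sock))
        lat = {}
        for section in perf.values():               # 'osd', 'filestore', ...
            for name, val in section.items():
                # latency counters come back as a pair of fields: event count
                # and total time, e.g. {"avgcount": 1234, "sum": 5.678}
                if name.endswith("_latency") and isinstance(val, dict):
                    count = val.get("avgcount", 0)
                    lat[name] = val.get("sum", 0.0) / count if count else 0.0
        result[osd_id] = lat
    return result

if __name__ == "__main__":
    for host in OSD_HOSTS:
        for osd_id, lat in sorted(collect(host).items(), key=lambda kv: int(kv[0])):
            print("%s osd.%s op_r_latency=%.4f op_w_latency=%.4f" % (
                host, osd_id, lat.get("op_r_latency", 0.0),
                lat.get("op_w_latency", 0.0)))

The division itself is unit-agnostic: the result is in whatever unit the counter reports its total time
in, and for the min/max/avg comparison only the relative values between osds matter.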

3. I use the script on Ceph Firefly 0.80.7, but I think it will work on any release that supports
admin socket connections to osds, the 'perf dump' command and the same JSON output structure.

4. Since the script connects to the osd hosts over ssh one by one, it is slow, especially when you have
many osd hosts. Also, all osds of a host are printed in one row, so if you have >12 osds per host,
the output will get slightly messy.
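
One possible way around the serial ssh bottleneck (again only a sketch, reusing the hypothetical
collect() helper from the snippet above) is to query the hosts in parallel:

# Possible speed-up for many hosts: run the per-host collection concurrently
# instead of one host at a time. Reuses the collect() helper sketched above.
from multiprocessing import Pool

def collect_all(hosts, workers=8):
    # One worker (and one ssh session at a time) per host, up to 'workers'.
    pool = Pool(processes=min(workers, len(hosts)))
    try:
        return dict(zip(hosts, pool.map(collect, hosts)))
    finally:
        pool.close()
        pool.join()

Each worker still walks its own host's osds one by one, so the wall-clock time drops roughly to that
of the slowest host.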

PS: This is my first python script, so suggestions and improvements are welcome ;)


Megov Igor
CIO, Yuterra

________________________________________
From: Michael Kuriger <mk7193@xxxxxx>
Sent: 19 May 2015 18:51
To: Межов Игорь Александрович
Subject: Re: How to improve latencies and per-VM performance and latencies

Awesome!  I would be interested in doing this as well.  Care to share how
your script works?

Thanks!




Michael Kuriger
Sr. Unix Systems Engineer
mk7193@xxxxxx | 818-649-7235





On 5/19/15, 6:31 AM, "Межов Игорь Александрович" <megov@xxxxxxxxxx> wrote:

>Hi!
>
>Seeking performance improvements in our cluster (Firefly 0.80.7 on Wheezy,
>5 nodes, 58 osds), I wrote
>a small python script that walks through the ceph nodes and issues the
>'perf dump' command on the osd admin
>sockets. It extracts the *_latency tuples, calculates min/max/avg, compares
>osd perf metrics with the min/avg
>of the whole cluster or of the same host, and displays the result in table
>form. The goal is to check where most of the latency is.
>
>The hardware is not new and shiny:
> - 5 nodes * 10-12 OSDs each
> - Intel E5520@2.26/32-48Gb DDR3-1066 ECC
> - 10Gbit X520DA interconnect
> - Intel DC3700 200Gb as a system volume + journals, connected to sata2
>onboard in ahci mode
> - Intel RS2MB044 / RS2BL080 SAS RAID in RAID0 per drive mode, WT, disk
>cache disabled
> - bunch of 1Tb or 2Tb various WD Black drives, 58 disks, 76Tb total
> - replication = 3, filestore on xfs
> - shared client and cluster 10Gbit network
> - cluster used as rbd storage for VMs
> - rbd_cache is on via 'cache=writeback' in libvirt (I suppose that it is
>true ;))
> - no special tuning in ceph.conf:
>
>>osd mount options xfs = rw,noatime,inode64
>>osd disk threads = 2
>>osd op threads = 8
>>osd max backfills = 2
>>osd recovery max active = 2
>
>I get rather slow read performance from within the VMs, especially at QD=1,
>so many VMs are running slowly.
>I think this HW config can perform better, as I get 10-12k iops
>at QD=32 from time to time.
>
>So I have some questions:
> 1. Am I right that the osd perf counters are cumulative and count up from
>OSD start?
> 2. Is there any way to reset the perf counters without restarting the OSD
>daemon? Maybe a command through the admin socket?
> 3. What latencies should I expect from my config, or, what latencies do
>you have on your clusters?
>Just as an example, or as a reference to compare my values with. I'm
>mostly interested in
> - 'op_latency',
> - 'op_[r|w]_latency',
> - 'op_[r|w]_process_latency'
> - 'journal_latency'
>But other parameters, like 'apply_latency' or
>'queue_transaction_latency_avg', are also interesting to compare.
> 4. Where should I look first if I need to improve QD=1 (i.e.
>per-VM) performance?
>
>Thanks!
>
>Megov Igor
>CIO, Yuterra
>_______________________________________________
>ceph-users mailing list
>ceph-users@xxxxxxxxxxxxxx
>http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

Attachment: getosdstat.py.gz
Description: getosdstat.py.gz

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
