Re: Having trouble getting good performance

Somnath Roy <Somnath.Roy@xxxxxxxxxxx> · Wed, 22 Apr 2015 18:54:52 +0000

What Ceph version are you using?
It seems the clients are not sending enough traffic to the cluster.
Could you try with rbd_cache=false and then rbd_cache=true and see if the behavior changes?
What is the client-side CPU utilization?
Performance also depends on the queue depth (QD) you are driving with.
I would suggest running fio on the VM with a similar workload, at both high and low QD, to isolate the issue.
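
A rough sketch of such a QD sweep, in Python. It only builds fio command lines and does not run them. The target path /mnt/test/fio.dat, the 128k block size, the 4G file size, the random-write pattern and the 60 second runtime are placeholder assumptions, not values from this thread; adjust them to resemble the real workload.

```python
import shlex

def fio_command(iodepth, filename="/mnt/test/fio.dat", bs="128k", rw="randwrite"):
    """Return an fio argument list for one queue depth.

    filename, bs and rw are hypothetical defaults chosen for illustration.
    """
    return [
        "fio",
        f"--name=qd{iodepth}",
        f"--filename={filename}",
        "--size=4G",
        "--ioengine=libaio",
        # Bypass the page cache; drop this if the filesystem under test
        # does not support O_DIRECT.
        "--direct=1",
        f"--rw={rw}",
        f"--bs={bs}",
        f"--iodepth={iodepth}",
        "--runtime=60",
        "--time_based",
    ]

def qd_sweep(depths=(1, 4, 16, 64)):
    """Build one fio command per queue depth, from low to high."""
    return [fio_command(qd) for qd in depths]

if __name__ == "__main__":
    for cmd in qd_sweep():
        print(shlex.join(cmd))
```

If IOPs barely change between the low and high QD runs, the bottleneck is probably not the depth the client drives; if they scale up, the client workload is likely latency-bound at low QD.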

Thanks & Regards
Somnath

-----Original Message-----
From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of J David
Sent: Wednesday, April 22, 2015 11:42 AM
To: ceph-users@xxxxxxxxxxxxxx
Subject:  Having trouble getting good performance

A very small 3-node Ceph cluster with this OSD tree:

http://pastebin.com/mUhayBk9

has some performance issues.  All 27 OSDs are 5TB SATA drives, the cluster keeps two copies of everything, and it's really only intended for nearline backups of large data objects.

All of the OSDs look OK in terms of utilization.  iostat reports them around 5-10 %util with occasional spikes up to 20%, and doing typically 5-20 IOPs each.  (Which, if we figure 100 IOPs per drive, matches up nicely with the %util.)
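
That sanity check can be written out as a small calculation. The 100 IOPs per drive figure is the rule of thumb from the paragraph above, not a measured value.

```python
DRIVE_MAX_IOPS = 100  # rule-of-thumb ceiling per SATA drive, as assumed above

def expected_util_percent(observed_iops, max_iops=DRIVE_MAX_IOPS):
    """Utilization we would expect if the drive saturates at max_iops."""
    return 100.0 * observed_iops / max_iops

if __name__ == "__main__":
    for iops in (5, 10, 20):
        print(iops, "IOPs ->", expected_util_percent(iops), "%util expected")
```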

The Ceph nodes also look healthy.  CPU is 75-90% idle.  Plenty of RAM (the smaller node has 32GiB, the larger ones have 64GiB).  They have dual 10GBase-T NICs on an unloaded, dedicated storage switch.

The pool has 4096 placement groups.  Currently we have noscrub & nodeep-scrub set to eliminate scrubbing as a source of performance problems.

When we increased placement groups from the default to 4096, a ton of data moved, and quickly.  The cluster is definitely capable of pushing data around at multi-gigabit speeds if it wants to.

Yet this cluster has (currently) one client, a KVM machine backed by a 32TB RBD image.  It is a Linux VM running ZFS that receives ZFS snapshots from other machines to back them up.  However, the performance is pretty bad.  When it is receiving even one snapshot, iostat -x on the client reports it is choking on I/O to the Ceph rbd image: 99.88 %util while doing 50-75 IOPs and 5-20 MB/sec of throughput.
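
To put those client numbers in perspective, here is a back-of-the-envelope sketch using Little's law (IOPs = queue depth / latency). The queue depth of 1 is an assumption made purely for illustration; the depth the VM actually drives is not given in this thread.

```python
def implied_latency_ms(iops, queue_depth=1):
    """Average per-request latency implied by Little's law.

    queue_depth=1 is an assumed value, not a measurement.
    """
    return 1000.0 * queue_depth / iops

def avg_request_kb(mb_per_sec, iops):
    """Average request size in KB, taking 1 MB = 1000 KB."""
    return 1000.0 * mb_per_sec / iops

if __name__ == "__main__":
    for iops in (50, 75):
        print(f"{iops} IOPs at QD=1 -> {implied_latency_ms(iops):.1f} ms per request")
    # Bounds on request size from the reported ranges (5-20 MB/sec, 50-75 IOPs).
    print(f"smallest: {avg_request_kb(5, 75):.0f} KB, "
          f"largest: {avg_request_kb(20, 50):.0f} KB")
```

If per-request latency stays roughly constant, IOPs should grow with queue depth, so comparing a low-QD and a high-QD run is one way to tell a latency-bound client from a saturated cluster.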

Is there anything we can do to improve the performance of this configuration, or at least figure out why it is so bad?  These are large SATA drives with no SSD OSDs, so we don't expect miracles.  But it sure would be nice to see client I/O that was better than 50 IOPs and 20MB/sec.

Thanks in advance for any help or guidance on this!
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
