Sorry about the repost from the cbt list, but it was suggested I post here as well: I am attempting to track down some performance issues in a recently deployed Ceph cluster. Our configuration is as follows:

3 storage nodes, each with:
  - 8 cores
  - 64GB of RAM
  - 2x 1TB 7200 RPM spindles
  - 1x 120GB Intel SSD
  - 2x 10Gbit NICs (in an LACP port-channel)

The OSD pool "min_size" is set to "1" and "size" is set to "3".

When creating a new pool and running RADOS benchmarks (rough invocation in the P.S. below), performance isn't bad and is about what I would expect from this hardware configuration:

WRITES:
  Total writes made:       207
  Write size:              4194304
  Bandwidth (MB/sec):      80.017
  Stddev Bandwidth:        34.9212
  Max bandwidth (MB/sec):  120
  Min bandwidth (MB/sec):  0
  Average Latency:         0.797667
  Stddev Latency:          0.313188
  Max latency:             1.72237
  Min latency:             0.253286

RAND READS:
  Total time run:          10.127990
  Total reads made:        1263
  Read size:               4194304
  Bandwidth (MB/sec):      498.816
  Average Latency:         0.127821
  Max latency:             0.464181
  Min latency:             0.0220425

This all looks fine until we try to use the cluster for its intended purpose, which is to house images for qemu-kvm, accessed using librbd. I/O inside the VMs sees excessive wait times (at times in the hundreds of milliseconds, which makes some operating systems, such as Windows, unusable), and throughput struggles to exceed 10MB/s. Looking at the ceph status output, we see very low op/s and throughput numbers, and the blocked-request count seems very high. Any ideas as to what to look at here?

     health HEALTH_WARN
            8 requests are blocked > 32 sec
     monmap e3: 3 mons at {storage-1=10.0.0.1:6789/0,storage-2=10.0.0.2:6789/0,storage-3=10.0.0.3:6789/0}
            election epoch 128, quorum 0,1,2 storage-1,storage-2,storage-3
     osdmap e69615: 6 osds: 6 up, 6 in
      pgmap v3148541: 224 pgs, 1 pools, 819 GB data, 227 kobjects
            2726 GB used, 2844 GB / 5571 GB avail
                 224 active+clean
  client io 3957 B/s rd, 3494 kB/s wr, 30 op/s

Of note, on the other list I was asked to provide the following:
  - ceph version 0.94.1 (e4bfad3a3c51054df7e537a724c8d0bf9be972ff)
  - The SSD is split into 8GB partitions, which are used as journal devices and are specified per OSD in /etc/ceph/ceph.conf. For example:

      [osd.0]
      host = storage-1
      osd journal = /dev/mapper/INTEL_SSDSC2BB120G4_CVWL4363006R120LGNp1

  - rbd_cache is enabled and the qemu cache mode is set to "writeback" (the drive configuration is sketched in the P.P.S. below)
  - rbd_concurrent_management_ops is unset, so it appears to be at the default of "10"

Thanks,

--
Kenneth Van Alstyne
Systems Architect
Knight Point Systems, LLC
Service-Disabled Veteran-Owned Business
1775 Wiehle Avenue Suite 101 | Reston, VA 20190
c: 228-547-8045 f: 571-266-3106
DHS EAGLE II Prime Contractor: FC1 SDVOSB Track
GSA Schedule 70 SDVOSB: GS-35F-0646S
GSA MOBIS Schedule: GS-10F-0404Y
ISO 20000 / ISO 27001
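P.S. For reference, this is roughly how the RADOS benchmarks above were run. The pool name is a placeholder and the exact flags may have differed slightly, but it was a plain 4MB-object write test followed by a random read test against the same objects:

    # 10-second write test with 4MB objects; keep the objects so the read test has data
    rados bench -p bench-test 10 write --no-cleanup

    # 10-second random read test against the objects written above
    rados bench -p bench-test 10 rand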
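P.P.S. The guests attach their images via librbd with the qemu cache mode set as mentioned above. The pool, image, and client names here are placeholders, and in practice the drive is defined through our management layer rather than typed by hand, but it ends up looking something like this:

    # illustrative only: raw RBD image attached with cache=writeback
    qemu-system-x86_64 ... \
      -drive format=raw,if=virtio,cache=writeback,file=rbd:rbd/vm-disk-1:id=admin:conf=/etc/ceph/ceph.conf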