Poor performance with three nodes

I have three storage servers that provide NFS and iSCSI services to my network, serving data to four virtual machine compute hosts (two ESXi, two libvirt/KVM) that run several dozen virtual machines. I decided to test a Ceph deployment to see whether it could replace iSCSI as the primary way to provide block stores to my virtual machines, since that would allow better redundancy and better distribution of the load across the storage servers.

I used Ceph version 0.67.3, installed from RPMs. Because these are live servers already providing NFS and iSCSI data, they aren't a clean slate, so the Ceph data stores were created on XFS partitions. Each partition sits on a single disk group (12-disk RAID6); there are two such disk groups per server, each connected to its own 3 Gbit/s SAS channel. The servers are all connected together with 10 Gigabit Ethernet. The replication factor was set to 3 (three copies of each chunk of data) so that each chunk is guaranteed to reside on at least two servers (since each server has two of the data stores).
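As a sketch of how that can be confirmed (the pool name 'data' is taken from the rbd device path further down, and the exact output format varies by Ceph version):

ceph osd pool get data size     # should report 3
ceph osd tree                   # shows the host/OSD hierarchy
ceph osd crush rule dump        # shows whether replicas are separated by host or by OSD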

My experience with streaming writes via NFS or iSCSI to these servers is that the limiting factor is the performance of the SAS bus. That is, on the client side I top out at about 240 megabytes per second on writes to a single disk group, a bit higher on reads, due to the 3 Gbit/s SAS bus. When I exercise both disk groups at once, I max out both SAS buses for double the throughput. The 10 Gigabit Ethernet with a 9000-byte MTU clearly has plenty of bandwidth to saturate two 3 Gbit/s SAS buses.
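That baseline can be reproduced with the same kind of streaming dd test used below, pointed at an NFS mount (or iSCSI-backed device) instead of rbd; the mount points here are placeholders:

# one ~8 GB stream per disk group, run in parallel to load both SAS buses
dd if=/dev/zero of=/mnt/diskgroup1/testfile bs=524288 count=16384 &
dd if=/dev/zero of=/mnt/diskgroup2/testfile bs=524288 count=16384 &
wait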

My first test of Ceph was to create a 'test1' volume of around 8 gigabytes (roughly the size of the root partition of one of my virtual machines) and then run a simple streaming read and write test:

[root@stack1 ~]# dd if=/dev/zero of=/dev/rbd/data/test1 bs=524288
dd: error writing ‘/dev/rbd/data/test1’: No space left on device
16193+0 records in
16192+0 records out
8489271296 bytes (8.5 GB) copied, 172.71 s, 49.2 MB/s

[root@stack1 ~]# dd if=/dev/rbd/data/test1 of=/dev/null bs=524288
16192+0 records in
16192+0 records out
8489271296 bytes (8.5 GB) copied, 25.2494 s, 336 MB/s
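For completeness, a volume like this is created and mapped with the standard rbd tooling, roughly as follows; the pool name comes from the device path above and the size matches the 8489271296 bytes reported by dd:

rbd create data/test1 --size 8096     # size in megabytes
rbd map data/test1                    # appears as /dev/rbd/data/test1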

So:

1) Writes are truly appalling. They are not even reaching the speed of a single disk drive (my disk drives are capable of streaming approximately 120 megabytes per second).

2) Reads are more acceptable. I am getting better throughput than with a single SAS channel, as you would expect with reads striped across three SAS channels. Still, reads are slower than I expected given the speed of my infrastructure.

Compared to Amazon EBS on an IO-enhanced instance, reads appear roughly the same, and writes are *much* slower.

What this seems to indicate is either (a) inherent Ceph performance issues for writes, or (b) something misconfigured on my end. There is simply too much of a mismatch between what the underlying hardware does with NFS and iSCSI and what it does with Ceph to consider this appropriate performance. My guess is (b), that I have something misconfigured. Any ideas what I should look for?
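One way to narrow down (a) versus (b) is to take rbd and the client-side block layer out of the picture and benchmark the object store directly; a minimal sketch, assuming the same 'data' pool:

# raw RADOS streaming writes to the pool for 30 seconds
rados bench -p data 30 write

# watch cluster activity while the benchmark runs
ceph -w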
