I have three storage servers that provide NFS and iSCSI services to my
network, serving data to four virtual machine compute hosts (two ESXi,
two libvirt/KVM) running several dozen virtual machines. I decided to
test a Ceph deployment to see whether it could replace iSCSI as the
primary way to provide block storage to my virtual machines, since this
would allow better redundancy and better distribution of load across
the storage servers.
I used Ceph version 0.67.3, installed from RPMs. Because these are live
servers already providing NFS and iSCSI data, they aren't a clean
slate, so the Ceph data stores were created on XFS partitions. Each
partition sits on a single disk group (12-disk RAID6); there are two
disk groups per server, each connected to its own 3 Gbit/s SAS channel.
The servers are all connected together with 10 gigabit Ethernet. The
replication factor was set to 3 (three copies of each chunk of data) so
that every chunk is guaranteed to reside on at least two servers (since
each server has only two chunk stores).
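For reference, the replication factor is a per-pool setting; a minimal
sketch of how it would be set with the standard ceph CLI (the pool name
'data' matches the RBD device path used further down):

    ceph osd pool set data size 3    # keep three copies of each object
    ceph osd pool get data size      # confirm the setting (should report: size: 3)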
My experience with streaming writes via NFS or iSCSI to these servers
is that the limiting factor is the SAS bus. That is, on the client side
I top out at about 240 megabytes per second on writes to a single disk
group, a bit higher on reads, due to the 3 Gbit/s SAS bus (3 Gbit/s
with 8b/10b encoding works out to roughly 300 MB/s of usable
bandwidth). When I exercise both disk groups at once I max out both SAS
buses for double the throughput. The 10 gigabit Ethernet with a
9000-byte MTU clearly has plenty of bandwidth to saturate two 3 Gbit/s
SAS buses.
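For comparison, a streaming test of the same shape run against an iSCSI
LUN would look something like this; the device path below is a
placeholder, and writing zeros over a LUN is destructive, so it only
makes sense on a scratch volume:

    dd if=/dev/zero of=/dev/sdX bs=524288   # /dev/sdX = scratch iSCSI LUN (placeholder)
    dd if=/dev/sdX of=/dev/null bs=524288   # streaming read back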
My first test of Ceph was to create a 'test1' RBD volume of roughly 8
gigabytes (about the size of the root partition of one of my virtual
machines) and then measure streaming reads and writes. The volume was
created and mapped roughly as sketched below, and the streaming test
itself was just a pair of dd runs against the mapped device.
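A rough sketch of the volume setup (the exact commands may have
differed, but the pool 'data' and image 'test1' match the device path
below, and 8096 MB matches the size dd reports):

    rbd create test1 --pool data --size 8096   # ~8.5 GB image in pool 'data'
    rbd map test1 --pool data                  # kernel client maps it as /dev/rbd/data/test1

The dd runs and their results: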
[root@stack1 ~]# dd if=/dev/zero of=/dev/rbd/data/test1 bs=524288
dd: error writing ‘/dev/rbd/data/test1’: No space left on device
16193+0 records in
16192+0 records out
8489271296 bytes (8.5 GB) copied, 172.71 s, 49.2 MB/s
[root@stack1 ~]# dd if=/dev/rbd/data/test1 of=/dev/null bs=524288
16192+0 records in
16192+0 records out
8489271296 bytes (8.5 GB) copied, 25.2494 s, 336 MB/s
So:
1) Writes are truly appalling. They are not going at the speed of even a
single disk drive (my disk drives are capable of streaming approximately
120 megabytes per second).
2) Reads are more acceptable. I am getting better throughput than with a
single SAS channel, as you would expect with reads striped across three
SAS channels. Still, reads are slower than I expected given the speed of
my infrastructure.
Compared to Amazon EBS, reads appear roughly the same as on an
IO-enhanced instance, and writes are *much* slower.
What this seems to indicate is either (a) inherent Ceph performance
issues with writes, or (b) something misconfigured on my end. There's
simply too much of a mismatch between what the underlying hardware does
with NFS and iSCSI and what it does with Ceph to consider this
appropriate performance. My guess is (b), that I have something
misconfigured. Any ideas what I should look for?
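If it helps narrow things down, a raw RADOS benchmark against the same
pool would presumably show whether the slow writes are already there
below the RBD layer; a minimal sketch with the stock rados bench tool
(pool 'data' as above, 60 seconds, an arbitrary 16 threads):

    rados bench -p data 60 write -t 16 --no-cleanup   # streaming object writes; keep objects for the read pass
    rados bench -p data 60 seq -t 16                  # sequential reads of the objects written above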
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com