Hi there,
I am new to Ceph and still learning its performance characteristics, but I would like to share my benchmark results in the hope that they are useful to others, and also to see whether there is room for improvement in my setup.
Firstly, a little about my setup:
3 servers (quad-core CPU, 16GB RAM), each with 4 SATA 7.2K RPM disks (4TB) plus a 160GB SSD.
I have mapped a 10GB volume to a fourth server which is acting as a Ceph client. Because Ceph thin-provisions RBD volumes, I used "dd" to write across the entire block device to ensure that the volume is fully allocated. dd writes sequentially at around 95 MB/s, which shows the network can run at close to full capacity.
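For reference, the volume was created, mapped and pre-filled roughly like this (pool/image names and the rbd device number are just examples, not necessarily the exact ones I used):

rbd create test --size 10240
rbd map test
dd if=/dev/zero of=/dev/rbd1 bs=4M oflag=direct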
Each machine is connected to the switch by a single 1 Gbps Ethernet link.
I then used "fio" to benchmark the raw block device. The reason for this is that I also need to compare Ceph against a traditional iSCSI SAN, and the internal "rados bench" tool cannot be used for that comparison.
The replication level for the pool I am testing against is 2.
I have tried two setups with regards to the OSDs: first with the journal running on a partition on the SSD, and second using "bcache" (http://bcache.evilpiepirate.org) to provide a write-back cache in front of the 4TB drives.
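Roughly how the two layouts were prepared, in case I have done something silly here (device names are illustrative):

# setup 1: journal on an SSD partition (data disk : journal partition)
ceph-deploy osd prepare server1:/dev/sdb:/dev/sdf1

# setup 2: bcache write-back cache in front of a data disk
make-bcache -C /dev/sdf -B /dev/sdb
echo writeback > /sys/block/bcache0/bcache/cache_mode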
In all tests, fio was configured to do direct I/O with 256 parallel I/Os (iodepth=256).
With the journal on the SSD:
4k random read: around 1200 IOPS, ~5 MB/s.
4k random write: around 300 IOPS, ~1.2 MB/s.
Using bcache for each OSD (journal is just a file on the OSD):
4k random read: around 2200 IOPS, ~9 MB/s.
4k random write: around 300 IOPS, ~1.2 MB/s.
By comparison, a 12-disk RAID5 iSCSI SAN does ~4000 read IOPS and ~2000 write IOPS (albeit with 15K RPM SAS disks).
What is interesting is that bcache definitely has a positive effect on the read IOPS, but something else is the bottleneck for writes.
It looks to me like I have missed something in the configuration that is holding down the write IOPS, since 300 IOPS is very poor. If, however, I turn off direct I/O in the fio tests (direct=0 in the job files below), write performance jumps to around 4000 IOPS. It makes no difference to read performance, which is to be expected.
I have tried increasing the number of threads in each OSD but that has made no difference.
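For reference, the OSD thread settings I experimented with were along these lines in ceph.conf (the values here are just examples):

[osd]
osd op threads = 8
filestore op threads = 4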
I have also tried images with different (smaller) stripe sizes (via --order) instead of the default 4MB, but it doesn't make any difference.
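For example, to get 1MB objects instead of the default 4MB (order 22), I created test images along the lines of:

rbd create test-1m --size 10240 --order 20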
Do these figures look reasonable to others? What kind of IOPS should I be expecting?
Additional info is below:
Ceph 0.72.2 running on CentOS 6.5 (with a custom 3.10.25 kernel for bcache support)
3 servers of the following spec:
CPU: Quad Core Intel(R) Xeon(R) CPU E5-2609 0 @ 2.40GHz
RAM: 16GB
Disks: 4x 4TB Seagate Constellation (7.2K RPM) plus 1x Intel 160GB DC S3500 SSD
Test pool has 400 placement groups (pg_num), with the same number of placement groups for placement (pgp_num).
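The pool was created with the usual command, roughly (pool name is illustrative):

ceph osd pool create test 400 400    # pg_num and pgp_num both 400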
fio configuration - reads:
[global]
rw=randread
filename=/dev/rbd1
ioengine=posixaio
iodepth=256
direct=1
runtime=60
ramp_time=30
blocksize=4k
write_bw_log=fio-2-random-read
write_lat_log=fio-2-random-read
write_iops_log=fio-2-random-read

; at least one job section is required in addition to [global]
[randread]
fio configuration - writes:
[global]
rw=randwrite
filename=/dev/rbd1
ioengine=posixaio
iodepth=256
direct=1
runtime=60
ramp_time=30
blocksize=4k
write_bw_log=fio-2-random-write
write_lat_log=fio-2-random-write
write_iops_log=fio-2-random-write

; at least one job section is required in addition to [global]
[randwrite]