Hi All,

I’m finding my write performance is lower than I would have expected. After spending a considerable amount of time testing several different configurations, I can never seem to break ~360 MB/s write, even when using tmpfs for journaling.

I’ve purchased 3x Dell R515s, each with 1 x AMD 6C CPU, 12 x 3TB SAS disks, 2 x 100GB Intel DC S3700 SSDs, 32GB RAM, a PERC H710p RAID controller and dual-port 10GbE network cards.

First up, I realise the SSDs were a mistake: I should have bought the 200GB ones, as they have considerably better write throughput (~375 MB/s vs 200 MB/s).

Node configuration: 2 x 3TB disks in RAID1 for OS/MON plus 1 partition for an OSD, and the remaining 10 disks each as a single-disk RAID0 (JBOD fashion) with a 1MB stripe size. (The stripe size was particularly important: I found it matters considerably even on a single-disk RAID0, contrary to what you might read on the internet.) Each disk also has write-back cache enabled and read-ahead disabled.

Networking: all nodes are connected via an LACP bond with L3 hashing, and using iperf I can get up to 16 Gbit/s TX and RX between the nodes.

OS: Ubuntu 12.04.3 LTS w/ kernel 3.10.12-031012-generic (had to upgrade the kernel due to driver issues with the Intel 10Gbit NICs).

So this gives me 11 OSDs & 2 SSDs per node.

Next, I’ve tried several different configurations, two of which I’ll briefly describe below.

1)
Cluster Configuration 1: 33 OSDs with 6x SSDs as journals, w/ 15GB journals on SSD.

# ceph osd pool create benchmark1 1800 1800
# rados bench -p benchmark1 180 write --no-cleanup
--------------------------------------------------
Maintaining 16 concurrent writes of 4194304 bytes for up to 180 seconds or 0 objects
Total time run:         180.250417
Total writes made:      10152
Write size:             4194304
Bandwidth (MB/sec):     225.287
Stddev Bandwidth:       35.0897
Max bandwidth (MB/sec): 312
Min bandwidth (MB/sec): 0
Average Latency:        0.284054
Stddev Latency:         0.199075
Max latency:            1.46791
Min latency:            0.038512
--------------------------------------------------

# rados bench -p benchmark1 180 seq
-------------------------------------------------
Total time run:        43.782554
Total reads made:      10120
Read size:             4194304
Bandwidth (MB/sec):    924.569
Average Latency:       0.0691903
Max latency:           0.262542
Min latency:           0.015756
-------------------------------------------------

In this configuration my write performance suffers a lot: the SSDs seem to be the bottleneck, and write performance using rados bench was around 224-230 MB/s.

2)
Cluster Configuration 2: 33 OSDs with 1GB journals on tmpfs.

# ceph osd pool create benchmark1 1800 1800
# rados bench -p benchmark1 180 write --no-cleanup
--------------------------------------------------
Maintaining 16 concurrent writes of 4194304 bytes for up to 180 seconds or 0 objects
Total time run:         180.044669
Total writes made:      15328
Write size:             4194304
Bandwidth (MB/sec):     340.538
Stddev Bandwidth:       26.6096
Max bandwidth (MB/sec): 380
Min bandwidth (MB/sec): 0
Average Latency:        0.187916
Stddev Latency:         0.0102989
Max latency:            0.336581
Min latency:            0.034475
--------------------------------------------------

# rados bench -p benchmark1 180 seq
-------------------------------------------------
Total time run:        76.481303
Total reads made:      15328
Read size:             4194304
Bandwidth (MB/sec):    801.660
Average Latency:       0.079814
Max latency:           0.317827
Min latency:           0.016857
-------------------------------------------------

Now it seems there is no journaling bottleneck, as we are using tmpfs; however, write speed is still less than I would expect, and the SAS disks are barely busy according to iostat.

So I thought it might be a disk bus throughput issue, and next I ran some dd tests. The commands below are in a script, dd-x.sh, which runs the 11 readers or writers at once.
dd if=/dev/zero of=/srv/ceph/osd.0/ddfile bs=32k count=100k oflag=direct &
dd if=/dev/zero of=/srv/ceph/osd.1/ddfile bs=32k count=100k oflag=direct &
dd if=/dev/zero of=/srv/ceph/osd.2/ddfile bs=32k count=100k oflag=direct &
dd if=/dev/zero of=/srv/ceph/osd.3/ddfile bs=32k count=100k oflag=direct &
dd if=/dev/zero of=/srv/ceph/osd.4/ddfile bs=32k count=100k oflag=direct &
dd if=/dev/zero of=/srv/ceph/osd.5/ddfile bs=32k count=100k oflag=direct &
dd if=/dev/zero of=/srv/ceph/osd.6/ddfile bs=32k count=100k oflag=direct &
dd if=/dev/zero of=/srv/ceph/osd.7/ddfile bs=32k count=100k oflag=direct &
dd if=/dev/zero of=/srv/ceph/osd.8/ddfile bs=32k count=100k oflag=direct &
dd if=/dev/zero of=/srv/ceph/osd.9/ddfile bs=32k count=100k oflag=direct &
dd if=/dev/zero of=/srv/ceph/osd.10/ddfile bs=32k count=100k oflag=direct &

This gives me an aggregated write throughput of around 1,135 MB/s.

A similar script to test reads:

dd if=/srv/ceph/osd.0/ddfile of=/dev/null bs=32k count=100k iflag=direct &
dd if=/srv/ceph/osd.1/ddfile of=/dev/null bs=32k count=100k iflag=direct &
dd if=/srv/ceph/osd.2/ddfile of=/dev/null bs=32k count=100k iflag=direct &
dd if=/srv/ceph/osd.3/ddfile of=/dev/null bs=32k count=100k iflag=direct &
dd if=/srv/ceph/osd.4/ddfile of=/dev/null bs=32k count=100k iflag=direct &
dd if=/srv/ceph/osd.5/ddfile of=/dev/null bs=32k count=100k iflag=direct &
dd if=/srv/ceph/osd.6/ddfile of=/dev/null bs=32k count=100k iflag=direct &
dd if=/srv/ceph/osd.7/ddfile of=/dev/null bs=32k count=100k iflag=direct &
dd if=/srv/ceph/osd.8/ddfile of=/dev/null bs=32k count=100k iflag=direct &
dd if=/srv/ceph/osd.9/ddfile of=/dev/null bs=32k count=100k iflag=direct &
dd if=/srv/ceph/osd.10/ddfile of=/dev/null bs=32k count=100k iflag=direct &

This gives me an aggregated read throughput of around 1,382 MB/s.
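For anyone wanting to repeat these tests with other block sizes, here is a minimal sketch of how a script like dd-x.sh could be written as a loop instead of 11 repeated lines. The function name run_dd, its arguments and the DD_DIRECT variable are hypothetical and not part of the original script; the dd options are the same ones used above.

```shell
# Hypothetical loop form of dd-x.sh (names are illustrative only).
# Starts one dd per OSD directory in the background, then waits for all.
#
#   $1  mode: "write" or "read"
#   $2  base directory holding osd.0 .. osd.(N-1), e.g. /srv/ceph
#   $3  number of OSD directories, e.g. 11
#   $4  dd block size, e.g. 32k or 4k
#   $5  dd block count, e.g. 100k
#
# Setting DD_DIRECT=0 drops the direct I/O flag, for filesystems that
# do not support O_DIRECT. The default keeps it, as in the tests above.
run_dd() {
    mode=$1; base=$2; osds=$3; bs=$4; count=$5
    if [ "${DD_DIRECT:-1}" = 1 ]; then direct=yes; else direct=; fi

    i=0
    while [ "$i" -lt "$osds" ]; do
        file="$base/osd.$i/ddfile"
        if [ "$mode" = write ]; then
            # Writer: zeros into the OSD's filesystem.
            dd if=/dev/zero of="$file" bs="$bs" count="$count" \
                ${direct:+oflag=direct} &
        else
            # Reader: read the previously written file back.
            dd if="$file" of=/dev/null bs="$bs" count="$count" \
                ${direct:+iflag=direct} &
        fi
        i=$((i + 1))
    done

    # Block until every background dd has exited; each dd prints its own
    # throughput figure on stderr.
    wait
}
```

With this sketch, `run_dd write /srv/ceph 11 32k 100k` would correspond to the write test above and `run_dd read /srv/ceph 11 32k 100k` to the read test; the aggregate figure is still the sum of the per-dd numbers.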
Next I’ll lower the block size to 4k to show the results:

dd if=/dev/zero of=/srv/ceph/osd.0/ddfile bs=4k count=100k oflag=direct &
dd if=/dev/zero of=/srv/ceph/osd.1/ddfile bs=4k count=100k oflag=direct &
dd if=/dev/zero of=/srv/ceph/osd.2/ddfile bs=4k count=100k oflag=direct &
dd if=/dev/zero of=/srv/ceph/osd.3/ddfile bs=4k count=100k oflag=direct &
dd if=/dev/zero of=/srv/ceph/osd.4/ddfile bs=4k count=100k oflag=direct &
dd if=/dev/zero of=/srv/ceph/osd.5/ddfile bs=4k count=100k oflag=direct &
dd if=/dev/zero of=/srv/ceph/osd.6/ddfile bs=4k count=100k oflag=direct &
dd if=/dev/zero of=/srv/ceph/osd.7/ddfile bs=4k count=100k oflag=direct &
dd if=/dev/zero of=/srv/ceph/osd.8/ddfile bs=4k count=100k oflag=direct &
dd if=/dev/zero of=/srv/ceph/osd.9/ddfile bs=4k count=100k oflag=direct &
dd if=/dev/zero of=/srv/ceph/osd.10/ddfile bs=4k count=100k oflag=direct &

This gives me an aggregated write throughput of around 300 MB/s.

dd if=/srv/ceph/osd.0/ddfile of=/dev/null bs=4k count=100k iflag=direct &
dd if=/srv/ceph/osd.1/ddfile of=/dev/null bs=4k count=100k iflag=direct &
dd if=/srv/ceph/osd.2/ddfile of=/dev/null bs=4k count=100k iflag=direct &
dd if=/srv/ceph/osd.3/ddfile of=/dev/null bs=4k count=100k iflag=direct &
dd if=/srv/ceph/osd.4/ddfile of=/dev/null bs=4k count=100k iflag=direct &
dd if=/srv/ceph/osd.5/ddfile of=/dev/null bs=4k count=100k iflag=direct &
dd if=/srv/ceph/osd.6/ddfile of=/dev/null bs=4k count=100k iflag=direct &
dd if=/srv/ceph/osd.7/ddfile of=/dev/null bs=4k count=100k iflag=direct &
dd if=/srv/ceph/osd.8/ddfile of=/dev/null bs=4k count=100k iflag=direct &
dd if=/srv/ceph/osd.9/ddfile of=/dev/null bs=4k count=100k iflag=direct &
dd if=/srv/ceph/osd.10/ddfile of=/dev/null bs=4k count=100k iflag=direct &

This gives me an aggregated read throughput of around 430 MB/s.

This is my ceph.conf; the only difference between the configs is journal dio = false.

----------------
[global]
auth cluster required = cephx
auth service required = cephx
auth client required = cephx
public network = 10.100.96.0/24
cluster network = 10.100.128.0/24
journal dio = false

[mon]
mon data = "">

[mon.a]
host = rbd01
mon addr = 10.100.96.10:6789

[mon.b]
host = rbd02
mon addr = 10.100.96.11:6789

[mon.c]
host = rbd03
mon addr = 10.100.96.12:6789

[osd]
osd data = "">
osd journal size = 1000
osd mkfs type = xfs
osd mkfs options xfs = "-f"
osd mount options xfs = "rw,noexec,nodev,noatime,nodiratime,barrier=0,inode64,logbufs=8,logbsize=256k"

[osd.0]
host = rbd01
osd journal = /tmp/tmpfs/osd.$id
devs = /dev/sda5

[osd.1]
host = rbd01
osd journal = /tmp/tmpfs/osd.$id
devs = /dev/sdb2

[osd.2]
host = rbd01
osd journal = /tmp/tmpfs/osd.$id
devs = /dev/sdc2

[osd.3]
host = rbd01
osd journal = /tmp/tmpfs/osd.$id
devs = /dev/sdd2

[osd.4]
host = rbd01
osd journal = /tmp/tmpfs/osd.$id
devs = /dev/sde2

[osd.5]
host = rbd01
osd journal = /tmp/tmpfs/osd.$id
devs = /dev/sdf2

[osd.6]
host = rbd01
osd journal = /tmp/tmpfs/osd.$id
devs = /dev/sdg2

[osd.7]
host = rbd01
osd journal = /tmp/tmpfs/osd.$id
devs = /dev/sdh2

[osd.8]
host = rbd01
osd journal = /tmp/tmpfs/osd.$id
devs = /dev/sdi2

[osd.9]
host = rbd01
osd journal = /tmp/tmpfs/osd.$id
devs = /dev/sdj2

[osd.10]
host = rbd01
osd journal = /tmp/tmpfs/osd.$id
devs = /dev/sdk2

[osd.11]
host = rbd02
osd journal = /tmp/tmpfs/osd.$id
devs = /dev/sda5

[osd.12]
host = rbd02
osd journal = /tmp/tmpfs/osd.$id
devs = /dev/sdb2

[osd.13]
host = rbd02
osd journal = /tmp/tmpfs/osd.$id
devs = /dev/sdc2

[osd.14]
host = rbd02
osd journal = /tmp/tmpfs/osd.$id
devs = /dev/sdd2

[osd.15]
host = rbd02
osd journal = /tmp/tmpfs/osd.$id
devs = /dev/sde2

[osd.16]
host = rbd02
osd journal = /tmp/tmpfs/osd.$id
devs = /dev/sdf2

[osd.17]
host = rbd02
osd journal = /tmp/tmpfs/osd.$id
devs = /dev/sdg2

[osd.18]
host = rbd02
osd journal = /tmp/tmpfs/osd.$id
devs = /dev/sdh2

[osd.19]
host = rbd02
osd journal = /tmp/tmpfs/osd.$id
devs = /dev/sdi2

[osd.20]
host = rbd02
osd journal = /tmp/tmpfs/osd.$id
devs = /dev/sdj2

[osd.21]
host = rbd02
osd journal = /tmp/tmpfs/osd.$id
devs = /dev/sdk2

[osd.22]
host = rbd03
osd journal = /tmp/tmpfs/osd.$id
devs = /dev/sda5

[osd.23]
host = rbd03
osd journal = /tmp/tmpfs/osd.$id
devs = /dev/sdb2

[osd.24]
host = rbd03
osd journal = /tmp/tmpfs/osd.$id
devs = /dev/sdc2

[osd.25]
host = rbd03
osd journal = /tmp/tmpfs/osd.$id
devs = /dev/sdd2

[osd.26]
host = rbd03
osd journal = /tmp/tmpfs/osd.$id
devs = /dev/sde2

[osd.27]
host = rbd03
osd journal = /tmp/tmpfs/osd.$id
devs = /dev/sdf2

[osd.28]
host = rbd03
osd journal = /tmp/tmpfs/osd.$id
devs = /dev/sdg2

[osd.29]
host = rbd03
osd journal = /tmp/tmpfs/osd.$id
devs = /dev/sdh2

[osd.30]
host = rbd03
osd journal = /tmp/tmpfs/osd.$id
devs = /dev/sdi2

[osd.31]
host = rbd03
osd journal = /tmp/tmpfs/osd.$id
devs = /dev/sdj2

[osd.32]
host = rbd03
osd journal = /tmp/tmpfs/osd.$id
devs = /dev/sdk2
---------------------

Any ideas?

Cheers,
Quenten