Yes, those drives are not great, and you have them partitioned on top of that. Don't use mdadm for Ceph OSDs: in my experience it *does* impair performance; it just doesn't play nice with OSDs. Ceph does its own block replication, though be careful: a size of "2" is not necessarily as "safe" as RAID 10 (losing any 2 drives vs. losing 2 specific drives).

For each write, the OSD writes to its journal and then makes sure the write is synced to the journals of the replica OSDs (however many copies you configured) BEFORE it acknowledges the write back to the client, and that round trip is latency.

If it is just a test run: try dedicating one drive to the OSD and the other to the OS. To see the impact of not having SSD journals, and of the latency added by the replica writes, try setting the replication size to 1 (not great or ideal, but it gives you an idea of how much that extra sync write for the replicas is costing you; see the command sketch further down). Ceph really, really shines when it has solid state for its write journalling. The Caviar Black drives are not fantastic for latency either, and that can have a significant impact (particularly on the journal!).

\\chris

----- Original Message -----
From: "Sébastien RICCIO" <sr@xxxxxxxxxxxxxxx>
To: ceph-users@xxxxxxxxxxxxxx
Sent: Thursday, 25 July, 2013 11:27:48 PM
Subject: testing ceph - very slow write performances

Hi ceph-users,

I'm currently evaluating Ceph for a project and I'm getting quite low write performance, so if you have time to read this post and give me some advice it would be much appreciated :)

My test setup uses some free hardware we have lying around in our datacenter: three Ceph server nodes, each running one monitor and two OSDs, plus one client node.

Hardware of a server node (Supermicro):
Intel(R) Xeon(R) CPU X3440 @ 2.53GHz (8 logical cores in total)
2 x Western Digital Caviar Black 1 TB (WD1003FBYX-01Y7B0)
32 GB DDR3 RAM
2 x Ethernet controller: Intel Corporation 82574L Gigabit Network Connection

Hardware of the client (a Dell blade M610):
2 x Intel(R) Xeon(R) CPU E5620 @ 2.40GHz (16 logical cores in total)
64 GB DDR3 RAM
4 x Ethernet controller: Broadcom Corporation NetXtreme II BCM5709S Gigabit Ethernet (rev 20)
2 x Ethernet controller: Broadcom Corporation NetXtreme II BCM57711 10-Gigabit PCIe

OS of the server nodes: Ubuntu 12.04.2 LTS
Kernel 3.10.0-031000-generic #201306301935 SMP Sun Jun 30 23:36:16 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

OS of the client node: CentOS release 6.4
Kernel 3.10.1-1.el6xen.x86_64 #1 SMP Sun Jul 14 11:05:42 EST 2013 x86_64 x86_64 x86_64 GNU/Linux

How I set up the OS on the server nodes: I know this isn't good, but as there are only two disks in each machine I partitioned them and used them both for the OS and the OSDs. For a test run it shouldn't be that bad...

Disk layout:
partition 1: mdadm RAID 1 member for the OS (30 GB)
partition 2: mdadm RAID 1 member for some swap space (shouldn't be used anyway...)
partition 3: reserved for the XFS partition of the OSD

Ceph installation: I tried both cuttlefish (0.61) and testing (0.66). Deployed using ceph-deploy from an admin node running in a XenServer 6.2 VM.
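A minimal sketch of the size-1 test suggested above, assuming the test image lives in the default "rbd" pool (the pool name and the restore value of 2 are assumptions; this is for testing only, not production):

#ceph osd pool get rbd size
#ceph osd pool set rbd size 1
#ceph osd pool set rbd min_size 1
(re-run the write tests, then restore the original size)
#ceph osd pool set rbd size 2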
#ceph-deploy new ceph01 ceph02 ceph03
(edited some ceph.conf stuff)
#ceph-deploy install --stable cuttlefish ceph01 ceph02 ceph03
#ceph-deploy mon create ceph01 ceph02 ceph03
#ceph-deploy gatherkeys ceph01
#ceph-deploy osd create ceph01:/dev/sda3 ceph01:/dev/sdb3 ceph02:/dev/sda3 ceph02:/dev/sdb3 ceph03:/dev/sda3 ceph03:/dev/sdb3
#ceph-deploy osd activate ceph01:/dev/sda3 ceph01:/dev/sdb3 ceph02:/dev/sda3 ceph02:/dev/sdb3 ceph03:/dev/sda3 ceph03:/dev/sdb3

ceph-admin:~/cephstore$ ceph status
   health HEALTH_OK
   monmap e1: 3 mons at {ceph01=10.111.80.1:6789/0,ceph02=10.111.80.2:6789/0,ceph03=10.111.80.3:6789/0}, election epoch 6, quorum 0,1,2 ceph01,ceph02,ceph03
   osdmap e26: 6 osds: 6 up, 6 in
   pgmap v258: 192 pgs: 192 active+clean; 1000 MB data, 62212 MB used, 5346 GB / 5407 GB avail
   mdsmap e1: 0/0/1 up

Now let's do some performance testing from the client, accessing an RBD image on the cluster.

#rbd create test --size 20000
#rbd map test

Raw write test (ouch, something is wrong here):
#dd if=/dev/zero of=/dev/rbd1 bs=1024k count=1000 oflag=direct
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB) copied, 146.051 s, 7.2 MB/s

Raw read test (this seems quite OK for a gigabit network):
#dd if=/dev/rbd1 of=/dev/null bs=1024k count=1000
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB) copied, 13.6368 s, 76.9 MB/s

Trying to find the bottleneck: network throughput between the client and the nodes (not 100% efficiency, but not that bad):
[ 3] local 10.111.80.1 port 37497 connected with 10.111.10.105 port 5001
[ 3] 0.0-10.0 sec  812 MBytes  681 Mbits/sec
[ 3] local 10.111.80.2 port 55912 connected with 10.111.10.105 port 5001
[ 3] 0.0-10.0 sec  802 MBytes  673 Mbits/sec
[ 3] local 10.111.80.3 port 45188 connected with 10.111.10.105 port 5001
[ 3] 0.0-10.1 sec  707 MBytes  589 Mbits/sec
[ 3] local 10.111.10.105 port 43103 connected with 10.111.80.1 port 5001
[ 3] 0.0-10.2 sec  730 MBytes  601 Mbits/sec
[ 3] local 10.111.10.105 port 44656 connected with 10.111.80.2 port 5001
[ 3] 0.0-10.0 sec  871 MBytes  730 Mbits/sec
[ 3] local 10.111.10.105 port 40455 connected with 10.111.80.3 port 5001
[ 3] 0.0-10.0 sec  1005 MBytes  843 Mbits/sec

Disk throughput on the Ceph nodes:
/var/lib/ceph/osd/ceph-0$ sudo dd if=/dev/zero of=test bs=1024k count=1000 oflag=direct
1048576000 bytes (1.0 GB) copied, 7.96581 s, 132 MB/s
/var/lib/ceph/osd/ceph-1$ sudo dd if=/dev/zero of=test bs=1024k count=1000 oflag=direct
1048576000 bytes (1.0 GB) copied, 7.91835 s, 132 MB/s
/var/lib/ceph/osd/ceph-2$ sudo dd if=/dev/zero of=test bs=1024k count=1000 oflag=direct
1048576000 bytes (1.0 GB) copied, 7.55287 s, 139 MB/s
/var/lib/ceph/osd/ceph-3$ sudo dd if=/dev/zero of=test bs=1024k count=1000 oflag=direct
1048576000 bytes (1.0 GB) copied, 7.67281 s, 137 MB/s
/var/lib/ceph/osd/ceph-4$ sudo dd if=/dev/zero of=test bs=1024k count=1000 oflag=direct
1048576000 bytes (1.0 GB) copied, 8.13862 s, 129 MB/s
/var/lib/ceph/osd/ceph-5$ sudo dd if=/dev/zero of=test bs=1024k count=1000 oflag=direct
1048576000 bytes (1.0 GB) copied, 7.72034 s, 136 MB/s

At this point I don't know what else to check. Is 7.2 MB/s write throughput with 1024k blocks the expected performance for such a test setup?

Any advice appreciated :)

Sorry for the long post, I wanted to give you enough information to get an overview of the setup.
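Two further checks that could help localize the problem (a sketch only; the default "rbd" pool, the OSD mount path and the byte counts are assumptions):

Cluster-level write benchmark, bypassing the kernel RBD client:
#rados bench -p rbd 30 write

Small synchronous writes on an OSD disk, to roughly approximate the journal's sync-write pattern:
/var/lib/ceph/osd/ceph-0$ sudo dd if=/dev/zero of=synctest bs=4k count=1000 oflag=dsync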
Cheers,
Sébastien

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com