Yes, those drives are not great, and you have them partitioned on top of that. Don't use mdadm for Ceph OSDs: in my experience it *does* impair performance; it just doesn't play nice with OSDs. Ceph does its own block replication, though be careful: a size of "2" is not necessarily as "safe" as RAID 10 (losing any 2 drives vs. losing 2 specific drives).

For each write, the OSD writes to its journal and then makes sure the write is synced to the journals of the replica OSDs (however many copies you configured) BEFORE it acknowledges the write back to the client, and that round trip is latency.

If it is just a test run: try dedicating one drive to the OSD and the other to the OS. To see the impact of not having SSD journals, and of the latency added by the replica writes, try setting the replication size to 1 (not great or ideal, but it gives you an idea of how much that extra sync write for the replicas is costing you; see the command sketch further down). Ceph really, really shines when it has solid state for its write journalling. The Caviar Black drives are not fantastic for latency either, and that can have a significant impact (particularly on the journal!).

\\chris

----- Original Message -----
From: "Sébastien RICCIO" <sr@xxxxxxxxxxxxxxx>
To: ceph-users@xxxxxxxxxxxxxx
Sent: Thursday, 25 July, 2013 11:27:48 PM
Subject: testing ceph - very slow write performances

Hi ceph-users,

I'm currently evaluating Ceph for a project and I'm getting quite low write performance, so if you have time to read this post and give me some advice it would be much appreciated :)

My test setup uses some free hardware we have lying around in our datacenter: three Ceph server nodes, each running one monitor and two OSDs, plus one client node.

Hardware of a server node (Supermicro):
Intel(R) Xeon(R) CPU X3440 @ 2.53GHz (8 logical cores in total)
2 x Western Digital Caviar Black 1 TB (WD1003FBYX-01Y7B0)
32 GB DDR3 RAM
2 x Ethernet controller: Intel Corporation 82574L Gigabit Network Connection

Hardware of the client (a Dell blade M610):
2 x Intel(R) Xeon(R) CPU E5620 @ 2.40GHz (16 logical cores in total)
64 GB DDR3 RAM
4 x Ethernet controller: Broadcom Corporation NetXtreme II BCM5709S Gigabit Ethernet (rev 20)
2 x Ethernet controller: Broadcom Corporation NetXtreme II BCM57711 10-Gigabit PCIe

OS of the server nodes: Ubuntu 12.04.2 LTS
Kernel 3.10.0-031000-generic #201306301935 SMP Sun Jun 30 23:36:16 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

OS of the client node: CentOS release 6.4
Kernel 3.10.1-1.el6xen.x86_64 #1 SMP Sun Jul 14 11:05:42 EST 2013 x86_64 x86_64 x86_64 GNU/Linux

How I set up the OS on the server nodes: I know this isn't good, but as there are only two disks in each machine I partitioned them and used them both for the OS and the OSDs. For a test run it shouldn't be that bad...

Disk layout:
partition 1: mdadm RAID 1 member for the OS (30 GB)
partition 2: mdadm RAID 1 member for some swap space (shouldn't be used anyway...)
partition 3: reserved for the XFS partition of the OSD

Ceph installation: I tried both cuttlefish (0.61) and testing (0.66). Deployed using ceph-deploy from an admin node running in a XenServer 6.2 VM.
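A minimal sketch of the size-1 test suggested above, assuming the test image lives in the default "rbd" pool (the pool name and the restore value of 2 are assumptions; this is for testing only, not production):

#ceph osd pool get rbd size
#ceph osd pool set rbd size 1
#ceph osd pool set rbd min_size 1
(re-run the write tests, then restore the original size)
#ceph osd pool set rbd size 2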
#ceph-deploy new ceph01 ceph02 ceph03
(edited some ceph.conf stuff)
#ceph-deploy install --stable cuttlefish ceph01 ceph02 ceph03
#ceph-deploy mon create ceph01 ceph02 ceph03
#ceph-deploy gatherkeys ceph01
#ceph-deploy osd create ceph01:/dev/sda3 ceph01:/dev/sdb3 ceph02:/dev/sda3 ceph02:/dev/sdb3 ceph03:/dev/sda3 ceph03:/dev/sdb3
#ceph-deploy osd activate ceph01:/dev/sda3 ceph01:/dev/sdb3 ceph02:/dev/sda3 ceph02:/dev/sdb3 ceph03:/dev/sda3 ceph03:/dev/sdb3

ceph-admin:~/cephstore$ ceph status
   health HEALTH_OK
   monmap e1: 3 mons at {ceph01=10.111.80.1:6789/0,ceph02=10.111.80.2:6789/0,ceph03=10.111.80.3:6789/0}, election epoch 6, quorum 0,1,2 ceph01,ceph02,ceph03
   osdmap e26: 6 osds: 6 up, 6 in
   pgmap v258: 192 pgs: 192 active+clean; 1000 MB data, 62212 MB used, 5346 GB / 5407 GB avail
   mdsmap e1: 0/0/1 up

Now let's do some performance testing from the client, accessing an RBD image on the cluster.

#rbd create test --size 20000
#rbd map test

Raw write test (ouch, something is wrong here):
#dd if=/dev/zero of=/dev/rbd1 bs=1024k count=1000 oflag=direct
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB) copied, 146.051 s, 7.2 MB/s

Raw read test (this seems quite OK for a gigabit network):
#dd if=/dev/rbd1 of=/dev/null bs=1024k count=1000
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB) copied, 13.6368 s, 76.9 MB/s

Trying to find the bottleneck: network throughput between the client and the nodes (not 100% efficiency, but not that bad):
[ 3] local 10.111.80.1 port 37497 connected with 10.111.10.105 port 5001
[ 3] 0.0-10.0 sec  812 MBytes  681 Mbits/sec
[ 3] local 10.111.80.2 port 55912 connected with 10.111.10.105 port 5001
[ 3] 0.0-10.0 sec  802 MBytes  673 Mbits/sec
[ 3] local 10.111.80.3 port 45188 connected with 10.111.10.105 port 5001
[ 3] 0.0-10.1 sec  707 MBytes  589 Mbits/sec
[ 3] local 10.111.10.105 port 43103 connected with 10.111.80.1 port 5001
[ 3] 0.0-10.2 sec  730 MBytes  601 Mbits/sec
[ 3] local 10.111.10.105 port 44656 connected with 10.111.80.2 port 5001
[ 3] 0.0-10.0 sec  871 MBytes  730 Mbits/sec
[ 3] local 10.111.10.105 port 40455 connected with 10.111.80.3 port 5001
[ 3] 0.0-10.0 sec  1005 MBytes  843 Mbits/sec

Disk throughput on the Ceph nodes:
/var/lib/ceph/osd/ceph-0$ sudo dd if=/dev/zero of=test bs=1024k count=1000 oflag=direct
1048576000 bytes (1.0 GB) copied, 7.96581 s, 132 MB/s
/var/lib/ceph/osd/ceph-1$ sudo dd if=/dev/zero of=test bs=1024k count=1000 oflag=direct
1048576000 bytes (1.0 GB) copied, 7.91835 s, 132 MB/s
/var/lib/ceph/osd/ceph-2$ sudo dd if=/dev/zero of=test bs=1024k count=1000 oflag=direct
1048576000 bytes (1.0 GB) copied, 7.55287 s, 139 MB/s
/var/lib/ceph/osd/ceph-3$ sudo dd if=/dev/zero of=test bs=1024k count=1000 oflag=direct
1048576000 bytes (1.0 GB) copied, 7.67281 s, 137 MB/s
/var/lib/ceph/osd/ceph-4$ sudo dd if=/dev/zero of=test bs=1024k count=1000 oflag=direct
1048576000 bytes (1.0 GB) copied, 8.13862 s, 129 MB/s
/var/lib/ceph/osd/ceph-5$ sudo dd if=/dev/zero of=test bs=1024k count=1000 oflag=direct
1048576000 bytes (1.0 GB) copied, 7.72034 s, 136 MB/s

At this point I don't know what else to check. Is 7.2 MB/s write throughput with 1024k blocks the expected performance for such a test setup?

Any advice appreciated :)

Sorry for the long post, I wanted to give you enough information to get an overview of the setup.
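Two further checks that could help localize the problem (a sketch only; the default "rbd" pool, the OSD mount path and the byte counts are assumptions):

Cluster-level write benchmark, bypassing the kernel RBD client:
#rados bench -p rbd 30 write

Small synchronous writes on an OSD disk, to roughly approximate the journal's sync-write pattern:
/var/lib/ceph/osd/ceph-0$ sudo dd if=/dev/zero of=synctest bs=4k count=1000 oflag=dsync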
Cheers,
Sébastien

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com