Re: Extreme slowness in SSD cluster with 3 nodes and 9 OSD with 3.16-3 kernel

Hi,

First, test whether your SSDs can write fast with O_DSYNC.
Check this blog post:
http://www.sebastien-han.fr/blog/2014/10/10/ceph-how-to-test-if-your-ssd-is-suitable-as-a-journal-device/
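
A minimal version of that check, assuming a scratch file on one of the SSDs (the path, count and fio parameters below are only placeholders; the blog post explains how to interpret the numbers):

# 4k writes with O_DIRECT + O_DSYNC, the pattern the OSD journal uses
dd if=/dev/zero of=/mnt/ssd/testfile bs=4k count=100000 oflag=direct,dsync

# the same measurement with fio, reporting sustained synchronous 4k write IOPS
fio --name=journal-test --filename=/mnt/ssd/testfile --size=1G \
    --direct=1 --sync=1 --rw=write --bs=4k --iodepth=1 --numjobs=1 \
    --runtime=60 --time_based --group_reporting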


Then, try Ceph Giant (or maybe wait for Hammer), because there are a lot of optimisations for SSDs thanks to thread sharding.

In my last test with Giant, I was able to reach around 120,000 IOPS with 6 OSDs on Intel S3500 SSDs, but I was CPU limited.
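
If you do test Giant, the sharded op queue can also be tuned from ceph.conf; a sketch of the relevant [osd] options (the values shown are, as far as I know, the defaults, not a recommendation for this cluster):

[osd]
# Giant splits the op queue into shards, each served by its own worker threads
osd_op_num_shards = 5
osd_op_num_threads_per_shard = 2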

----- Original Message -----
From: "mad Engineer" <themadengin33r@xxxxxxxxx>
To: "ceph-users" <ceph-users@xxxxxxxxxxxxxx>
Sent: Saturday, 28 February 2015 12:19:56
Subject: Extreme slowness in SSD cluster with 3 nodes and 9 OSD with 3.16-3 kernel

Hello All, 

I am trying ceph-firefly 0.80.8
(69eaad7f8308f21573c604f121956e64679a52a7) with 9 OSDs, all Samsung SSD
850 EVO, on 3 servers with 24 GB RAM and 16 cores @ 2.27 GHz, running
Ubuntu 14.04 LTS with the 3.16-3 kernel. All nodes are connected to 10G
ports with maximum MTU. There are no extra disks for journaling, and
there is no separate network for replication and data transfer. All 3
nodes also host a monitor process. The operating system runs on a SATA
disk.

When doing a sequential benchmark using "dd" on an RBD image mounted on
the client as ext4, it takes 110 s to write 100 MB of data, at an
average speed of 926 kB/s.

time dd if=/dev/zero of=hello bs=4k count=25000 oflag=direct 
25000+0 records in 
25000+0 records out 
102400000 bytes (102 MB) copied, 110.582 s, 926 kB/s 

real 1m50.585s 
user 0m0.106s 
sys 0m2.233s 

Running the same test directly on the SSD mount point shows:

time dd if=/dev/zero of=hello bs=4k count=25000 oflag=direct
25000+0 records in
25000+0 records out
102400000 bytes (102 MB) copied, 1.38567 s, 73.9 MB/s
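
(Not part of the original run, but for comparison: the OSD journal writes with O_DSYNC on top of O_DIRECT, so a closer approximation of that workload on the same mount point would be

time dd if=/dev/zero of=hello bs=4k count=25000 oflag=direct,dsync

and on many consumer SSDs this variant is far slower than plain O_DIRECT.)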

OSDs are on XFS with these extra mount options:

rw,noatime,inode64,logbsize=256k,delaylog,allocsize=4M 

ceph.conf 

[global] 
fsid = 7d889081-7826-439c-9fe5-d4e57480d9be 
mon_initial_members = ceph1, ceph2, ceph3 
mon_host = 10.99.10.118,10.99.10.119,10.99.10.120 
auth_cluster_required = cephx 
auth_service_required = cephx 
auth_client_required = cephx 
filestore_xattr_use_omap = true 
osd_pool_default_size = 2 
osd_pool_default_min_size = 2 
osd_pool_default_pg_num = 450 
osd_pool_default_pgp_num = 450 
max_open_files = 131072 

[osd] 
osd_mkfs_type = xfs 
osd_op_threads = 8 
osd_disk_threads = 4 
osd_mount_options_xfs = "rw,noatime,inode64,logbsize=256k,delaylog,allocsize=4M" 


On our traditional storage with full SAS disks, the same "dd" completes
in 16 s with an average write speed of about 6 MB/s.

Rados bench: 

rados bench -p rbd 10 write 
Maintaining 16 concurrent writes of 4194304 bytes for up to 10 
seconds or 0 objects 
Object prefix: benchmark_data_ceph1_2977 
sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
  0       0         0         0         0         0         -         0
  1      16        94        78   311.821       312  0.041228  0.140132
  2      16       192       176   351.866       392  0.106294  0.175055
  3      16       275       259   345.216       332  0.076795  0.166036
  4      16       302       286   285.912       108  0.043888  0.196419
  5      16       395       379    303.11       372  0.126033  0.207488
  6      16       501       485   323.242       424  0.125972  0.194559
  7      16       621       605   345.621       480  0.194155  0.183123
  8      16       730       714   356.903       436  0.086678  0.176099
  9      16       814       798   354.572       336  0.081567  0.174786
 10      16       832       816   326.313        72  0.037431  0.182355
 11      16       833       817   297.013         4  0.533326  0.182784
Total time run: 11.489068 
Total writes made: 833 
Write size: 4194304 
Bandwidth (MB/sec): 290.015 

Stddev Bandwidth: 175.723 
Max bandwidth (MB/sec): 480 
Min bandwidth (MB/sec): 0 
Average Latency: 0.220582 
Stddev Latency: 0.343697 
Max latency: 2.85104 
Min latency: 0.035381 
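
(The run above uses the default 4 MB object size, so it measures streaming throughput rather than small-block IOPS; a rough sketch of a 4 KB run, closer to the 8000 IOPS target mentioned below, would be

rados bench -p rbd 30 write -b 4096 -t 16

where -b sets the object size in bytes and -t the number of concurrent operations.)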

Our ultimate aim is to replace our existing SAN with Ceph, but for that
it should deliver a minimum of 8000 IOPS. Can anyone help me with this?
The OSDs are SSDs, the CPUs have good clock speed, and the backend
network is good, but we are still not able to extract the full
capability of the SSD disks.



Thanks, 
_______________________________________________ 
ceph-users mailing list 
ceph-users@xxxxxxxxxxxxxx 
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 




