Re: Extreme slowness in SSD cluster with 3 nodes and 9 OSD with 3.16-3 kernel

>>But this was replication1? I never was able to do more than 30 000 with replication 3.

Oh, sorry, that was about reads.

For writes, I think I was around 30,000 iops with 3 nodes (2x 4 cores @ 2.1 GHz each), CPU bound, with replication x1.
With replication x3, around 9,000 iops.


I'm going to test on 2x 10 cores @ 3.1 GHz in a few weeks.
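
(For what it's worth, those numbers are roughly consistent with 3x write amplification: 9,000 client iops x 3 replicas is about 27,000 backend writes, close to the ~30,000 iops the same cluster sustained at replication x1, so the limit looks like total OSD/CPU work rather than the client path.)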





----- Original Message -----
From: "Stefan Priebe" <s.priebe@xxxxxxxxxxxx>
To: "aderumier" <aderumier@xxxxxxxxx>
Cc: "mad Engineer" <themadengin33r@xxxxxxxxx>, "ceph-users" <ceph-users@xxxxxxxxxxxxxx>
Sent: Saturday, 28 February 2015 13:42:54
Subject: Re: Extreme slowness in SSD cluster with 3 nodes and 9 OSD with 3.16-3 kernel

> On 28.02.2015 at 12:43, Alexandre DERUMIER <aderumier@xxxxxxxxx> wrote: 
> 
> Hi, 
> 
> First, test whether your SSDs can write fast with O_DSYNC. 
> Check this blog: 
> http://www.sebastien-han.fr/blog/2014/10/10/ceph-how-to-test-if-your-ssd-is-suitable-as-a-journal-device/ 
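
(For reference, the O_DSYNC journal test from that blog is normally run with fio, roughly as below; /dev/sdX is a placeholder, and writing to the raw device is destructive, so use a scratch drive or spare partition. Journal-grade drives typically sustain thousands of iops in this test, while many consumer SSDs drop to only a few hundred.)

fio --filename=/dev/sdX --direct=1 --sync=1 --rw=write --bs=4k --numjobs=1 --iodepth=1 --runtime=60 --time_based --group_reporting --name=journal-test
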
> 
> 
> Then, try with Ceph Giant (or maybe wait for Hammer), because there are a lot of SSD optimisations (thread sharding). 
> 
> In my last test with Giant, I was able to reach around 120,000 iops with 6 OSDs on Intel S3500 SSDs, but I was CPU limited. 

But was this with replication 1? I was never able to do more than 30,000 with replication 3. 

Stefan 


> 
> ----- Original Message ----- 
> From: "mad Engineer" <themadengin33r@xxxxxxxxx> 
> To: "ceph-users" <ceph-users@xxxxxxxxxxxxxx> 
> Sent: Saturday, 28 February 2015 12:19:56 
> Subject: Extreme slowness in SSD cluster with 3 nodes and 9 OSD with 3.16-3 kernel 
> 
> Hello All, 
> 
> I am trying ceph-firefly 0.80.8 
> (69eaad7f8308f21573c604f121956e64679a52a7) with 9 OSDs, all Samsung SSD 
> 850 EVO, on 3 servers with 24 GB RAM and 16 cores @ 2.27 GHz, running Ubuntu 14.04 LTS 
> with the 3.16-3 kernel. All are connected to 10G ports with maximum 
> MTU. There are no extra disks for journaling, and there is no 
> separate network for replication and data transfer. All 3 nodes also 
> host a monitor process. The operating system runs on a SATA disk. 
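
(Worth keeping in mind: with no separate journal device, filestore writes each client write to the same SSD twice, once to the journal and once to the data filesystem, which by itself roughly halves the usable write bandwidth. The command below, using the default filestore layout, shows where each OSD's journal lives; a plain file in the OSD data directory means the journal is co-located.)

ls -l /var/lib/ceph/osd/ceph-*/journal
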
> 
> When doing a sequential benchmark using "dd" on an RBD image, mounted on the client 
> as ext4, it takes 110 s to write 100 MB of data at an average speed of 
> 926 kB/s. 
> 
> time dd if=/dev/zero of=hello bs=4k count=25000 oflag=direct 
> 25000+0 records in 
> 25000+0 records out 
> 102400000 bytes (102 MB) copied, 110.582 s, 926 kB/s 
> 
> real 1m50.585s 
> user 0m0.106s 
> sys 0m2.233s 
> 
> Doing the same directly on the SSD mount point shows: 
> 
> time dd if=/dev/zero of=hello bs=4k count=25000 oflag=direct 
> 25000+0 records in 
> 25000+0 records out 
> 102400000 bytes (102 MB) copied, 1.38567 s, 73.9 MB/s 
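
(As a side note, the RBD number is latency-bound rather than bandwidth-bound: 926 kB/s at 4k is roughly 230 synchronous writes per second, i.e. about 4-5 ms per write, and dd with oflag=direct only ever has one write in flight. To see what the cluster can do with some parallelism, a 4k random-write fio run at a higher queue depth against the same mount is more telling; the mount point, file size and runtime below are only examples.)

fio --name=rbd-4k-randwrite --directory=/mnt/rbd --size=1G --rw=randwrite --bs=4k --direct=1 --ioengine=libaio --iodepth=32 --numjobs=4 --runtime=60 --time_based --group_reporting
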
> 
> The OSDs are on XFS with these extra mount options: 
> 
> rw,noatime,inode64,logbsize=256k,delaylog,allocsize=4M 
> 
> ceph.conf 
> 
> [global] 
> fsid = 7d889081-7826-439c-9fe5-d4e57480d9be 
> mon_initial_members = ceph1, ceph2, ceph3 
> mon_host = 10.99.10.118,10.99.10.119,10.99.10.120 
> auth_cluster_required = cephx 
> auth_service_required = cephx 
> auth_client_required = cephx 
> filestore_xattr_use_omap = true 
> osd_pool_default_size = 2 
> osd_pool_default_min_size = 2 
> osd_pool_default_pg_num = 450 
> osd_pool_default_pgp_num = 450 
> max_open_files = 131072 
> 
> [osd] 
> osd_mkfs_type = xfs 
> osd_op_threads = 8 
> osd_disk_threads = 4 
> osd_mount_options_xfs = "rw,noatime,inode64,logbsize=256k,delaylog,allocsize=4M" 
> 
> 
> On our traditional storage with full SAS disks, the same "dd" completes in 
> 16 s at an average write speed of about 6 MB/s. 
> 
> Rados bench: 
> 
> rados bench -p rbd 10 write 
> Maintaining 16 concurrent writes of 4194304 bytes for up to 10 
> seconds or 0 objects 
> Object prefix: benchmark_data_ceph1_2977 
>   sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat 
>     0       0         0         0         0         0         -         0 
>     1      16        94        78   311.821       312  0.041228  0.140132 
>     2      16       192       176   351.866       392  0.106294  0.175055 
>     3      16       275       259   345.216       332  0.076795  0.166036 
>     4      16       302       286   285.912       108  0.043888  0.196419 
>     5      16       395       379    303.11       372  0.126033  0.207488 
>     6      16       501       485   323.242       424  0.125972  0.194559 
>     7      16       621       605   345.621       480  0.194155  0.183123 
>     8      16       730       714   356.903       436  0.086678  0.176099 
>     9      16       814       798   354.572       336  0.081567  0.174786 
>    10      16       832       816   326.313        72  0.037431  0.182355 
>    11      16       833       817   297.013         4  0.533326  0.182784 
> Total time run: 11.489068 
> Total writes made: 833 
> Write size: 4194304 
> Bandwidth (MB/sec): 290.015 
> 
> Stddev Bandwidth: 175.723 
> Max bandwidth (MB/sec): 480 
> Min bandwidth (MB/sec): 0 
> Average Latency: 0.220582 
> Stddev Latency: 0.343697 
> Max latency: 2.85104 
> Min latency: 0.035381 
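
(One more note: rados bench writes 4 MB objects by default, so the run above measures bandwidth rather than small-block iops. A 4k run, using -b for the write size and -t for the number of concurrent operations, gives a number more comparable to an iops requirement; pool name, duration and concurrency here are only examples.)

rados bench -p rbd 30 write -b 4096 -t 32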
> 
> Our ultimate aim is to replace the existing SAN with Ceph, but for that it 
> should deliver a minimum of 8,000 iops. Can anyone help me with this? The OSDs are 
> SSDs, the CPUs have a good clock speed, and the backend network is good, but we are still 
> not able to extract the full capability of the SSD disks. 
> 
> 
> 
> Thanks, 

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




