Re: Extreme slowness in SSD cluster with 3 nodes and 9 OSD with 3.16-3 kernel


As optimisations,

try setting the I/O scheduler on the SSDs to noop.
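For example (a sketch, assuming one of the OSD SSDs is /dev/sdb; repeat per device, and note the setting does not persist across reboots):

echo noop > /sys/block/sdb/queue/scheduler
cat /sys/block/sdb/queue/scheduler    # should now print: [noop] deadline cfq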

Also enable rbd_cache=true (it really helps for sequential writes).
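rbd_cache is a client-side (librbd) option; a minimal ceph.conf snippet on the client would be:

[client]
rbd_cache = true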

But your results seem quite low: 926 kB/s with 4k blocks is only about 230 IOPS (926 kB/s divided by 4 kB per request).

Check that you don't have any big network latencies or an MTU fragmentation problem.
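One quick test (a sketch; the IP is one of the mon hosts from your ceph.conf, and 8972 assumes a 9000-byte MTU: 9000 minus 28 bytes of IP/ICMP headers):

ping -M do -s 8972 -c 4 10.99.10.119

If this fails while small payloads go through, some hop on the path has a smaller MTU than the interfaces claim.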

Maybe also try to benchmark with fio, with more parallel jobs.
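Something like this (a sketch; it assumes the image is mapped at /dev/rbd0 and is a scratch image, since it writes to the device directly):

fio --name=4kwrite --filename=/dev/rbd0 --rw=write --bs=4k --direct=1 \
    --ioengine=libaio --iodepth=32 --numjobs=4 --runtime=60 --time_based \
    --group_reporting

Your dd issues direct 4k writes at queue depth 1, so every request pays a full network round trip plus journal latency; higher iodepth/numjobs shows what the cluster can sustain with parallelism.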




----- Original message -----
From: "mad Engineer" <themadengin33r@xxxxxxxxx>
To: "Philippe Schwarz" <phil@xxxxxxxxxxxxxx>
Cc: "ceph-users" <ceph-users@xxxxxxxxxxxxxx>
Sent: Saturday, 28 February 2015 13:06:59
Subject: Re: Extreme slowness in SSD cluster with 3 nodes and 9 OSD with 3.16-3 kernel

Thanks for the reply, Philippe. We were using these disks in our NAS; now
it looks like I am in big trouble :-(

On Sat, Feb 28, 2015 at 5:02 PM, Philippe Schwarz <phil@xxxxxxxxxxxxxx> wrote: 
>
> On 28/02/2015 12:19, mad Engineer wrote:
>> Hello All,
>>
>> I am trying ceph-firefly 0.80.8
>> (69eaad7f8308f21573c604f121956e64679a52a7) with 9 OSDs, all Samsung
>> SSD 850 EVO, on 3 servers with 24 GB RAM and 16 cores @ 2.27 GHz,
>> running Ubuntu 14.04 LTS with the 3.16-3 kernel. All are connected
>> to 10G ports with the maximum MTU. There are no extra disks for
>> journaling, and there is no separate network for replication and
>> data transfer. All 3 nodes also host a monitor process. The
>> operating system runs on a SATA disk.
>> 
>> When doing a sequential benchmark using "dd" on an RBD image,
>> mounted on the client as ext4, it takes 110 s to write 100 MB of
>> data, at an average speed of 926 kB/s.
>> 
>> time dd if=/dev/zero of=hello bs=4k count=25000 oflag=direct
>> 25000+0 records in
>> 25000+0 records out
>> 102400000 bytes (102 MB) copied, 110.582 s, 926 kB/s
>>
>> real 1m50.585s
>> user 0m0.106s
>> sys  0m2.233s
>> 
>> Doing the same directly on the SSD mount point shows:
>> 
>> time dd if=/dev/zero of=hello bs=4k count=25000 oflag=direct
>> 25000+0 records in
>> 25000+0 records out
>> 102400000 bytes (102 MB) copied, 1.38567 s, 73.9 MB/s
>> 
>> OSDs are on XFS with these mount options:
>> 
>> rw,noatime,inode64,logbsize=256k,delaylog,allocsize=4M 
>> 
>> ceph.conf 
>> 
>> [global]
>> fsid = 7d889081-7826-439c-9fe5-d4e57480d9be
>> mon_initial_members = ceph1, ceph2, ceph3
>> mon_host = 10.99.10.118,10.99.10.119,10.99.10.120
>> auth_cluster_required = cephx
>> auth_service_required = cephx
>> auth_client_required = cephx
>> filestore_xattr_use_omap = true
>> osd_pool_default_size = 2
>> osd_pool_default_min_size = 2
>> osd_pool_default_pg_num = 450
>> osd_pool_default_pgp_num = 450
>> max_open_files = 131072
>>
>> [osd]
>> osd_mkfs_type = xfs
>> osd_op_threads = 8
>> osd_disk_threads = 4
>> osd_mount_options_xfs = "rw,noatime,inode64,logbsize=256k,delaylog,allocsize=4M"
>> 
>> 
>> On our traditional storage with full SAS disks, the same "dd"
>> completes in 16 s at an average write speed of 6 MB/s.
>> 
>> Rados bench: 
>> 
>> rados bench -p rbd 10 write
>> Maintaining 16 concurrent writes of 4194304 bytes for up to 10 seconds or 0 objects
>> Object prefix: benchmark_data_ceph1_2977
>>   sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
>>     0       0         0         0         0         0         -         0
>>     1      16        94        78   311.821       312  0.041228  0.140132
>>     2      16       192       176   351.866       392  0.106294  0.175055
>>     3      16       275       259   345.216       332  0.076795  0.166036
>>     4      16       302       286   285.912       108  0.043888  0.196419
>>     5      16       395       379    303.11       372  0.126033  0.207488
>>     6      16       501       485   323.242       424  0.125972  0.194559
>>     7      16       621       605   345.621       480  0.194155  0.183123
>>     8      16       730       714   356.903       436  0.086678  0.176099
>>     9      16       814       798   354.572       336  0.081567  0.174786
>>    10      16       832       816   326.313        72  0.037431  0.182355
>>    11      16       833       817   297.013         4  0.533326  0.182784
>> Total time run:         11.489068
>> Total writes made:      833
>> Write size:             4194304
>> Bandwidth (MB/sec):     290.015
>>
>> Stddev Bandwidth:       175.723
>> Max bandwidth (MB/sec): 480
>> Min bandwidth (MB/sec): 0
>> Average Latency:        0.220582
>> Stddev Latency:         0.343697
>> Max latency:            2.85104
>> Min latency:            0.035381
>> 
>> Our ultimate aim is to replace the existing SAN with Ceph, but for
>> that it should deliver a minimum of 8000 IOPS. Can anyone help me
>> with this? The OSDs are SSDs, the CPUs have good clock speed, and
>> the backend network is good, but we are still unable to extract the
>> full capability of the SSD disks.
>> 
>> 
>> 
>> Thanks, 
> 
> Hi, I'm new to Ceph, so don't consider my words as holy truth.
> 
> It seems that Samsung 840s (so I assume 850s too) are crappy for Ceph:
> 
> MTBF:
> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2014-November/044258.html
> Bandwidth:
> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2014-December/045247.html
> 
> And according to an experienced Ceph/Proxmox user, Samsung SSDs should
> be avoided if possible in Ceph storage.
> 
> Apart from that, it seems there was a limitation in Ceph on using the
> complete bandwidth available from SSDs; but I think with less than
> 1 MB/s you haven't hit that limit.
> 
> Keep in mind that I'm not a Ceph guru (far from it, indeed), so feel
> free to disagree; I'm working on improving my knowledge.
> 
> Best regards. 
> 
_______________________________________________ 
ceph-users mailing list 
ceph-users@xxxxxxxxxxxxxx 
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 