Re: Extreme slowness in SSD cluster with 3 nodes and 9 OSD with 3.16-3 kernel

Thanks for that link, Alexandre. Following the blog, I ran these tests.

On the 850 EVO, without dsync:

 dd if=randfile of=/dev/sdb1 bs=4k count=100000 oflag=direct
100000+0 records in
100000+0 records out
409600000 bytes (410 MB) copied, 4.42913 s, 92.5 MB/s

with dsync:

 dd if=randfile of=/dev/sdb1 bs=4k count=100000 oflag=direct,dsync
100000+0 records in
100000+0 records out
409600000 bytes (410 MB) copied, 83.4916 s, 4.9 MB/s

On the 840 EVO, without dsync:
dd if=randfile of=/dev/sdd1 bs=4k count=100000 oflag=direct
100000+0 records in
100000+0 records out
409600000 bytes (410 MB) copied, 5.11912 s, 80.0 MB/s

with dsync:
 dd if=randfile of=/dev/sdd1 bs=4k count=100000 oflag=direct,dsync
100000+0 records in
100000+0 records out
409600000 bytes (410 MB) copied, 196.738 s, 2.1 MB/s

So with dsync there is a significant reduction in performance, and the 850 looks better than the 840. Can this be the reason for the reduced write speed of 926 kB/s?
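
As a cross-check of the dd numbers, the blog's test uses fio instead of dd; a minimal sketch along those lines (same 850 EVO partition as above, so it overwrites the partition just like the dd run did; the 60 s runtime and job name are arbitrary choices, not taken verbatim from the blog):

# single 4k O_DSYNC write stream, the same pattern a filestore journal sees
fio --filename=/dev/sdb1 --direct=1 --sync=1 --rw=write --bs=4k \
    --numjobs=1 --iodepth=1 --runtime=60 --time_based \
    --group_reporting --name=journal-test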

Also, before trying this on physical servers I ran Ceph on VMware VMs with SAS disks using Giant 0.87; at that time Firefly 0.80.8 was giving higher numbers, so I decided to use Firefly.

On Sat, Feb 28, 2015 at 5:13 PM, Alexandre DERUMIER <aderumier@xxxxxxxxx> wrote:
Hi,

First, test if your SSD can write fast with O_DSYNC; check this blog:
http://www.sebastien-han.fr/blog/2014/10/10/ceph-how-to-test-if-your-ssd-is-suitable-as-a-journal-device/


Then, try with Ceph Giant (or maybe wait for Hammer), because there are a lot of SSD optimisations there (thread sharding).

In my last test with Giant, I was able to reach around 120000 iops with 6 OSDs on Intel S3500 SSDs, but I was CPU limited.
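
If you want to experiment with the sharded op queue once you are on Giant, the knobs are, if memory serves, along these lines (option names assumed from the Giant sharding work, and the values are only the defaults as I remember them, not a tuning recommendation):

[osd]
# number of shards for the op work queue (assumed option name)
osd_op_num_shards = 5
# worker threads per shard (assumed option name)
osd_op_num_threads_per_shard = 2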

----- Original Message -----
From: "mad Engineer" <themadengin33r@xxxxxxxxx>
To: "ceph-users" <ceph-users@xxxxxxxxxxxxxx>
Sent: Saturday, 28 February 2015 12:19:56
Subject: Extreme slowness in SSD cluster with 3 nodes and 9 OSD with 3.16-3 kernel

Hello All,

I am trying ceph-firefly 0.80.8
(69eaad7f8308f21573c604f121956e64679a52a7) with 9 OSDs, all Samsung
850 EVO SSDs, on 3 servers with 24 GB RAM and 16 cores @ 2.27 GHz,
running Ubuntu 14.04 LTS with the 3.16-3 kernel. All nodes are
connected to 10G ports with maximum MTU. There are no extra disks for
journaling and no separate network for replication and data traffic.
All 3 nodes also host a monitor process. The operating system runs on
a SATA disk.

When doing a sequential benchmark using "dd" on an RBD image, mounted
on the client as ext4, it takes 110 s to write 100 MB of data, an
average speed of 926 kB/s.

time dd if=/dev/zero of=hello bs=4k count=25000 oflag=direct
25000+0 records in
25000+0 records out
102400000 bytes (102 MB) copied, 110.582 s, 926 kB/s

real 1m50.585s
user 0m0.106s
sys 0m2.233s
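
At 4 kB per direct write that is only a couple of hundred IOPS, i.e. each synchronous write is costing several milliseconds end to end; a quick back-of-the-envelope check (numbers taken from the dd output above):

awk 'BEGIN { bs=4096; bw=926000;   # 4k writes, 926 kB/s from the dd run
             iops=bw/bs;
             printf "~%.0f IOPS, ~%.1f ms per write\n", iops, 1000/iops }'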

Doing the same directly on the SSD mount point shows:

time dd if=/dev/zero of=hello bs=4k count=25000 oflag=direct
25000+0 records in
25000+0 records out
102400000 bytes (102 MB) copied, 1.38567 s, 73.9 MB/s

OSDs are on XFS with these mount options:

rw,noatime,inode64,logbsize=256k,delaylog,allocsize=4M

ceph.conf

[global]
fsid = 7d889081-7826-439c-9fe5-d4e57480d9be
mon_initial_members = ceph1, ceph2, ceph3
mon_host = 10.99.10.118,10.99.10.119,10.99.10.120
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
filestore_xattr_use_omap = true
osd_pool_default_size = 2
osd_pool_default_min_size = 2
osd_pool_default_pg_num = 450
osd_pool_default_pgp_num = 450
max_open_files = 131072

[osd]
osd_mkfs_type = xfs
osd_op_threads = 8
osd_disk_threads = 4
osd_mount_options_xfs = "rw,noatime,inode64,logbsize=256k,delaylog,allocsize=4M"
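
In case it matters, the values the OSDs are actually running with can be checked over the admin socket, e.g. (osd.0 is just an example id; the socket path assumes the default location):

ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok config show | grep -E 'osd_op_threads|osd_disk_threads'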


On our traditional storage with full SAS disks, the same "dd"
completes in 16 s, an average write speed of about 6 MB/s.

Rados bench:

rados bench -p rbd 10 write
Maintaining 16 concurrent writes of 4194304 bytes for up to 10
seconds or 0 objects
Object prefix: benchmark_data_ceph1_2977
sec Cur ops started finished avg MB/s cur MB/s last lat avg lat
0 0 0 0 0 0 - 0
1 16 94 78 311.821 312 0.041228 0.140132
2 16 192 176 351.866 392 0.106294 0.175055
3 16 275 259 345.216 332 0.076795 0.166036
4 16 302 286 285.912 108 0.043888 0.196419
5 16 395 379 303.11 372 0.126033 0.207488
6 16 501 485 323.242 424 0.125972 0.194559
7 16 621 605 345.621 480 0.194155 0.183123
8 16 730 714 356.903 436 0.086678 0.176099
9 16 814 798 354.572 336 0.081567 0.174786
10 16 832 816 326.313 72 0.037431 0.182355
11 16 833 817 297.013 4 0.533326 0.182784
Total time run: 11.489068
Total writes made: 833
Write size: 4194304
Bandwidth (MB/sec): 290.015

Stddev Bandwidth: 175.723
Max bandwidth (MB/sec): 480
Min bandwidth (MB/sec): 0
Average Latency: 0.220582
Stddev Latency: 0.343697
Max latency: 2.85104
Min latency: 0.035381
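
The 4 MB default object size mostly measures bandwidth; to get an IOPS figure closer to the 8000 target I can rerun with 4k writes and more concurrent ops, something like this (-b sets the write size in bytes, -t the number of concurrent ops; 30 s and 32 ops are arbitrary choices):

rados bench -p rbd 30 write -b 4096 -t 32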

Our ultimate aim is to replace our existing SAN with Ceph, but for
that it should deliver a minimum of 8000 IOPS. Can anyone help me with
this? The OSDs are SSDs, the CPUs have good clock speeds, and the
back-end network is good, but we are still not able to extract the
full capability of the SSD disks.
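
For reference, once the per-write latency is sorted out I plan to measure that 8000 IOPS target with a parallel 4k random-write run against the mounted RBD, roughly along these lines (the directory, size, queue depth and job count are placeholders; a single-threaded dd like the one above cannot show aggregate IOPS):

fio --name=rbd-randwrite --directory=/mnt/rbd --ioengine=libaio \
    --direct=1 --rw=randwrite --bs=4k --iodepth=32 --numjobs=4 \
    --size=1G --runtime=60 --time_based --group_reporting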



Thanks,
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

