Re: Extreme slowness in SSD cluster with 3 nodes and 9 OSD with 3.16-3 kernel

Thanks for that link, Alexandre.
As per that link I tried these tests:
On the 850 EVO, without dsync:

 dd if=randfile of=/dev/sdb1 bs=4k count=100000 oflag=direct
100000+0 records in
100000+0 records out
409600000 bytes (410 MB) copied, 4.42913 s, 92.5 MB/s

with dsync:

 dd if=randfile of=/dev/sdb1 bs=4k count=100000 oflag=direct,dsync
100000+0 records in
100000+0 records out
409600000 bytes (410 MB) copied, 83.4916 s, 4.9 MB/s

On the 840 EVO, without dsync:
dd if=randfile of=/dev/sdd1 bs=4k count=100000 oflag=direct
100000+0 records in
100000+0 records out
409600000 bytes (410 MB) copied, 5.11912 s, 80.0 MB/s

with dsync:
 dd if=randfile of=/dev/sdd1 bs=4k count=100000 oflag=direct,dsync
100000+0 records in
100000+0 records out
409600000 bytes (410 MB) copied, 196.738 s, 2.1 MB/s

So with dsync there is a significant reduction in performance, and
the 850 looks better than the 840. Can this be the reason for the
reduced write speed of 926 kB/s?
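
Rough math, assuming the journal sits on the same SSD and the pool
uses size = 2 (so each acknowledged write needs a synced journal
write on two replicas), and that a single-threaded dd with
oflag=direct waits for each write to finish before issuing the next:

850 EVO with dsync:  100000 writes / 83.5 s  ~ 1200 writes/s ~ 0.8 ms per synced 4k write
840 EVO with dsync:  100000 writes / 196.7 s ~  510 writes/s ~ 2.0 ms per synced 4k write
RBD via dd:           25000 writes / 110.6 s ~  230 writes/s ~ 4.4 ms per write

A couple of journal syncs per write (one possibly on the slower 840)
plus network round trips and filestore overhead could plausibly add
up to that ~4.4 ms, so the dsync behaviour does look like it could
account for most of it.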

Also, before trying this on physical servers, I ran Ceph on VMware
VMs with SAS disks using Giant 0.87; at that time Firefly 0.80.8 was
giving higher numbers, so I decided to use Firefly.

On Sat, Feb 28, 2015 at 5:13 PM, Alexandre DERUMIER <aderumier@xxxxxxxxx> wrote:
Hi,

First, test whether your SSD can write fast with O_DSYNC.
Check this blog:
http://www.sebastien-han.fr/blog/2014/10/10/ceph-how-to-test-if-your-ssd-is-suitable-as-a-journal-device/
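
The same kind of test can also be run with fio; a command along these
lines exercises the same synchronous 4k write path (device name and
runtime are just an example, and like the dd test it writes straight
to the partition, so only run it on a disk you can wipe):

fio --filename=/dev/sdb1 --direct=1 --sync=1 --rw=write --bs=4k \
    --numjobs=1 --iodepth=1 --runtime=60 --time_based \
    --group_reporting --name=journal-test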


Then, try with Ceph Giant (or maybe wait for Hammer), because there are a lot of SSD optimisations in it, such as sharding of the OSD op threads.

In my last test with Giant, I was able to reach around 120,000 IOPS with 6 OSDs on Intel S3500 SSDs, but I was CPU limited.
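
For reference, the sharding in Giant is controlled by [osd] options
along these lines (the values shown are, as far as I know, the
defaults, and are only meant to show which knobs are involved, not as
recommended tuning):

[osd]
osd_op_num_shards = 5
osd_op_num_threads_per_shard = 2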

----- Original Mail -----
From: "mad Engineer" <themadengin33r@xxxxxxxxx>
To: "ceph-users" <ceph-users@xxxxxxxxxxxxxx>
Sent: Saturday, 28 February 2015 12:19:56
Subject: Extreme slowness in SSD cluster with 3 nodes and 9 OSD with 3.16-3 kernel

Hello All,

I am trying Ceph Firefly 0.80.8
(69eaad7f8308f21573c604f121956e64679a52a7) with 9 OSDs, all Samsung
850 EVO SSDs, on 3 servers with 24 GB RAM and 16 cores @ 2.27 GHz,
running Ubuntu 14.04 LTS with the 3.16-3 kernel. All are connected to
10G ports with maximum MTU. There are no extra disks for journaling,
and there is no separate network for replication and data transfer.
All 3 nodes also host a monitor process. The operating system runs on
a SATA disk.

When doing a sequential benchmark using "dd" on an RBD image, mounted
on the client as ext4, it takes 110 s to write 100 MB of data at an
average speed of 926 kB/s.

time dd if=/dev/zero of=hello bs=4k count=25000 oflag=direct
25000+0 records in
25000+0 records out
102400000 bytes (102 MB) copied, 110.582 s, 926 kB/s

real 1m50.585s
user 0m0.106s
sys 0m2.233s

Doing the same directly on an SSD mount point shows:

time dd if=/dev/zero of=hello bs=4k count=25000 oflag=direct
25000+0 records in
25000+0 records out
102400000 bytes (102 MB) copied, 1.38567 s, 73.9 MB/s
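
Note that a single dd with bs=4k and oflag=direct keeps only one
write in flight, so it mostly measures per-write latency rather than
aggregate IOPS. For comparison, a parallel random-write run on the
mounted RBD could look roughly like this (file path, size, queue
depth and job count are just examples):

fio --name=rbd-randwrite --filename=/mnt/rbd/fio-test --size=1G \
    --rw=randwrite --bs=4k --direct=1 --ioengine=libaio \
    --iodepth=32 --numjobs=4 --runtime=60 --time_based --group_reporting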

OSDs are on XFS with these extra mount options:

rw,noatime,inode64,logbsize=256k,delaylog,allocsize=4M

ceph.conf

[global]
fsid = 7d889081-7826-439c-9fe5-d4e57480d9be
mon_initial_members = ceph1, ceph2, ceph3
mon_host = 10.99.10.118,10.99.10.119,10.99.10.120
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
filestore_xattr_use_omap = true
osd_pool_default_size = 2
osd_pool_default_min_size = 2
osd_pool_default_pg_num = 450
osd_pool_default_pgp_num = 450
max_open_files = 131072

[osd]
osd_mkfs_type = xfs
osd_op_threads = 8
osd_disk_threads = 4
osd_mount_options_xfs = "rw,noatime,inode64,logbsize=256k,delaylog,allocsize=4M"


On our traditional storage with full SAS disks, the same "dd"
completes in 16 s at an average write speed of about 6 MB/s.

Rados bench:

rados bench -p rbd 10 write
Maintaining 16 concurrent writes of 4194304 bytes for up to 10 seconds or 0 objects
Object prefix: benchmark_data_ceph1_2977
  sec  Cur ops  started  finished  avg MB/s  cur MB/s  last lat   avg lat
    0        0        0         0         0         0         -         0
    1       16       94        78   311.821       312  0.041228  0.140132
    2       16      192       176   351.866       392  0.106294  0.175055
    3       16      275       259   345.216       332  0.076795  0.166036
    4       16      302       286   285.912       108  0.043888  0.196419
    5       16      395       379    303.11       372  0.126033  0.207488
    6       16      501       485   323.242       424  0.125972  0.194559
    7       16      621       605   345.621       480  0.194155  0.183123
    8       16      730       714   356.903       436  0.086678  0.176099
    9       16      814       798   354.572       336  0.081567  0.174786
   10       16      832       816   326.313        72  0.037431  0.182355
   11       16      833       817   297.013         4  0.533326  0.182784
Total time run: 11.489068
Total writes made: 833
Write size: 4194304
Bandwidth (MB/sec): 290.015

Stddev Bandwidth: 175.723
Max bandwidth (MB/sec): 480
Min bandwidth (MB/sec): 0
Average Latency: 0.220582
Stddev Latency: 0.343697
Max latency: 2.85104
Min latency: 0.035381

Our ultimate aim is to replace our existing SAN with Ceph, but for
that it needs to deliver a minimum of 8000 IOPS. Can anyone help me
with this? The OSDs are SSDs, the CPUs have good clock speed, and the
backend network is fine, but we are still not able to extract the
full capability of the SSD disks.
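
Since the 8000 IOPS target is about small writes, it may also be
worth benchmarking 4k writes directly instead of the default 4 MB
objects, for example something like this (pool name and thread count
are only examples; -b sets the object size if the installed rados
supports it):

rados bench -p rbd 30 write -b 4096 -t 32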



Thanks,
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
