Write IO Problem

Hi,

 

We have a huge write IO problem in our pre-production Ceph cluster. First, our hardware:

 

4 OSD Nodes with:

 

Supermicro X10 Board

32GB DDR4 RAM

2x Intel Xeon E5-2620

LSI SAS 9300-8i Host Bus Adapter

Intel Corporation 82599EB 10-Gigabit NIC

2x Intel SSDSA2CT040G3 in software RAID 1 for the system

 

Disks:

2x Samsung EVO 840 1TB

 

So in total 8 SSDs used as OSDs, formatted with btrfs (via ceph-disk; only nodiratime was added).
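
(For reference, each disk was prepared roughly like this; the device name is a placeholder:

            ceph-disk prepare --fs-type btrfs /dev/sdb
            ceph-disk activate /dev/sdb1

with nodiratime added via the btrfs mount options in ceph.conf, e.g. "osd mount options btrfs = rw,noatime,nodiratime".)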

 

Benchmarking one disk alone gives good values:

 

dd if=/dev/zero of=tempfile bs=1M count=1024 conv=fdatasync,notrunc

1073741824 bytes (1.1 GB) copied, 2.53986 s, 423 MB/s

 

Fio 8k libaio depth=32:

write: io=488184KB, bw=52782KB/s, iops=5068 , runt=  9249msec
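
(For reference, that number came from a local fio run along these lines; the rw mode, file name, and size here are placeholders rather than the exact job:

            fio --name=ssd-test --ioengine=libaio --direct=1 --rw=randwrite --bs=8k --iodepth=32 --size=1G --filename=/mnt/ssd/fio.tmp )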

 

Here is our ceph.conf (pretty much standard):

 

[global]

fsid = 89191a54-740a-46c7-a325-0899ab32fd1d

mon initial members = cephasp41,ceph-monitor41

mon host = 172.30.10.15,172.30.10.19

public network = 172.30.10.0/24

cluster network = 172.30.10.0/24

auth cluster required = cephx

auth service required = cephx

auth client required = cephx

 

#Default is 1GB, which is fine for us

#osd journal size = {n}

 

#Only needed if ext4 comes to play

#filestore xattr use omap = true

 

osd pool default size = 3  # Write an object n times.

osd pool default min size = 2 # Allow writing n copies in a degraded state.

 

#Set individual per pool by a formula

#osd pool default pg num = {n}

#osd pool default pgp num = {n}

#osd crush chooseleaf type = {n}
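
(The per-pool formula referred to above is the usual rule of thumb: pg_num ≈ (number of OSDs * 100) / replica count, rounded to a power of two; with 8 OSDs and size 3 that is about 267, so 256. A test pool would then be created like this, the pool name is just an example:

            ceph osd pool create testpool 256 256 )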

 

 

When I benchmark the cluster with “rbd bench-write rbd/fio” I get pretty good results:

elapsed:    18  ops:   262144  ops/sec: 14466.30  bytes/sec: 59253946.11
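
(Note: rbd bench-write defaults to 4 KiB sequential writes; 14466 ops/sec * 4 KiB ≈ 59 MB/s matches the bytes/sec above, so this is a much lighter workload than the 512k random writes below. To make it roughly comparable one could pass explicit sizes, flag names from memory, please check "rbd help bench-write" on your release:

            rbd bench-write rbd/fio --io-size 524288 --io-threads 32 --io-pattern rand )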

 

If, for example, I benchmark with fio using the rbd engine, I get very poor results:

 

[global]

ioengine=rbd

clientname=admin

pool=rbd

rbdname=fio

invalidate=0    # mandatory

rw=randwrite

bs=512k

 

[rbd_iodepth32]

iodepth=32

 

RESULTS:

write: io=2048.0MB, bw=53896KB/s, iops=105, runt= 38911msec
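
(The job above needs an fio build with rbd support; it was run roughly like this, the job file name is just what I called it locally:

            fio --enghelp | grep rbd      # check that the rbd engine is compiled in
            fio rbd-write.fio )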

 

Also, if I map the rbd with the kernel client as rbd0, format it with ext4 and then run dd on it, the result is not that good:

“dd if=/dev/zero of=tempfile bs=1M count=1024 conv=fdatasync,notrunc”

RESULT:

1073741824 bytes (1.1 GB) copied, 12.6152 s, 85.1 MB/s
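
(Sketch of that kernel-client test; the mount point is just an example path:

            rbd map rbd/fio                    # shows up as /dev/rbd0
            mkfs.ext4 /dev/rbd0
            mount /dev/rbd0 /mnt/rbdtest
            dd if=/dev/zero of=/mnt/rbdtest/tempfile bs=1M count=1024 conv=fdatasync,notrunc )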

 

I also tried presenting an rbd image with tgtd, mounting it on VMware ESXi and testing it in a VM; there I got only around 50 IOPS with 4k, and about 25 MB/s sequential writes.

With NFS the sequential read values are good (400 MB/s), but writes reach only 25 MB/s.
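
(For the tgtd test the export was roughly as follows; IQN, target id and LUN are placeholders, and the exact tgtadm syntax may differ between versions:

            tgtadm --lld iscsi --op new --mode target --tid 1 --targetname iqn.2015-01.de.example:rbd-fio
            tgtadm --lld iscsi --op new --mode logicalunit --tid 1 --lun 1 --backing-store /dev/rbd0
            tgtadm --lld iscsi --op bind --mode target --tid 1 --initiator-address ALL )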

 

What I tried tweaking so far:

 

Intel NIC optimizations:

/etc/sysctl.conf

 

# Increase system file descriptor limit

fs.file-max = 65535

 

# Increase system IP port range to allow for more concurrent connections

net.ipv4.ip_local_port_range = 1024 65000

 

# -- 10gbe tuning from Intel ixgb driver README -- #

 

# turn off selective ACK and timestamps

net.ipv4.tcp_sack = 0

net.ipv4.tcp_timestamps = 0

 

# memory allocation min/pressure/max.

# read buffer, write buffer, and buffer space

net.ipv4.tcp_rmem = 10000000 10000000 10000000

net.ipv4.tcp_wmem = 10000000 10000000 10000000

net.ipv4.tcp_mem = 10000000 10000000 10000000

 

net.core.rmem_max = 524287

net.core.wmem_max = 524287

net.core.rmem_default = 524287

net.core.wmem_default = 524287

net.core.optmem_max = 524287

net.core.netdev_max_backlog = 300000

 

AND

 

setpci -v -d 8086:10fb e6.b=2e

 

 

Setting tunables to firefly:

            ceph osd crush tunables firefly

 

Setting scheduler to noop:

            This basically stopped IO on the cluster, and I had to revert it and restart some of the OSDs that had stuck requests.
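
(For reference, the switch was done per device via sysfs along these lines; the device name is a placeholder, and reverting means writing back whatever the previous default was, e.g. deadline or cfq:

            cat /sys/block/sdb/queue/scheduler               # show the current scheduler
            echo noop > /sys/block/sdb/queue/scheduler
            echo deadline > /sys/block/sdb/queue/scheduler   # revert )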

 

I also tried moving the monitor from a VM to the hardware where the OSDs run.

 

 

Any suggestions on where to look, or what could cause this problem?

(I can’t believe you lose that much performance through Ceph replication.)

 

Thanks in advance.

 

If you need any info please tell me.

 

Mit freundlichen Grüßen/Kind regards  

Jonas Rottmann
Systems Engineer

FIS-ASP Application Service Providing und
IT-Outsourcing GmbH
Röthleiner Weg 4
D-97506 Grafenrheinfeld 
Phone: +49 (9723) 9188-568
Fax: +49 (9723) 9188-600

email: j.rottmann@xxxxxxxxxx  web: www.fis-asp.de

Geschäftsführer Robert Schuhmann
Registergericht Schweinfurt HRB 3865

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
