Hi,

I wouldn't put those SSDs in RAID; just use them separately as journals, each SSD serving half of your HDDs. This should make your write performance somewhat better (a rough sketch of what I mean is at the end of this mail).

On 04.07.2014 at 11:13, Marco Allevato <m.allevato at nwe.de> wrote:

> Hello Ceph-Community,
>
> I'm writing here because we have a bad write-performance on our
> Ceph-Cluster of about
>
> As an overview, the technical details of our cluster:
>
> 3 x monitoring servers; each with 2 x 1 Gbit/s NIC configured as a bond
> (link aggregation mode)
>
> 5 x datastore servers; each with 10 x 4 TB HDDs serving as OSDs; as
> journal we use a 15 GB LVM volume on a 256 GB SSD RAID 1; 2 x 10 Gbit/s
> NIC configured as a bond (link aggregation mode)
>
> ceph.conf:
>
> [global]
> auth_service_required = cephx
> filestore_xattr_use_omap = true
> auth_client_required = cephx
> auth_cluster_required = cephx
> mon_host = 172.30.30.8,172.30.30.9
> mon_initial_members = monitoring1, monitoring2, monitoring3
> fsid = 5f22ab94-8d96-48c2-88d3-cff7bad443a9
> public network = 172.30.30.0/24
>
> [mon.monitoring1]
> host = monitoring1
> addr = 172.30.30.8:6789
>
> [mon.monitoring2]
> host = monitoring2
> addr = 172.30.30.9:6789
>
> [mon.monitoring3]
> host = monitoring3
> addr = 172.30.30.10:6789
>
> [filestore]
> filestore max sync interval = 10
>
> [osd]
> osd recovery max active = 1
> osd journal size = 15360
> osd op threads = 40
> osd disk threads = 40
>
> [osd.0]
> host = datastore1
> [osd.1]
> host = datastore1
> [osd.2]
> host = datastore1
> [osd.3]
> host = datastore1
> [osd.4]
> host = datastore1
> [osd.5]
> host = datastore1
> [osd.6]
> host = datastore1
> [osd.7]
> host = datastore1
> [osd.8]
> host = datastore1
> [osd.9]
> host = datastore1
>
> [osd.10]
> host = datastore2
> [osd.11]
> host = datastore2
> [osd.11]
> host = datastore2
> [osd.12]
> host = datastore2
> [osd.13]
> host = datastore2
> [osd.14]
> host = datastore2
> [osd.15]
> host = datastore2
> [osd.16]
> host = datastore2
> [osd.17]
> host = datastore2
> [osd.18]
> host = datastore2
> [osd.19]
> host = datastore2
>
> [osd.20]
> host = datastore3
> [osd.21]
> host = datastore3
> [osd.22]
> host = datastore3
> [osd.23]
> host = datastore3
> [osd.24]
> host = datastore3
> [osd.25]
> host = datastore3
> [osd.26]
> host = datastore3
> [osd.27]
> host = datastore3
> [osd.28]
> host = datastore3
> [osd.29]
> host = datastore3
>
> [osd.30]
> host = datastore4
> [osd.31]
> host = datastore4
> [osd.32]
> host = datastore4
> [osd.33]
> host = datastore4
> [osd.34]
> host = datastore4
> [osd.35]
> host = datastore4
> [osd.36]
> host = datastore4
> [osd.37]
> host = datastore4
> [osd.38]
> host = datastore4
> [osd.39]
> host = datastore4
>
> [osd.0]
> host = datastore5
> [osd.40]
> host = datastore5
> [osd.41]
> host = datastore5
> [osd.42]
> host = datastore5
> [osd.43]
> host = datastore5
> [osd.44]
> host = datastore5
> [osd.45]
> host = datastore5
> [osd.46]
> host = datastore5
> [osd.47]
> host = datastore5
> [osd.48]
> host = datastore5
>
> We have 3 pools:
>
> -> 2 x 1000 PGs with 2 replicas, distributing the data equally to two
>    racks (used for datastore 1-4)
> -> 1 x 100 PGs without replication; data only stored on datastore 5.
>    This pool is used to compare the performance on local disks without
>    networking.
>
> Here are the performance values I get using fio on a 32 GB RBD:
>
> On the 1000 PGs pool with distribution:
>
> fio --bs=1M --rw=randwrite --ioengine=libaio --direct=1 --iodepth=32
>     --runtime=60 --name=/dev/rbd/pool1/bench1
>
> fio-2.0.13
> Starting 1 process
> Jobs: 1 (f=1): [w] [100.0% done] [0K/312.0M/0K /s] [0 /312 /0 iops] [eta 00m:00s]
> /dev/rbd/pool1/bench1: (groupid=0, jobs=1): err= 0: pid=21675: Fri Jul  4 11:03:52 2014
>   write: io=21071MB, bw=358989KB/s, iops=350, runt= 60104msec
>     slat (usec): min=127, max=8040, avg=511.49, stdev=216.27
>     clat (msec): min=5, max=4018, avg=90.74, stdev=215.83
>      lat (msec): min=6, max=4018, avg=91.25, stdev=215.83
>     clat percentiles (msec):
>      |  1.00th=[    8],  5.00th=[    9], 10.00th=[   11], 20.00th=[   15],
>      | 30.00th=[   21], 40.00th=[   30], 50.00th=[   45], 60.00th=[   63],
>      | 70.00th=[   83], 80.00th=[  105], 90.00th=[  129], 95.00th=[  190],
>      | 99.00th=[ 1254], 99.50th=[ 1680], 99.90th=[ 2409], 99.95th=[ 2638],
>      | 99.99th=[ 3556]
>     bw (KB/s): min=68210, max=479232, per=100.00%, avg=368399.55, stdev=84457.12
>     lat (msec): 10=9.50%, 20=20.02%, 50=23.56%, 100=24.56%, 250=18.09%
>     lat (msec): 500=1.39%, 750=0.81%, 1000=0.65%, 2000=1.13%, >=2000=0.29%
>   cpu       : usr=11.17%, sys=7.46%, ctx=17772, majf=0, minf=24
>   IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=99.9%, >=64=0.0%
>      submit  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
>      complete: 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
>      issued  : total=r=0/w=21071/d=0, short=r=0/w=0/d=0
>
> Run status group 0 (all jobs):
>   WRITE: io=21071MB, aggrb=358989KB/s, minb=358989KB/s, maxb=358989KB/s,
>          mint=60104msec, maxt=60104msec
>
> On the 100 PGs pool without distribution:
>
>   WRITE: io=5884.0MB, aggrb=297953KB/s, minb=297953KB/s, maxb=297953KB/s,
>          mint=20222msec, maxt=20222msec
>
> Do you have any suggestions on how to improve the performance?
> From what I have read, typical write rates should be around 800-1000 MB/s
> when using a 10 Gbit/s connection with a similar setup.
>
> Thanks in advance
>
> --
> Marco Allevato
> Projektteam
>
> Network Engineering GmbH
> Maximilianstrasse 93
> D-67346 Speyer

--
Konrad Gutkowski
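P.S. To make the journal suggestion a bit more concrete, here is a rough, untested sketch. The device names (/dev/sdk and /dev/sdl for the two SSDs) and the partition layout are assumptions, not taken from your setup; adapt them to your hardware.

  # Break the SSD RAID 1, then carve five ~15 GB journal partitions out of
  # each SSD, one partition per OSD (five OSDs per SSD, ten per host):
  parted /dev/sdk --script mklabel gpt
  parted /dev/sdk --script mkpart journal-0 1MiB 15GiB
  parted /dev/sdk --script mkpart journal-1 15GiB 30GiB
  #   ... continue up to journal-4, then do the same on /dev/sdl

  # Point each OSD at its own raw partition in ceph.conf, for example:
  #   [osd.0]
  #   host = datastore1
  #   osd journal = /dev/sdk1
  #
  #   [osd.1]
  #   host = datastore1
  #   osd journal = /dev/sdk2

  # To move an existing journal: stop the OSD, flush the old journal,
  # update "osd journal" in ceph.conf, recreate it and start the OSD:
  service ceph stop osd.0
  ceph-osd -i 0 --flush-journal
  ceph-osd -i 0 --mkjournal
  service ceph start osd.0

Using raw partitions also takes the md and LVM layers out of the synchronous journal write path; every extra layer there adds latency to those writes.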
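It may also help to benchmark the object layer directly, so you can see whether RBD or the filesystem above it contributes to the problem. A sketch, assuming the pool from your fio run is called pool1 and you run this from a client with the admin keyring:

  # RADOS-level write benchmark: 60 seconds, 32 objects in flight
  rados bench -p pool1 60 write -t 32 --no-cleanup
  # sequential read of the objects written above
  rados bench -p pool1 60 seq -t 32

  # For the RBD test, give fio the device explicitly via --filename;
  # with only --name, fio may create a plain file named after the job
  # instead of writing to the block device:
  fio --name=rbd-write --filename=/dev/rbd/pool1/bench1 \
      --bs=1M --rw=randwrite --ioengine=libaio --direct=1 \
      --iodepth=32 --runtime=60 --time_based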
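Regarding the 800-1000 MB/s you read about: with LACP a single TCP stream only ever uses one member link, so roughly 10 Gbit/s (about 1.2 GB/s) is the per-stream ceiling regardless of how the bond is configured. Before tuning Ceph further I would verify that the links actually deliver that, e.g. with iperf (hostnames below are just placeholders):

  # on one datastore node
  iperf -s

  # on the client (or another datastore node): single stream, then 4 parallel
  iperf -c datastore1 -t 30
  iperf -c datastore1 -t 30 -P 4

If a single stream lands well below line rate, the network is worth looking at before Ceph itself.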