On 07/04/2014 11:33 AM, Daniel Schwager wrote:
> Hi,
>
> I think the problem is the rbd device. It's only ONE device.

I fully agree. Ceph excels at parallel performance. You should run
multiple fio instances in parallel on different RBD devices, and even
better on different clients. Then you will see a big difference.

Wido
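(For illustration only, the parallel approach could look roughly like the
sketch below. The pool name "pool1", the image names bench-00..bench-03,
the image count and the 40 GB size are assumptions, not taken from this
thread.)

# create and map a handful of test images
for i in $(seq -f "%02.f" 0 3); do
    rbd create --size 40000 pool1/bench-$i
    rbd map pool1/bench-$i
done

# one fio writer per device, all running in parallel;
# the aggregate bandwidth is the sum of all jobs
for i in $(seq -f "%02.f" 0 3); do
    fio --bs=1M --rw=randwrite --ioengine=libaio --direct=1 --iodepth=32 \
        --runtime=60 --name=bench-$i --filename=/dev/rbd/pool1/bench-$i &
done
wait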
>> fio --bs=1M --rw=randwrite --ioengine=libaio --direct=1 --iodepth=32
>> --runtime=60 --name=/dev/rbd/pool1/bench1
>
> Try to create e.g. 20 (small) rbd devices, put them all into an LVM VG and
> create a striped (RAID-0) logical volume with 20 stripes and a stripe size
> of e.g. 1 MB (better bandwidth) or 4 KB (better IO) - or use md-raid0
> (it's maybe 10% faster, but not as flexible):
>
> # create disks
> for i in `seq -f "%02.f" 0 19` ; do rbd create --size 40000 vmware/vol6-$i.dsk ; done
>
> # allow LVM to accept rbd devices as physical volumes
> emacs -nw /etc/lvm/lvm.conf
>     types = [ "rbd", 16 ]
>
> # rbd map ....
>
> # pvcreate
> for i in `seq -f "%02.f" 0 19` ; do pvcreate /dev/rbd/vmware/vol6-$i.dsk ; done
>
> # vgcreate VG
> vgcreate VG_RBD20x40_VOL6 /dev/rbd/vmware/vol6-00.dsk
> for i in `seq -f "%02.f" 1 19` ; do vgextend VG_RBD20x40_VOL6 /dev/rbd/vmware/vol6-$i.dsk ; done
>
> # lvcreate raid0
> # -i, --stripes: equal to the number of physical volumes to scatter the
> #    logical volume across
> # -I, --stripesize: the number of kilobytes for the granularity of the
> #    stripes, 2^n (n = 2 to 9)
> # 20 stripes and 1 MB stripe size
> lvcreate -i 20 -I 1024 -L 700000m -n VmProd06 VG_RBD20x40_VOL6
>
> Now, try to run fio against /dev/mapper/VG_RBD20x40_VOL6-VmProd06
>
> I think the performance will be close to 10 Gbit/s.
>
> regards
>
> Danny
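(For reference, the md-raid0 variant mentioned above might look roughly
like the sketch below, reusing the 20 mapped rbd devices from the recipe.
The array name /dev/md/rbd_stripe and the 1024 KiB chunk size are
assumptions.)

# stripe the 20 mapped rbd devices with md instead of LVM
mdadm --create /dev/md/rbd_stripe --level=0 --raid-devices=20 \
      --chunk=1024 /dev/rbd/vmware/vol6-*.dsk

# run the same fio test against the striped device
fio --bs=1M --rw=randwrite --ioengine=libaio --direct=1 --iodepth=32 \
    --runtime=60 --name=md-bench --filename=/dev/md/rbd_stripe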
> From: ceph-users [mailto:ceph-users-bounces at lists.ceph.com] On Behalf Of Marco Allevato
> Sent: Friday, July 04, 2014 11:13 AM
> To: ceph-users at lists.ceph.com
> Subject: [ceph-users] Bad Write-Performance on Ceph/Possible bottlenecks?
>
> Hello Ceph-Community,
>
> I'm writing here because we have a bad write-performance on our Ceph
> cluster of about
>
> As an overview, the technical details of our cluster:
>
> 3 x monitoring servers, each with 2 x 1 Gbit/s NIC configured as a bond
> (link-aggregation mode)
>
> 5 x datastore servers, each with 10 x 4 TB HDDs serving as OSDs; as
> journal we use a 15 GB LV on a 256 GB SSD RAID-1; 2 x 10 Gbit/s NIC
> configured as a bond (link-aggregation mode)
>
> ceph.conf:
>
> [global]
> auth_service_required = cephx
> filestore_xattr_use_omap = true
> auth_client_required = cephx
> auth_cluster_required = cephx
> mon_host = 172.30.30.8,172.30.30.9
> mon_initial_members = monitoring1, monitoring2, monitoring3
> fsid = 5f22ab94-8d96-48c2-88d3-cff7bad443a9
> public network = 172.30.30.0/24
>
> [mon.monitoring1]
> host = monitoring1
> addr = 172.30.30.8:6789
>
> [mon.monitoring2]
> host = monitoring2
> addr = 172.30.30.9:6789
>
> [mon.monitoring3]
> host = monitoring3
> addr = 172.30.30.10:6789
>
> [filestore]
> filestore max sync interval = 10
>
> [osd]
> osd recovery max active = 1
> osd journal size = 15360
> osd op threads = 40
> osd disk threads = 40
>
> [osd.0]
> host = datastore1
> [osd.1]
> host = datastore1
> [osd.2]
> host = datastore1
> [osd.3]
> host = datastore1
> [osd.4]
> host = datastore1
> [osd.5]
> host = datastore1
> [osd.6]
> host = datastore1
> [osd.7]
> host = datastore1
> [osd.8]
> host = datastore1
> [osd.9]
> host = datastore1
> [osd.10]
> host = datastore2
> [osd.11]
> host = datastore2
> [osd.11]
> host = datastore2
> [osd.12]
> host = datastore2
> [osd.13]
> host = datastore2
> [osd.14]
> host = datastore2
> [osd.15]
> host = datastore2
> [osd.16]
> host = datastore2
> [osd.17]
> host = datastore2
> [osd.18]
> host = datastore2
> [osd.19]
> host = datastore2
> [osd.20]
> host = datastore3
> [osd.21]
> host = datastore3
> [osd.22]
> host = datastore3
> [osd.23]
> host = datastore3
> [osd.24]
> host = datastore3
> [osd.25]
> host = datastore3
> [osd.26]
> host = datastore3
> [osd.27]
> host = datastore3
> [osd.28]
> host = datastore3
> [osd.29]
> host = datastore3
> [osd.30]
> host = datastore4
> [osd.31]
> host = datastore4
> [osd.32]
> host = datastore4
> [osd.33]
> host = datastore4
> [osd.34]
> host = datastore4
> [osd.35]
> host = datastore4
> [osd.36]
> host = datastore4
> [osd.37]
> host = datastore4
> [osd.38]
> host = datastore4
> [osd.39]
> host = datastore4
> [osd.0]
> host = datastore5
> [osd.40]
> host = datastore5
> [osd.41]
> host = datastore5
> [osd.42]
> host = datastore5
> [osd.43]
> host = datastore5
> [osd.44]
> host = datastore5
> [osd.45]
> host = datastore5
> [osd.46]
> host = datastore5
> [osd.47]
> host = datastore5
> [osd.48]
> host = datastore5
>
> We have 3 pools:
>
> -> 2 x 1000 pgs with 2 replicas, distributing the data equally across two
>    racks (used for datastore 1-4)
>
> -> 1 x 100 pgs without replication; data only stored on datastore 5.
>    This pool is used to compare the performance on local disks without
>    networking.
>
> Here are the performance values which I get using the fio bench on a 32 GB rbd:
>
> On the 1000 pgs pool with distribution:
>
> fio --bs=1M --rw=randwrite --ioengine=libaio --direct=1 --iodepth=32
> --runtime=60 --name=/dev/rbd/pool1/bench1
>
> fio-2.0.13
> Starting 1 process
> Jobs: 1 (f=1): [w] [100.0% done] [0K/312.0M/0K /s] [0/312/0 iops] [eta 00m:00s]
> /dev/rbd/pool1/bench1: (groupid=0, jobs=1): err= 0: pid=21675: Fri Jul 4 11:03:52 2014
>   write: io=21071MB, bw=358989KB/s, iops=350, runt=60104msec
>     slat (usec): min=127, max=8040, avg=511.49, stdev=216.27
>     clat (msec): min=5, max=4018, avg=90.74, stdev=215.83
>      lat (msec): min=6, max=4018, avg=91.25, stdev=215.83
>     clat percentiles (msec):
>      |  1.00th=[    8],  5.00th=[    9], 10.00th=[   11], 20.00th=[   15],
>      | 30.00th=[   21], 40.00th=[   30], 50.00th=[   45], 60.00th=[   63],
>      | 70.00th=[   83], 80.00th=[  105], 90.00th=[  129], 95.00th=[  190],
>      | 99.00th=[ 1254], 99.50th=[ 1680], 99.90th=[ 2409], 99.95th=[ 2638],
>      | 99.99th=[ 3556]
>     bw (KB/s): min=68210, max=479232, per=100.00%, avg=368399.55, stdev=84457.12
>     lat (msec): 10=9.50%, 20=20.02%, 50=23.56%, 100=24.56%, 250=18.09%
>     lat (msec): 500=1.39%, 750=0.81%, 1000=0.65%, 2000=1.13%, >=2000=0.29%
>   cpu          : usr=11.17%, sys=7.46%, ctx=17772, majf=0, minf=24
>   IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=99.9%, >=64=0.0%
>      submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
>      complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
>      issued    : total=r=0/w=21071/d=0, short=r=0/w=0/d=0
>
> Run status group 0 (all jobs):
>   WRITE: io=21071MB, aggrb=358989KB/s, minb=358989KB/s, maxb=358989KB/s,
>          mint=60104msec, maxt=60104msec
>
> On the 100 pgs pool without distribution:
>
>   WRITE: io=5884.0MB, aggrb=297953KB/s, minb=297953KB/s, maxb=297953KB/s,
>          mint=20222msec, maxt=20222msec
>
> Do you have any suggestions on how to improve the performance?
>
> From what I have read, typical write rates should be around 800-1000 MB/s
> when using a 10 Gbit/s connection with a similar setup.
>
> Thanks in advance
>
> --
> Marco Allevato
> Projektteam
>
> Network Engineering GmbH
> Maximilianstrasse 93
> D-67346 Speyer
>
>
> _______________________________________________
> ceph-users mailing list
> ceph-users at lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>

--
Wido den Hollander
42on B.V.
Ceph trainer and consultant

Phone: +31 (0)20 700 9902
Skype: contact42on