[no subject]

> 
> Also, I know that direct I/O can be quite slow with Ceph,
> 
> maybe you can try without --direct=1 
> 
I can, but that is not the test case here. 
For the record that pushes it to 12k IOPS, with the journal SSDs reaching
about 30% utilization and the actual OSDs up to 5%. 
So much better, but still quite some capacity for improvement.

> and also enable rbd_cache
> 
> ceph.conf
> [client]
> rbd cache = true
> 
I have that set of course, as well as specifically "writeback" for the KVM
instance in question.

Interestingly I see no difference at all with a KVM instance that is set
explicitly to "none", but that's not part of this particular inquiry
either.

Christian
> 
> 
> 
> ----- Original Message ----- 
> 
> From: "Christian Balzer" <chibi at gol.com> 
> To: "Gregory Farnum" <greg at inktank.com>, ceph-users at lists.ceph.com 
> Sent: Thursday, 8 May 2014 04:49:16 
> Subject: Re: Slow IOPS on RBD compared to journal and backing
> devices 
> 
> On Wed, 7 May 2014 18:37:48 -0700 Gregory Farnum wrote: 
> 
> > On Wed, May 7, 2014 at 5:57 PM, Christian Balzer <chibi at gol.com>
> > wrote: 
> > > 
> > > Hello, 
> > > 
> > > ceph 0.72 on Debian Jessie, 2 storage nodes with 2 OSDs each. The 
> > > journals are on (separate) DC 3700s, the actual OSDs are RAID6
> > > behind an Areca 1882 with 4GB of cache. 
> > > 
> > > Running this fio: 
> > > 
> > > fio --size=400m --ioengine=libaio --invalidate=1 --direct=1 
> > > --numjobs=1 --rw=randwrite --name=fiojob --blocksize=4k
> > > --iodepth=128 
> > > 
> > > results in: 
> > > 
> > > 30k IOPS on the journal SSD (as expected) 
> > > 110k IOPS on the OSD (it fits neatly into the cache, no surprise 
> > > there) 
> > > 3200 IOPS from a VM using userspace RBD 
> > > 2900 IOPS from a host kernelspace mounted RBD 
> > > 
> > > When running the fio from the VM RBD the utilization of the journals
> > > is about 20% (2400 IOPS) and the OSDs are bored at 2% (1500 IOPS
> > > after some obvious merging). 
> > > The OSD processes are quite busy, reading well over 200% on atop,
> > > but the system is not CPU or otherwise resource starved at that
> > > moment. 
> > > 
> > > Running multiple instances of this test from several VMs on
> > > different hosts changes nothing, as in the aggregated IOPS for the
> > > whole cluster will still be around 3200 IOPS. 
> > > 
> > > Now clearly RBD has to deal with latency here, but the network is
> > > IPoIB with the associated low latency and the journal SSDs are the 
> > > (consistently) fastest ones around. 
> > > 
> > > I guess what I am wondering about is if this is normal and to be 
> > > expected or if not where all that potential performance got lost. 
> > 
> > Hmm, with 128 IOs at a time (I believe I'm reading that correctly?) 
> Yes, but going down to 32 doesn't change things one iota. 
> Also note the multiple instances I mention up there, so that would be
> 256 IOs at a time, coming from different hosts over different links and 
> nothing changes. 
> 
> > that's about 40ms of latency per op (for userspace RBD), which seems 
> > awfully long. You should check what your client-side objecter settings 
> > are; it might be limiting you to fewer outstanding ops than that. 
> 
> Googling for client-side objecter gives a few hits on ceph devel and
> bugs and nothing at all as far as configuration options are concerned. 
> Care to enlighten me where one can find those? 
> 
> Also note the kernelspace (3.13 if it matters) speed, which is very much 
> in the same (junior league) ballpark. 
> 
> > If 
> > it's available to you, testing with Firefly or even master would be 
> > interesting; there's some performance work that should reduce 
> > latencies. 
> > 
> Not an option, this is going into production next week. 
> 
> > But a well-tuned (or even default-tuned, I thought) Ceph cluster 
> > certainly doesn't require 40ms/op, so you should probably run a wider 
> > array of experiments to try and figure out where it's coming from. 
> 
> I think we can rule out the network, NPtcp gives me: 
> --- 
> 56: 4096 bytes 1546 times --> 979.22 Mbps in 31.91 usec 
> --- 
> 
> For comparison at about 512KB it reaches maximum throughput and still 
> isn't that laggy: 
> --- 
> 98: 524288 bytes 121 times --> 9700.57 Mbps in 412.35 usec 
> --- 
> 
> So with the network performing as well as my lengthy experience with
> IPoIB led me to believe, what else is there to look at? 
> The storage nodes perform just as expected, indicated by the local fio 
> tests. 
> 
> That pretty much leaves only Ceph/RBD to look at and I'm not really sure 
> what experiments I should run on that. ^o^ 
> 
> Regards, 
> 
> Christian 
> 
> > -Greg 
> > Software Engineer #42 @ http://inktank.com | http://ceph.com 
> > 
> 
> 
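
For reference, the "about 40ms of latency per op" figure quoted above
follows from Little's law (average latency = outstanding requests /
throughput). The sketch below redoes that arithmetic with the numbers
from this thread. It assumes fio really keeps the full iodepth
outstanding for the whole run, and that the NPtcp time is a per-message
transfer time; the function and variable names are made up for
illustration.

```python
def per_op_latency_ms(outstanding_ios: int, iops: float) -> float:
    """Little's law: average latency = outstanding requests / throughput.

    Assumes the queue is kept full at `outstanding_ios` for the whole run.
    """
    return outstanding_ios * 1000.0 / iops


# Figures taken from the thread (fio with --iodepth=128, --direct=1).
userspace_ms = per_op_latency_ms(128, 3200)  # VM using userspace RBD
kernel_ms = per_op_latency_ms(128, 2900)     # host kernelspace RBD

# NPtcp reported 31.91 usec for a 4096-byte message over IPoIB.
nptcp_4k_usec = 31.91
network_share_pct = nptcp_4k_usec / (userspace_ms * 1000.0) * 100.0

print(f"userspace RBD: {userspace_ms:.1f} ms per op")
print(f"kernel RBD:    {kernel_ms:.1f} ms per op")
print(f"one 4k NPtcp transfer is {network_share_pct:.2f}% of a userspace op")
```

Under those assumptions a single 4k network transfer is well below one
percent of the implied per-op time, which is consistent with ruling out
the raw network as the main cause.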

-- 
Christian Balzer        Network/Systems Engineer                
chibi at gol.com   	Global OnLine Japan/Fusion Communications
http://www.gol.com/