Slow IOPS on RBD compared to journal and backing devices

Hello,

On Thu, 08 May 2014 17:20:59 +0200 Udo Lembke wrote:

> Hi,
> I don't think it's related, but how full is your Ceph cluster? Perhaps
> it has something to do with fragmentation on the XFS filesystem
> (xfs_db -c frag -r device)?
>
As I wrote, this cluster will go into production next week, so it's
neither full nor fragmented. 
I'd also expect any severe fragmentation to show up as high device
utilization, which, as I stated, is not the case.

In fact, after all the initial testing I defragmented the OSDs a few days
ago, not that they actually needed it.
And for starters it is ext4, not XFS anyway; see:
https://www.mail-archive.com/ceph-users@lists.ceph.com/msg08619.html
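
For completeness, the read-only fragmentation checks for both filesystems
look like this (the device name is just an example, and e4defrag wants the
filesystem mounted):

  # ext4: report the fragmentation score without changing anything
  e4defrag -c /dev/sdb1

  # xfs: report the fragmentation factor, -r opens the device read-only
  xfs_db -c frag -r /dev/sdb1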
 
For what it's worth, I never got an answer to the actual question in that
mail.

Christian
 
> Udo
> 
> Am 08.05.2014 02:57, schrieb Christian Balzer:
> > 
> > Hello,
> > 
> > ceph 0.72 on Debian Jessie, 2 storage nodes with 2 OSDs each. The
> > journals are on (separate) Intel DC S3700s; the actual OSDs are RAID6
> > volumes behind an Areca 1882 with 4GB of cache.
> > 
> > Running this fio:
> > 
> > fio --size=400m --ioengine=libaio --invalidate=1 --direct=1
> > --numjobs=1 --rw=randwrite --name=fiojob --blocksize=4k --iodepth=128
> > 
> > results in:
> > 
> >   30k IOPS on the journal SSD (as expected)
> >  110k IOPS on the OSD (it fits neatly into the cache, no surprise there)
> >  3200 IOPS from a VM using userspace RBD
> >  2900 IOPS from a host kernelspace-mounted RBD
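> > 
> > (For anyone reproducing this: a kernelspace RBD target can be created
> > and mapped roughly like so, with pool and image names being just
> > examples:
> > 
> >   rbd create fiotest --size 4096 --pool rbd   # 4GB test image
> >   rbd map fiotest --pool rbd                  # shows up as /dev/rbd0
> > 
> > fio is then pointed at the resulting /dev/rbd0.)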
> > 
> > When running the fio from the VM RBD, the utilization of the journals
> > is about 20% (2400 IOPS) and the OSDs are bored at 2% (1500 IOPS after
> > some obvious merging).
> > The OSD processes themselves are quite busy, showing well over 200%
> > CPU in atop, but the system is not CPU or otherwise resource starved
> > at that moment.
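> > 
> > If anyone wants to compare notes: the per-OSD perf counters from the
> > admin socket should show where the time goes (the socket path below
> > is just the default, adjust for your deployment):
> > 
> >   # dump the internal perf counters of osd.0
> >   ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok perf dump
> > 
> >   # quick per-OSD latency overview (if available in this version)
> >   ceph osd perf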
> > 
> > Running multiple instances of this test from several VMs on different
> > hosts changes nothing: the aggregate IOPS for the whole cluster still
> > come out at around 3200.
> > 
> > Now clearly RBD has to deal with latency here, but the network is
> > IPoIB with the associated low latency, and the journal SSDs are the
> > (consistently) fastest ones around.
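> > 
> > (To rule out the network, round-trip latency between the hosts can be
> > sanity-checked with e.g. qperf; hostnames here are placeholders:
> > 
> >   # on one storage node:   qperf
> >   # from the other node:   qperf <storage-node> tcp_lat
> > 
> > and tcp_lat should come back well below a millisecond on IPoIB.)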
> > 
> > I guess what I am wondering is whether this is normal and to be
> > expected, or, if not, where all that potential performance got lost.
> > 
> > Regards,
> > 
> > Christian
> > 
> 


-- 
Christian Balzer        Network/Systems Engineer                
chibi@gol.com   	Global OnLine Japan/Fusion Communications
http://www.gol.com/

