Slow IOPS on RBD compared to journal and backing devices

ulembke@xxxxxxxxxxxx (Udo Lembke) · Thu, 08 May 2014 17:20:59 +0200

Hi,
I don't think it's related, but how full is your Ceph cluster? Perhaps
it has something to do with fragmentation on the XFS filesystem
(xfs_db -c frag -r device)?
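
A minimal sketch of how that check could be scripted across several OSD
data devices. The helper names (frag_factor, check_frag) and the device
paths in the usage line are placeholders, not anything from your setup,
and the parsing assumes xfs_db prints a line of the form
"actual N, ideal M, fragmentation factor X%".

```shell
#!/bin/sh
# Extract the percentage from the "fragmentation factor" line of
# `xfs_db -c frag` output read on stdin (assumed output format, see above).
frag_factor() {
    sed -n 's/.*fragmentation factor \([0-9.]*\)%.*/\1/p'
}

# Print the fragmentation factor for each device passed as an argument.
# -r opens the device read-only; this normally needs root.
check_frag() {
    for dev in "$@"; do
        printf '%s: %s%%\n' "$dev" "$(xfs_db -c frag -r "$dev" | frag_factor)"
    done
}

# Usage (placeholder device names): check_frag /dev/sdb1 /dev/sdc1
```

Run against a mounted filesystem the figure is only approximate, so
treat it as a rough indicator rather than an exact measurement.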

Udo

On 08.05.2014 02:57, Christian Balzer wrote:
> 
> Hello,
> 
> ceph 0.72 on Debian Jessie, 2 storage nodes with 2 OSDs each. The journals
> are on (separate) DC 3700s, the actual OSDs are RAID6 behind an Areca 1882
> with 4GB of cache.
> 
> Running this fio:
> 
> fio --size=400m --ioengine=libaio --invalidate=1 --direct=1 --numjobs=1 --rw=randwrite --name=fiojob --blocksize=4k --iodepth=128
> 
> results in:
> 
>   30k  IOPS on the journal SSD (as expected)
>  110k  IOPS on the OSD (it fits neatly into the cache, no surprise there)
>  3200  IOPS from a VM using userspace RBD
>  2900  IOPS from a host kernelspace mounted RBD
> 
> When running the fio from the VM RBD the utilization of the journals is
> about 20% (2400 IOPS) and the OSDs are bored at 2% (1500 IOPS after some
> obvious merging).
> The OSD processes are quite busy, reading well over 200% on atop, but
> the system is not CPU or otherwise resource starved at that moment.
> 
> Running multiple instances of this test from several VMs on different hosts
> changes nothing: the aggregate IOPS for the whole cluster still stays
> around 3200.
> 
> Now clearly RBD has to deal with latency here, but the network is IPoIB
> with the associated low latency and the journal SSDs are the
> (consistently) fastest ones around.
> 
> I guess what I am wondering is whether this is normal and to be expected
> or, if not, where all that potential performance got lost.
> 
> Regards,
> 
> Christian
>