On 11/20/2017 10:06 AM, Moreno, Orlando wrote:
Hi all,
I’ve been experiencing weird performance behavior when using the FIO RBD
engine directly against an RBD volume with numjobs > 1. For a 4KB random
write test at QD 32 with numjobs=1, I can get about 40K IOPS, but when I
increase numjobs to 4, it plummets to ~2,800 IOPS. I tried running the
exact same test on a VM using FIO with libaio, targeting a block device
(volume) attached through QEMU/RBD, and I get ~35K-40K IOPS in both
situations. In all cases, the CPU was not fully utilized and there were
no signs of any hardware bottleneck. I did not disable any RBD features,
and most of the Ceph parameters are at their defaults (besides auth,
debug, pool size, etc.).
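For reference, the job file looks roughly like this (the pool, image,
and client names are placeholders for my setup):

    [global]
    ; pool, image, and cephx user below are placeholders
    ioengine=rbd
    clientname=admin
    pool=rbd
    rbdname=fio-test
    rw=randwrite
    bs=4k
    iodepth=32
    direct=1
    time_based=1
    runtime=60

    [rbd-randwrite]
    ; numjobs=1 gives ~40K IOPS; numjobs=4 drops to ~2,800
    numjobs=4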
My Ceph cluster runs on 6 all-NVMe nodes (22 cores, 376GB of memory
each) with Luminous 12.2.1 on Ubuntu 16.04, and the clients running the
FIO jobs/VMs are on a similar HW/SW spec. The VM has 16 vCPUs and 64GB
of memory; its root disk is stored locally, while the persistent disk
comes from an RBD volume served by the Ceph cluster.
If anyone has seen this issue or has any suggestions, please let me know.
Thanks,
Orlando
Hi Orlando,
Try disabling the RBD image's exclusive-lock feature to see if that
helps (if only to confirm that's what's going on). With exclusive-lock
enabled, multiple clients writing to the same image end up passing the
lock back and forth, which serializes the I/O. To avoid this, I usually
test with numjobs=1 and instead run multiple fio instances with higher
iodepth values.
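Something along these lines, for example (the image name is a
placeholder; fast-diff and object-map depend on exclusive-lock, so if
they are enabled they need to be disabled first):

    rbd info rbd/fio-test          # check which features are enabled
    rbd feature disable rbd/fio-test fast-diff
    rbd feature disable rbd/fio-test object-map
    rbd feature disable rbd/fio-test exclusive-lock

    # Alternative: keep numjobs=1 and launch several fio processes with
    # a higher iodepth, ideally one image per process so each image has
    # a single lock owner. "rbd-qd128.fio" is a hypothetical job file
    # that picks up the image name from ${IMG}:
    for i in 1 2 3 4; do
      IMG=fio-test-$i fio rbd-qd128.fio --output=fio.$i.log &
    done; wait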
See:
https://www.spinics.net/lists/ceph-devel/msg30468.html
and
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2015-September/004872.html
Mark