Hi all,

I've been experiencing some odd performance behavior when using the fio RBD engine directly against an RBD volume with numjobs > 1. For a 4KB random write test at queue depth 32 with numjobs=1 I get about 40K IOPS, but when I increase numjobs to 4 it plummets to 2800 IOPS. I ran the exact same test inside a VM, using fio with the libaio engine against a block device (volume) attached through QEMU/RBD, and there I get ~35K-40K IOPS with either numjobs setting. In all cases CPU was not fully utilized and there were no signs of any hardware bottleneck.
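For reference, the RBD-engine job looks roughly like the sketch below; the pool, image, and client names and the runtime are placeholders rather than my exact values:

    [global]
    # 4KB random writes at iodepth 32, issued directly through librbd
    ioengine=rbd
    clientname=admin
    pool=rbd
    rbdname=fio-test
    invalidate=0
    rw=randwrite
    bs=4k
    iodepth=32
    direct=1
    time_based=1
    runtime=120
    group_reporting=1

    [rbd-randwrite]
    # 1 in the fast case, 4 in the slow case
    numjobs=4

The in-VM run uses the same block size, queue depth, and numjobs values, but with ioengine=libaio and filename= pointing at the attached block device (the device path, e.g. /dev/vdb, is just illustrative).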
any hardware bottlenecks. I did not disable any RBD features and most of the Ceph parameters are default (besides auth, debug, pool size, etc). My Ceph cluster is running on 6 nodes, all-NVMe, 22-core, 376GB mem, Luminous 12.2.1, Ubuntu 16.04, and clients running FIO job/VM on similar HW/SW spec. The VM has 16 vCPU, 64GB mem, and the root disk is locally stored while the persistent
If anyone has seen this issue or has any suggestions, please let me know.

Thanks,
Orlando