I suspect you are seeing this issue [1]. TL;DR: never use "numjobs" > 1
against an RBD image that has the exclusive-lock feature enabled.

[1] http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-August/012123.html

On Mon, Nov 20, 2017 at 11:06 AM, Moreno, Orlando
<orlando.moreno@xxxxxxxxx> wrote:
> Hi all,
>
> I’ve been experiencing weird performance behavior when using FIO RBD engine
> directly to an RBD volume with numjobs > 1. For a 4KB random write test at
> 32 QD and 1 numjob, I can get about 40K IOPS, but when I increase the
> numjobs to 4, it plummets to 2800 IOPS. I tried running the same exact test
> on a VM using FIO libaio targeting a block device (volume) attached through
> QEMU/RBD and I get ~35K-40K IOPS in both situations. In all cases, CPU was
> not fully utilized and there were no signs of any hardware bottlenecks. I
> did not disable any RBD features and most of the Ceph parameters are default
> (besides auth, debug, pool size, etc).
>
> My Ceph cluster is running on 6 nodes, all-NVMe, 22-core, 376GB mem,
> Luminous 12.2.1, Ubuntu 16.04, and clients running FIO job/VM on similar
> HW/SW spec. The VM has 16 vCPU, 64GB mem, and the root disk is locally
> stored while the persistent disk comes from an RBD volume serviced by the
> Ceph cluster.
>
> If anyone has seen this issue or have any suggestions please let me know.
>
> Thanks,
> Orlando
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

--
Jason
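
P.S. To make the TL;DR concrete, here is a minimal Python sketch of a guard you could put in front of a benchmark run. As I understand it, each fio job opens the image as a separate client, and with exclusive-lock enabled those clients likely spend their time handing the lock back and forth. The sketch assumes you feed it the output of `rbd info --format json <pool>/<image>` and that this output carries a "features" list. The function names, the job name, the "admin" client name, and the pool/image names are all hypothetical placeholders, not anything from Orlando's setup.

```python
import json


def has_exclusive_lock(rbd_info_json: str) -> bool:
    """Return True if the image's feature list includes exclusive-lock.

    Assumes the input is the JSON printed by `rbd info --format json`,
    with a top-level "features" list of feature-name strings.
    """
    info = json.loads(rbd_info_json)
    return "exclusive-lock" in info.get("features", [])


def fio_rbd_job(pool: str, image: str, numjobs: int, iodepth: int,
                rbd_info_json: str, clientname: str = "admin") -> str:
    """Build a fio job file for a 4KB random write test with the rbd engine.

    Refuses numjobs > 1 when the image has exclusive-lock enabled,
    since the jobs would contend for the lock.
    """
    if numjobs > 1 and has_exclusive_lock(rbd_info_json):
        raise ValueError(
            "numjobs > 1 against an image with exclusive-lock enabled; "
            "use numjobs=1 with a higher iodepth, or disable the feature"
        )
    lines = [
        "[rbd-randwrite]",        # hypothetical job name
        "ioengine=rbd",
        f"clientname={clientname}",
        f"pool={pool}",
        f"rbdname={image}",
        "rw=randwrite",
        "bs=4k",
        f"iodepth={iodepth}",
        f"numjobs={numjobs}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical sample of `rbd info --format json` output, trimmed
    # down to the one field this sketch reads.
    sample = json.dumps({"features": ["layering", "exclusive-lock"]})
    print(fio_rbd_job("rbd", "fio-test", numjobs=1, iodepth=32,
                      rbd_info_json=sample))
```

If you do need several jobs against one image, the alternative is to turn the feature off first with `rbd feature disable <pool>/<image> exclusive-lock`. Features that depend on it (object-map, fast-diff, journaling) would have to be disabled before that.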