Hello,
I've recently benchmarked random writes to the same RBD image in an
all-flash cluster, using `fio -ioengine=librbd` and `fio -ioengine=libaio`
against the krbd-mapped /dev/rbd0.
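
For reference, the runs looked roughly like this (block size, runtime
and the pool/image names below are placeholders, not necessarily the
exact values I used; iodepth was 1 or 128 depending on the run):

  # userspace path: fio talks to the cluster via librbd/librados
  fio -ioengine=librbd -direct=1 -name=test -rw=randwrite -bs=4k \
      -iodepth=1 -pool=rbd -rbdname=testimg -runtime=60

  # kernel path: the same image mapped through krbd
  rbd map rbd/testimg    # appears as /dev/rbd0
  fio -ioengine=libaio -direct=1 -name=test -rw=randwrite -bs=4k \
      -iodepth=1 -filename=/dev/rbd0 -runtime=60
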
The result: with iodepth=1, librbd gave ~0.86ms latency and krbd gave
~0.74ms; with iodepth=128, librbd gave ~9900 iops and krbd gave ~17000
iops. That's a huge difference; it basically means a lot of performance
is wasted on the client side.
It also seems the overhead does not come from librbd itself but directly
from librados, because our ceph-bench / ceph-gobench tools, which use
librados without librbd, show almost the same write IOPS as librbd.
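
For a rough cross-check of the raw librados write path, the standard
`rados bench` tool could be used as well; note it creates new objects
instead of overwriting an image, so it's only an approximation (the pool
name and block size below are just examples):

  # qd=1 latency and qd=128 throughput, analogous to the fio runs above
  rados bench -p rbd 30 write -t 1 -b 4096 --no-cleanup
  rados bench -p rbd 30 write -t 128 -b 4096 --no-cleanup
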
My question is: can anyone guess what is consuming so much CPU time in
librados compared to the kernel RADOS client?
I tried to profile it with valgrind; the profiles suggest most of the
time is spent in ceph::buffer::list::append and friends. Does that sound
right?
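
In case it helps, the profile was collected roughly like this (sketched
here with callgrind; the fio arguments are the same as in the librbd run
above):

  valgrind --tool=callgrind --separate-threads=yes \
      fio -ioengine=librbd -direct=1 -name=test -rw=randwrite -bs=4k \
          -iodepth=1 -pool=rbd -rbdname=testimg -runtime=60
  # then inspect callgrind.out.<pid> with callgrind_annotate or kcachegrind
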
--
With best regards,
Vitaliy Filippov