The C librbd/librados APIs currently need to copy the provided user
buffer. The goal is to remove this unnecessary copy once the underlying
issue that necessitates it is addressed, but in the meantime CPU flame
graphs will highlight that copy as a major consumer of CPU for larger
IOs [1]. There is also a lot of additional memory allocation and lock
contention occurring in the user-space libraries that further impacts
CPU and wall-clock time usage.

On Mon, Apr 1, 2019 at 2:08 PM <vitalif@xxxxxxxxxx> wrote:
>
> Hello,
>
> I've recently benchmarked random writes into the same RBD image in an
> all-flash cluster using `fio -ioengine=librbd` and `fio
> -ioengine=libaio` against the krbd-mapped /dev/rbd0.
>
> The result was that with iodepth=1 librbd gave ~0.86ms latency and krbd
> gave ~0.74ms; with iodepth=128 librbd gave ~9900 iops and krbd gave
> ~17000 iops. That's a huge difference; it basically means a lot of
> performance is wasted on the client side.
>
> It also seems the performance impact does not come from librbd itself
> but directly from librados, because our ceph-bench / ceph-gobench tools
> give almost identical write IOPS to librbd.
>
> My question is: could anyone make a guess about what is consuming so
> much CPU time in librados compared to the kernel RADOS client?
>
> I tried to profile it with valgrind; from the valgrind profiles it
> seems to be mostly ceph::buffer::list::append and friends. Could that
> be the right lead?
>
> --
> With best regards,
> Vitaliy Filippov

[1] https://github.com/ceph/ceph/pull/25689#issuecomment-472271162

--
Jason
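
For anyone wanting to see where the copy discussed above sits, here is a
minimal sketch against the public librados C API; the pool name and
object name are placeholders, and the comments only restate the
behaviour described in this thread, not the exact internal code path:

    #include <rados/librados.h>
    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        rados_t cluster;
        rados_ioctx_t ioctx;
        char buf[4096];

        memset(buf, 'a', sizeof(buf));

        /* Connect using the default ceph.conf and keyring. */
        if (rados_create(&cluster, NULL) < 0 ||
            rados_conf_read_file(cluster, NULL) < 0 ||
            rados_connect(cluster) < 0) {
            fprintf(stderr, "failed to connect to the cluster\n");
            return 1;
        }

        /* "rbd" is a placeholder pool name. */
        if (rados_ioctx_create(cluster, "rbd", &ioctx) < 0) {
            fprintf(stderr, "failed to open pool\n");
            rados_shutdown(cluster);
            return 1;
        }

        /*
         * The caller hands librados a raw pointer plus length. As noted
         * above, the C API currently copies this buffer internally
         * before dispatching the write -- the extra copy that flame
         * graphs highlight for larger IOs.
         */
        if (rados_write(ioctx, "bench-object", buf, sizeof(buf), 0) < 0)
            fprintf(stderr, "write failed\n");

        rados_ioctx_destroy(ioctx);
        rados_shutdown(cluster);
        return 0;
    }

Compile with something like `cc copy_demo.c -o copy_demo -lrados`.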
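
As a rough way to reproduce the comparison described in the quoted
message with a stock fio build (pool name, image name, block size, and
runtime below are assumptions, not values from the original test; the
userspace path uses fio's `rbd` ioengine):

    # librbd path (fio's userspace rbd ioengine)
    fio --name=librbd-randwrite --ioengine=rbd --clientname=admin \
        --pool=rbd --rbdname=testimg --rw=randwrite --bs=4k \
        --iodepth=1 --direct=1 --runtime=60 --time_based

    # krbd path (kernel client, image mapped to /dev/rbd0 via "rbd map")
    fio --name=krbd-randwrite --ioengine=libaio --filename=/dev/rbd0 \
        --rw=randwrite --bs=4k --iodepth=1 --direct=1 \
        --runtime=60 --time_based

Repeating both runs with --iodepth=128 gives the queue-depth comparison
mentioned above.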