Re: librados (librbd) slower than krbd

On 4/2/19 10:04 AM, vitalif@xxxxxxxxxx wrote:
>> Just to be clear, did you run with "direct=1" on your fio tests? I
>> would also recommend disabling the librbd in-memory cache for random
>> IO tests against fast storage (rbd cache = false).

> Yes, of course, I ran it with -direct=1 -sync=1.

> Thanks for the hint, I retested it without the cache. The numbers are closer now: 1456 iops with Q=1 against a single NVMe OSD and 17470 iops with Q=128 against the whole cluster (with the cache it's only 1350 and 9950 iops, respectively). There's definitely something wrong with this cache...
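(For reference, the cache-off runs described above map onto an fio job along these lines; the pool/image names and option mix below are illustrative, not the poster's actual job file:)

```ini
# Illustrative fio job for the librbd tests above (not the poster's actual file).
# Pool/image names are placeholders; "rbd cache = false" is set in the client's
# ceph.conf. For the krbd comparison, use ioengine=libaio with filename=/dev/rbdX.
[global]
ioengine=rbd
clientname=admin
pool=rbd
rbdname=testimg
rw=randwrite
bs=4k
direct=1
sync=1
runtime=60
time_based=1

[qd1]
iodepth=1

[qd128]
stonewall
iodepth=128
```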


You may be interested in the PR/discussion regarding multiple writeback threads:


https://github.com/ceph/ceph/pull/25713

and also the work Jason has been doing:

https://github.com/ceph/ceph/pull/26675
https://github.com/ceph/ceph/pull/27229



> Kernel client gives 1550 iops with Q=1 (single OSD) and 23530 iops with Q=128 (whole cluster). I'm running it in a cluster currently in use, so the load varies, but krbd still seems faster.


A month or two ago I made some changes to cbt to add the ability to benchmark different "client endpoints" with the same benchmarking code. The test case I used while developing it was fio against a bunch of different client endpoints under high concurrency (so no QD=1 results, though those would be interesting too). I also included CPU usage numbers. You can see the results here:

https://docs.google.com/spreadsheets/d/1oJZ036QDbJQgv2gXts1oKKhMOKXrOI2XLTkvlsl9bUs/edit#gid=1240030838



>> Semi-recent performance tests of librbd vs krbd against a
>> single NVMe-backed OSD showed 4K random write performance at around
>> 75K IOPS for krbd and 90K IOPS for librbd -- but at the expense of
>> nearly 4x more client-side CPU for librbd. krbd pre-allocates a lot of
>> its memory up front in slab allocation pools and zero-copies as much
>> as possible. librbd/librados are heavily hit with numerous C++ small
>> object heap allocations and related initialize/copy operations.

> I can't achieve such numbers even with the rbd cache disabled. It only gives me ~9500 iops, the same for librbd and krbd, when I'm testing against a single NVMe OSD. The CPU is probably not that great (an ~8-year-old 2.2 GHz Xeon); the benchmarked OSD eats ~650% CPU during testing.

> Also, I have message signatures disabled (cephx_sign_messages = false). Without disabling them it was even worse...
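(For anyone reproducing this, that setting corresponds to a ceph.conf fragment like the following; note that it trades away on-the-wire integrity checking, so it's a benchmark-only tweak unless you've thought that through:)

```ini
# ceph.conf fragment: disable cephx message signing, as mentioned above.
[global]
cephx_sign_messages = false
```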


Another big CPU drain is debug ms = 1. We recently decided to disable it by default in master since the overhead is so high. You can see that PR here:

https://github.com/ceph/ceph/pull/26936

and the associated performance data:

https://docs.google.com/spreadsheets/d/1Zi3MFtvwLzCFfObL6evQKYtINQVQIjZ0SXczG78AnJM/edit?usp=sharing
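(On releases that still ship the old default, turning it off is a minimal ceph.conf fragment, assuming you want it cluster-wide:)

```ini
# ceph.conf fragment: silence the messenger debug logging discussed above.
[global]
debug_ms = 0/0
```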

Mark


