I'm creating a benchmark suite for Ceph.
While benchmarking the benchmark suite itself, I checked how fast
ceph-osd can go.
I decided to skip the whole 'SSD mess' and use brd (the block RAM
disk driver, modprobe brd) as the underlying storage. brd itself can
yield up to 2.7M IOPS in fio. In single-threaded mode (iodepth=1) it
yields up to 750k IOPS, and LVM on top of brd gives about 600k IOPS
single-threaded with iodepth=1 (16us latency).
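For reference, the raw baseline was measured roughly like this (the
device size and the exact fio parameters below are illustrative, not
my precise job):

  # 4 GiB RAM-backed block device (rd_size is in KiB)
  modprobe brd rd_nr=1 rd_size=4194304

  # single-threaded 4k random writes straight to the RAM disk
  fio --name=brd-baseline --filename=/dev/ram0 --ioengine=libaio \
      --direct=1 --rw=randwrite --bs=4k --iodepth=1 --numjobs=1 \
      --time_based --runtime=30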
But as soon as I put a ceph-osd (bluestore) on it, I see something
very odd: no matter how much parallel load I push onto this OSD, it
never gives more than 30k IOPS, and I can't figure out where the
bottleneck is.
CPU utilization: ~300%. There are 8 cores in my setup, so CPU is not
the bottleneck.
Network: I've moved the benchmark onto the same host as the OSD, so
everything runs over localhost. Even if the network mattered, it is
far from saturated: 30k IOPS at 4k is roughly 1 Gbit/s, and I have
10G links. In any case the tests run over localhost, so the network
is irrelevant (I've verified that the traffic stays local). The test
process itself consumes about 70% of one core, so there is plenty of
CPU left.
Replication: I've killed it (size=1, single osd in the pool).
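For completeness, the pool was set up along these lines (the pool
name and PG count are placeholders; recent releases may also require
--yes-i-really-mean-it to allow size=1):

  ceph osd pool create bench 128 128
  ceph osd pool set bench size 1
  ceph osd pool set bench min_size 1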
Single-threaded latency: 200us, 4.8k IOPS.
iodepth=32: 2ms (15k IOPS).
iodepth=16, numjobs=8: 5ms (24k IOPS).
I'm running fio with the 'rados' ioengine, and adding more workers
doesn't change much, so the rados ioengine itself doesn't look like
the limiting factor.
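The invocation is roughly the following, with the iodepth/numjobs
combinations listed above (pool and client names are placeholders):

  fio --name=osd-bench --ioengine=rados --clientname=admin \
      --pool=bench --rw=randwrite --bs=4k --size=256m \
      --time_based --runtime=30 --iodepth=16 --numjobs=8 \
      --group_reporting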
Since there is plenty of CPU and I/O headroom left, only one
plausible place for the bottleneck remains: some time-consuming
single-threaded code path inside ceph-osd.
Are there any knobs to tweak to get higher performance out of
ceph-osd? I'm pretty sure it's not any kind of leveling, GC or other
'IOPS-related' issues (brd performance is two orders of magnitude
higher).