On 10/09/2020 19:37, Mark Nelson wrote:
On 9/10/20 11:03 AM, George Shuklin wrote:
...
Are there any knobs to tweak to see higher performance for ceph-osd?
I'm pretty sure it's not any kind of leveling, GC or other
'iops-related' issues (brd performance is two orders of magnitude
higher).
So as you've seen, Ceph does a lot more than just write a chunk of
data out to a block on disk: there is encoding/decoding, crc
checksumming, crush calculations, onode lookups, write-ahead logging,
and other work involved, and all of it adds latency.
You can overcome some of that through parallelism, but 30K IOPs per
OSD is probably pretty on-point for a Nautilus-era OSD. For Octopus+,
the cache refactor in bluestore should get you further (40-50k+ for
an OSD in isolation). The maximum performance we've seen in-house is
around 70-80K IOPs on a single OSD using very fast NVMe and highly
tuned settings.
A couple of things you can try:
- Upgrade to Octopus+ for the cache refactor.
- Make sure you are using the equivalent of the latency-performance or
latency-network tuned profile. The most important part is disabling
CPU C-state transitions (see the sketch after this list).
- Increase osd_memory_target if you have a larger dataset (onode cache
misses in bluestore add a lot of latency).
- Enable turbo if it's disabled (higher clock speed generally helps).
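For concreteness, here's a minimal sketch of those knobs from the shell. It
assumes tuned/cpupower are installed and the intel_pstate driver is in use,
and the 8 GiB osd_memory_target value is only an illustration; size it to
your RAM and onode working set:

  # apply a low-latency profile (keeps CPUs out of deep C-states)
  tuned-adm profile latency-performance

  # or pin the cpufreq governor directly
  cpupower frequency-set -g performance

  # raise the per-OSD cache target (8 GiB here is just an example value)
  ceph config set osd osd_memory_target 8589934592

  # with intel_pstate, 0 here means turbo is enabled
  cat /sys/devices/system/cpu/intel_pstate/no_turbo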
On the write path you are correct that there is a limitation regarding
a single kv sync thread. Over the years we've made this less of a
bottleneck but it's possible you still could be hitting it. In our
test lab we've managed to utilize up to around 12-14 cores on a single
OSD in isolation with 16 tp_osd_tp worker threads and on a larger
cluster about 6-7 cores per OSD. There are probably multiple factors at
play, including context switching, cache thrashing, memory throughput,
object creation/destruction, etc. If you decide to look into it
further, you may want to try wallclock profiling the OSD under load
to see where it is spending its time.
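If you go that route, here is a rough sketch of two quick checks: an on-CPU
profile with perf (a true wallclock profiler such as gdbpmp will also catch
off-CPU waits), and the bluestore kv latency counters on the admin socket.
The osd id, pgrep pattern, and counter names below are examples and can vary
between releases:

  # sample all threads of one ceph-osd for 30 seconds with call graphs
  # (dwarf unwinding, since frame pointers are often compiled out)
  perf record -F 99 --call-graph dwarf -p $(pgrep -f 'ceph-osd.*--id 0') -- sleep 30

  # see which threads/symbols the time goes to (kv sync thread, tp_osd_tp, etc.)
  perf report --sort comm,dso,symbol

  # watch where bluestore commit latency accrues while the write load runs
  ceph daemon osd.0 perf dump | python3 -m json.tool \
      | grep -E -A 3 'kv_sync_lat|kv_flush_lat|commit_lat'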
Thank you for the feedback.
I forgot to mention this: it's Octopus, a fresh installation.
I've disabled C-states (governor=performance) and it makes no difference:
same IOPS, same CPU usage by ceph-osd. I just can't force Ceph to
consume more than 330% CPU. I can push reads up to 150k IOPS (both
over the network and locally), hitting the CPU limit, but writes are
somewhat restricted by Ceph itself.