Hi Stephan,
We recently ran a set of 3-sample tests comparing 2 OSDs/NVMe vs 1
OSD/NVMe RBD performance on Nautilus, Octopus, and master on some of our
newer performance nodes with Intel P4510 NVMe drives, using the librbd
fio backend. We also saw similar randread and sequential write
performance increases, but did not see a performance regression with 4KB
random writes like you did. In fact, Octopus was significantly faster
than Nautilus (though master regressed a little vs Octopus). We expected
it to be faster too, since we improved the way the BlueStore caches work
and that change has consistently shown gains for us. Here are the most
recent test results:
https://docs.google.com/spreadsheets/d/1e5eTeHdZnSizoY6AUjH0knb4jTCW7KMU4RoryLX9EHQ/edit?usp=sharing
Having said that, this is the second report I've gotten regarding a
performance regression in Octopus, so there could be something going on
that we are missing. If possible, could you run gdbpmp against one of
your OSDs during the test? That might help us figure out why it's
slow (there's a rough example invocation below, after the list).
Otherwise, some other things to look at:
1) If this is a large dataset, see if increasing osd_memory_target
helps (example command after this list). Onode cache misses really hurt
us and can increase latency and reduce IOPS. Now that Adam's column
family sharding PR has merged in master, we have two complementary PRs
that both help reduce OSD memory consumption for caching onodes. For now
you might see higher performance if you can afford to give the OSDs more
memory.
2) Check whether the CPUs are being kept in a high power state (see the
commands after this list). Power-state transitions can cause higher
latency, and perversely, the less CPU you use, the more likely the CPU
is to drop into a low power state, resulting in higher latency and worse
performance, especially if it ends up thrashing between power states.
3) Lately I haven't seen the kv sync thread acting as a hard bottleneck
during 4KB random writes, but it still could be if you have a
low-clocked processor (especially one in a power-saving state). This is
still an area to look at carefully if performance is low (a quick
per-thread check is shown after this list).
4) The bluefs_buffered_io change was the other thing I suspected, but it
sounds like you've already tested that. Nevertheless, it would be good
to see if IOs are backing up. If you can get a wall-clock profile with
gdbpmp you might be able to tell if io_submit is blocking. iostat or
collectl can also probably tell you if the device queue is backing up
(see the last example after this list).
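
In case it helps, here's roughly how I'd collect a profile with gdbpmp
(https://github.com/markhpc/gdbpmp). The flags below are from memory and
may differ slightly depending on the version you grab, so double-check
against the script's help output:

    # Attach to one ceph-osd process and collect ~1000 wall-clock samples.
    # Requires gdb with python support on the OSD node.
    ./gdbpmp.py -p <ceph-osd pid> -n 1000 -o osd.N.gdbpmp
    # Dump the collected call tree afterwards:
    ./gdbpmp.py -i osd.N.gdbpmp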
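
For 1), bumping the memory target is a single config change. The 8 GiB
value below is just an example; pick whatever your nodes can actually
spare per OSD:

    # Raise the per-OSD memory target (value in bytes, example only).
    ceph config set osd osd_memory_target 8589934592
    # Confirm what a given OSD sees:
    ceph config get osd.0 osd_memory_target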
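
For 2), something along these lines is a quick check, assuming the
cpupower tool from your distro's kernel-tools package is available:

    # Show the current governor and which idle (C-)states are enabled.
    cpupower frequency-info
    cpupower idle-info
    # Pin the cores to the performance governor for the duration of the test.
    cpupower frequency-set -g performance
    # Watch the actual clocks while the benchmark runs.
    watch -n1 "grep MHz /proc/cpuinfo"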
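
For 3), the kv sync thread should show up as bstore_kv_sync in a
per-thread view (that's the thread name as I remember it; a gdbpmp
profile will show it too). If it's pegged near 100% of a core, it's
likely your bottleneck:

    # Per-thread CPU usage for one OSD process.
    top -H -p <ceph-osd pid>
    # or, with sysstat, sample every 5 seconds:
    pidstat -t -p <ceph-osd pid> 5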
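
For 4), watching the device queue while the test runs is straightforward
with either tool (column names vary a bit between sysstat versions):

    # Extended per-device stats every second; watch the queue size, await
    # and %util columns for the NVMe devices backing the OSDs.
    iostat -x 1
    # Roughly equivalent detailed disk view in collectl, with timestamps:
    collectl -sD -oT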
Hope this gives some ideas to start out!
Thanks,
Mark
On 6/4/20 10:07 AM, Stephan wrote:
Thanks for your fast reply! We just tried all four possible combinations of bluefs_preextend_wal_files and bluefs_buffered_io, but the write IOPS in the "usecase1" test remain the same. By the way, bluefs_preextend_wal_files was already false in 14.2.9 (as it is in 15.2.3). Any other ideas?
David Orman wrote:
* bluestore: common/options.cc: disable bluefs_preextend_wal_files <--
from the 15.2.3 changelog. There was a bug which led to issues on OSD
restart, and I believe this was the attempt at mitigation until a proper
bugfix could be put into place. I suspect this might be the cause of the
symptoms you're seeing.
https://tracker.ceph.com/issues/45613
https://github.com/ceph/ceph/pull/35293
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx