Oh, one other thing:
Check for background work, especially from the PG balancer. In all of
my tests the balancer was explicitly disabled. If it is constantly
rebalancing PGs in the pool during a benchmark, the resulting
background workload can significantly affect client IO.
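A quick way to check (a sketch; this assumes the mgr balancer module
and that you can tolerate leaving it off for the duration of the test):

  ceph balancer status    # is the balancer enabled and actively optimizing?
  ceph balancer off       # what I had during my tests
  ceph -s                 # look for ongoing backfill/recovery during the run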
Mark
On 6/4/20 11:03 AM, Mark Nelson wrote:
Hi Stephan,
We recently ran a set of 3-sample tests looking at 2 OSDs/NVMe vs 1
OSD/NVMe RBD performance on Nautilus, Octopus, and Master on some of
our newer performance nodes with Intel P4510 NVMe drives. Those tests
use the librbd fio backend. We also saw similar randread and sequential
write performance increases, but did not see a performance regression
with 4KB random writes like you did. In fact, Octopus was
significantly faster than Nautilus (though master regressed a little vs
Octopus). We expected it to be significantly faster too, as we improved
the way the bluestore caches work, and that change has consistently
shown gains for us. Here are the most recent test results:
https://docs.google.com/spreadsheets/d/1e5eTeHdZnSizoY6AUjH0knb4jTCW7KMU4RoryLX9EHQ/edit?usp=sharing
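For reference, the 4KB random write jobs look roughly like this (a
sketch, not necessarily our exact test configuration; the pool, image,
client name, runtime, and queue depth below are placeholders):

  fio --ioengine=rbd --clientname=admin --pool=rbd --rbdname=fio-test \
      --rw=randwrite --bs=4k --iodepth=32 --time_based --runtime=300 \
      --name=4k-randwrite

This assumes a valid ceph.conf and keyring for that client on the host
running fio.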
Having said that, this is the second report I've gotten regarding a
performance regression in Octopus, so there could be something going on
that we are missing. If possible, could you run gdbpmp against one of
your OSDs during the test? That might help us figure out why it's
slow. Otherwise, here are some other things to look at (I've sketched
example commands after the list):
1) If this is a large dataset, see if increasing the osd_memory_target
helps. onode cache misses really hurt us and can increase latency and
hurt IOPS. Now that Adam's column family sharding PR has merged in
master, we have two complementary PRs that both help reduce OSD memory
consumption for caching onodes. For now you might see higher
performance if you can afford to give the OSDs more memory.
2) Check whether the CPUs are being kept in a high-performance state.
Power-state transitions can cause higher latency, and perversely, the
less CPU you use the more likely the CPU is to drop into a low-power
state, resulting in higher latency and worse performance, especially if
it ends up thrashing between power states.
3) Lately I haven't seen the kv sync thread acting as a hard
bottleneck during 4KB random writes, but it still could be one if you
have a low-clocked processor (especially in a power-saving state).
This is still an area to look at carefully if performance is low.
4) The bluefs_buffered_io change was the other thing I suspected, but
it sounds like you've already tested that. Nevertheless, it would be
good to see if IOs are backing up. If you can get a wall clock
profile with gdbpmp, you might be able to tell if io_submit is
blocking. iostat or collectl can also probably tell you if the device
queue is backing up.
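To make 1) through 4) a bit more concrete, here are rough example
commands. Treat them as a sketch rather than exact syntax: the memory
value, OSD id, and sample count are placeholders, the perf counter
names can vary by release, and you should check gdbpmp's help output
for its exact flags.

  # 1) give the OSDs more memory for onode caching (8 GiB here; default is 4 GiB)
  ceph config set osd osd_memory_target 8589934592

  # 2)/3) check the CPU frequency governor and force a high-performance state
  cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
  cpupower frequency-set -g performance

  # 3) look at kv sync latency counters on one OSD (counter names may differ)
  ceph daemon osd.0 perf dump | grep -A 3 kv_sync

  # 4) wall clock profile of one OSD, plus device queue stats
  gdbpmp.py -p <osd pid> -n 1000 -o osd.0.gdbpmp
  iostat -xmt 1    # watch the queue size and await columns for the OSD devices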
Hope this gives some ideas to start out!
Thanks,
Mark
On 6/4/20 10:07 AM, Stephan wrote:
Thanks for your fast reply! We just tried all four possible
combinations of bluefs_preextend_wal_files and bluefs_buffered_io,
but the write IOPS in the "usecase1" test remain the same. By the way,
bluefs_preextend_wal_files was already false in 14.2.9 (as it is in
15.2.3). Any other ideas?
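(Toggling and verifying those two options can be done roughly like
this; a sketch only, osd.0 is just an example daemon id, and depending
on the release an OSD restart may be needed for the new values to take
effect:

  ceph config set osd bluefs_preextend_wal_files false
  ceph config set osd bluefs_buffered_io true
  ceph daemon osd.0 config get bluefs_buffered_io    # confirm what the running OSD is using
)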
David Orman wrote:
* bluestore: common/options.cc: disable bluefs_preextend_wal_files <--
from the 15.2.3 changelog. There was a bug which led to issues on OSD
restart, and I believe this was the attempt at mitigation until a
proper bugfix could be put into place. I suspect this might be the
cause of the symptoms you're seeing.
https://tracker.ceph.com/issues/45613
https://github.com/ceph/ceph/pull/35293
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx