Hi Stephan,
We recently ran a set of 3-sample tests comparing 2 OSDs/NVMe vs 1
OSD/NVMe RBD performance on Nautilus, Octopus, and master on some of our
newer performance nodes with Intel P4510 NVMe drives, using the librbd
fio backend. We also saw similar randread and sequential write
performance increases, but did not see a performance regression with 4KB
random writes like you did. In fact, Octopus was significantly faster
than Nautilus (though master regressed a little vs Octopus). We expected
it to be faster too, since we improved the way the BlueStore caches work
and that change has consistently shown gains for us. Here are the most
recent test results:
https://docs.google.com/spreadsheets/d/1e5eTeHdZnSizoY6AUjH0knb4jTCW7KMU4RoryLX9EHQ/edit?usp=sharing
Having said that, this is the second report I've gotten regarding a
performance regression in Octopus, so there could be something going on
that we are missing. If possible, could you run gdbpmp against one of
your OSDs during the test? That might help us figure out why it's
slow (there's a rough example invocation below, after the list).
Otherwise, some other things to look at:
1) If this is a large dataset, see if increasing osd_memory_target
helps (example command after this list). Onode cache misses really hurt
us and can increase latency and reduce IOPS. Now that Adam's column
family sharding PR has merged in master, we have two complementary PRs
that both help reduce OSD memory consumption for caching onodes. For now
you might see higher performance if you can afford to give the OSDs more
memory.
2) Check whether the CPUs are being kept in a high power state (see the
commands after this list). Power-state transitions can cause higher
latency, and perversely, the less CPU you use, the more likely the CPU
is to drop into a low power state, resulting in higher latency and worse
performance, especially if it ends up thrashing between power states.
3) Lately I haven't seen the kv sync thread acting as a hard bottleneck
during 4KB random writes, but it still could be if you have a
low-clocked processor (especially one in a power-saving state). This is
still an area to look at carefully if performance is low (a quick
per-thread check is shown after this list).
4) The bluefs_buffered_io change was the other thing I suspected, but it
sounds like you've already tested that. Nevertheless, it would be good
to see if IOs are backing up. If you can get a wall-clock profile with
gdbpmp you might be able to tell if io_submit is blocking. iostat or
collectl can also probably tell you if the device queue is backing up
(see the last example after this list).
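
In case it helps, here's roughly how I'd collect a profile with gdbpmp
(https://github.com/markhpc/gdbpmp). The flags below are from memory and
may differ slightly depending on the version you grab, so double-check
against the script's help output:

    # Attach to one ceph-osd process and collect ~1000 wall-clock samples.
    # Requires gdb with python support on the OSD node.
    ./gdbpmp.py -p <ceph-osd pid> -n 1000 -o osd.N.gdbpmp
    # Dump the collected call tree afterwards:
    ./gdbpmp.py -i osd.N.gdbpmp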
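
For 1), bumping the memory target is a single config change. The 8 GiB
value below is just an example; pick whatever your nodes can actually
spare per OSD:

    # Raise the per-OSD memory target (value in bytes, example only).
    ceph config set osd osd_memory_target 8589934592
    # Confirm what a given OSD sees:
    ceph config get osd.0 osd_memory_target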
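
For 2), something along these lines is a quick check, assuming the
cpupower tool from your distro's kernel-tools package is available:

    # Show the current governor and which idle (C-)states are enabled.
    cpupower frequency-info
    cpupower idle-info
    # Pin the cores to the performance governor for the duration of the test.
    cpupower frequency-set -g performance
    # Watch the actual clocks while the benchmark runs.
    watch -n1 "grep MHz /proc/cpuinfo"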
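
For 3), the kv sync thread should show up as bstore_kv_sync in a
per-thread view (that's the thread name as I remember it; a gdbpmp
profile will show it too). If it's pegged near 100% of a core, it's
likely your bottleneck:

    # Per-thread CPU usage for one OSD process.
    top -H -p <ceph-osd pid>
    # or, with sysstat, sample every 5 seconds:
    pidstat -t -p <ceph-osd pid> 5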
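
For 4), watching the device queue while the test runs is straightforward
with either tool (column names vary a bit between sysstat versions):

    # Extended per-device stats every second; watch the queue size, await
    # and %util columns for the NVMe devices backing the OSDs.
    iostat -x 1
    # Roughly equivalent detailed disk view in collectl, with timestamps:
    collectl -sD -oT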
Hope this gives some ideas to start out!
Thanks,
Mark
On 6/4/20 10:07 AM, Stephan wrote:
Thanks for your fast reply! We just tried all four possible combinations of bluefs_preextend_wal_files and bluefs_buffered_io, but the write IOPS in the "usecase1" test remain the same. By the way, bluefs_preextend_wal_files was already false in 14.2.9 (as it is in 15.2.3). Any other ideas?
David Orman wrote:
* bluestore: common/options.cc: disable bluefs_preextend_wal_files <--
from the 15.2.3 changelog. There was a bug which led to issues on OSD
restart, and I believe this was the attempt at mitigation until a proper
bugfix could be put into place. I suspect this might be the cause of the
symptoms you're seeing.
https://tracker.ceph.com/issues/45613
https://github.com/ceph/ceph/pull/35293
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx