Oh, one other thing:
Check for background work, especially from the PG balancer. In all of
my tests the balancer was explicitly disabled. If it is constantly
rebalancing PGs in the pool during a benchmark, the resulting
background workload can significantly affect client IO.
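A quick way to check (a sketch; this assumes the mgr balancer module
and that you can tolerate leaving it off for the duration of the test):

  ceph balancer status    # is the balancer enabled and actively optimizing?
  ceph balancer off       # what I had during my tests
  ceph -s                 # look for ongoing backfill/recovery during the run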
Mark
On 6/4/20 11:03 AM, Mark Nelson wrote:
Hi Stephan,
We recently ran a set of 3-sample tests looking at 2 OSDs/NVMe vs 1
OSD/NVMe RBD performance on Nautilus, Octopus, and Master on some of
our newer performance nodes with Intel P4510 NVMe drives. Those tests
use the librbd fio backend. We also saw similar randread and sequential
write performance increases, but did not see a performance regression
with 4KB random writes like you did. In fact, Octopus was
significantly faster than Nautilus (though master regressed a little vs
Octopus). We expected it to be significantly faster too, as we improved
the way the bluestore caches work, and that change has consistently
shown gains for us. Here are the most recent test results:
https://docs.google.com/spreadsheets/d/1e5eTeHdZnSizoY6AUjH0knb4jTCW7KMU4RoryLX9EHQ/edit?usp=sharing
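For reference, the 4KB random write jobs look roughly like this (a
sketch, not necessarily our exact test configuration; the pool, image,
client name, runtime, and queue depth below are placeholders):

  fio --ioengine=rbd --clientname=admin --pool=rbd --rbdname=fio-test \
      --rw=randwrite --bs=4k --iodepth=32 --time_based --runtime=300 \
      --name=4k-randwrite

This assumes a valid ceph.conf and keyring for that client on the host
running fio.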
Having said that, this is the second report I've gotten regarding a
performance regression in Octopus, so there could be something going on
that we are missing. If possible, could you run gdbpmp against one of
your OSDs during the test? That might help us figure out why it's
slow. Otherwise, here are some other things to look at (I've sketched
example commands after the list):
1) If this is a large dataset, see if increasing the osd_memory_target
helps. onode cache misses really hurt us and can increase latency and
hurt IOPS. Now that Adam's column family sharding PR has merged in
master, we have two complementary PRs that both help reduce OSD memory
consumption for caching onodes. For now you might see higher
performance if you can afford to give the OSDs more memory.
2) Check whether the CPUs are being kept in a high-performance state.
Power-state transitions can cause higher latency, and perversely, the
less CPU you use the more likely the CPU is to drop into a low-power
state, resulting in higher latency and worse performance, especially if
it ends up thrashing between power states.
3) Lately I haven't seen the kv sync thread acting as a hard
bottleneck during 4KB random writes, but it still could be one if you
have a low-clocked processor (especially in a power-saving state).
This is still an area to look at carefully if performance is low.
4) The bluefs_buffered_io change was the other thing I suspected, but
it sounds like you've already tested that. Nevertheless, it would be
good to see if IOs are backing up. If you can get a wall clock
profile with gdbpmp, you might be able to tell if io_submit is
blocking. iostat or collectl can also probably tell you if the device
queue is backing up.
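To make 1) through 4) a bit more concrete, here are rough example
commands. Treat them as a sketch rather than exact syntax: the memory
value, OSD id, and sample count are placeholders, the perf counter
names can vary by release, and you should check gdbpmp's help output
for its exact flags.

  # 1) give the OSDs more memory for onode caching (8 GiB here; default is 4 GiB)
  ceph config set osd osd_memory_target 8589934592

  # 2)/3) check the CPU frequency governor and force a high-performance state
  cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
  cpupower frequency-set -g performance

  # 3) look at kv sync latency counters on one OSD (counter names may differ)
  ceph daemon osd.0 perf dump | grep -A 3 kv_sync

  # 4) wall clock profile of one OSD, plus device queue stats
  gdbpmp.py -p <osd pid> -n 1000 -o osd.0.gdbpmp
  iostat -xmt 1    # watch the queue size and await columns for the OSD devices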
Hope this gives some ideas to start out!
Thanks,
Mark
On 6/4/20 10:07 AM, Stephan wrote:
Thanks for your fast reply! We just tried all four possible
combinations of bluefs_preextend_wal_files and bluefs_buffered_io,
but the write IOPS in the "usecase1" test remain the same. By the way,
bluefs_preextend_wal_files was already false in 14.2.9 (as it is in
15.2.3). Any other ideas?
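(Toggling and verifying those two options can be done roughly like
this; a sketch only, osd.0 is just an example daemon id, and depending
on the release an OSD restart may be needed for the new values to take
effect:

  ceph config set osd bluefs_preextend_wal_files false
  ceph config set osd bluefs_buffered_io true
  ceph daemon osd.0 config get bluefs_buffered_io    # confirm what the running OSD is using
)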
David Orman wrote:
* bluestore: common/options.cc: disable bluefs_preextend_wal_files <--
from the 15.2.3 changelog. There was a bug which led to issues on OSD
restart, and I believe this was the attempt at mitigation until a
proper bugfix could be put into place. I suspect this might be the
cause of the symptoms you're seeing.
https://tracker.ceph.com/issues/45613
https://github.com/ceph/ceph/pull/35293
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx