Hi Folks,
Recently there was a thread called "Tuning Nautilus for flash only" that
included a reference to a bluestore performance blog post from earlier
this year on the Ceph community website. There was some concern in that
thread regarding some of the tuning parameters presented in the article.
We discussed it in the core standup earlier this week and felt like we
should address it. I've included a reply that Paul made in that thread
as I think it's particularly relevant. Before I get into that though, I
absolutely want to encourage folks to run performance tests and report
their findings. To that end, I want to thank Karan and Daniel for their
hard work and their willingness to present their results. This kind of work
is difficult and presenting the results publicly can be a little rough!
Thank you Karan and Daniel and please continue running tests and
reporting your findings!
I also want to thank Paul for making several extremely important and
valid points below. I completely agree that some of the tuning
parameters presented in the article shouldn't be used in production.
Beyond disabling checksumming and authentication, I would highly
encourage folks to think about the ramifications of setting very low
numbers of pg log entries (especially when combined with low per-pool PG
counts via the autotuner). The effect on recovery could be
significant. Several other tunings in the article may have unintended
consequences. Imagine for instance what could happen with 32 concurrent
rocksdb compaction threads per OSD on a server that has a large number
of OSDs, oversubscribed DB devices, and underpowered CPUs. Personally I
would be concerned about the overhead under heavy load with large
databases full of OMAP data. There are cases where our defaults may not
be optimal, but many were set after a fair amount of performance testing
(and even more QE testing). We tend to be more conservative than not,
but often there is at least some level of thought and testing behind the
defaults.
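To make the discussion concrete, the kinds of settings in question look
roughly like the sketch below. This is my own illustrative paraphrase
(the values are made up for the example), not the exact config from the
article, and it's shown as something to scrutinize rather than copy:

    [global]
    # Disabling cephx removes authentication between daemons and clients.
    auth_cluster_required = none
    auth_service_required = none
    auth_client_required = none

    [osd]
    # Disabling checksums means bluestore can no longer detect corrupted reads.
    bluestore_csum_type = none
    # Very short pg logs push recovery toward full backfill instead of
    # log-based recovery (the value here is just an illustrative low number).
    osd_min_pg_log_entries = 10
    osd_max_pg_log_entries = 10
    # Many background compaction threads per OSD can swamp CPUs and
    # oversubscribed DB devices on dense nodes (note that setting this
    # string replaces the entire default rocksdb option set).
    bluestore_rocksdb_options = "max_background_compactions=32"

Before touching any of these, "ceph config help <option>" will show the
default and a short description, and "ceph daemon osd.N config get
<option>" shows what a running OSD is actually using.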
In some cases, the optimal tuning may also be hardware or workload
specific. In the community test lab we have two different classes of
performance nodes. One is about 4 years old and uses older Xeon
processors and P3700 NVMe drives. Several years ago, when bluestore was
young, we saw that a 16K min_alloc size was significantly faster than 4K
for small write workloads, primarily due to encode/decode overhead. As
bluestore matured and improved, the gap between the 16K and 4K min_alloc
sizes on that hardware largely evaporated. On our new nodes, however, we
see a significant small-write performance improvement when using a 4K
min_alloc size (likely due to CPU overhead during WAL writes now being a
bigger bottleneck than metadata I/O in the DB). Of course, min_alloc
size has a huge effect on the space amplification of small objects as
well. This is just one example where an old set of tests on a single
hardware configuration may not tell the whole story (or even tell the
wrong story).
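For reference, the knob in question is bluestore_min_alloc_size (with
_ssd and _hdd variants), and the important caveat is that it is baked in
when an OSD is created, so changing it means redeploying OSDs. A minimal
sketch, with the 4K value purely as an example:

    # Only takes effect for OSDs created after the change (mkfs time);
    # existing OSDs keep whatever value they were built with.
    [osd]
    bluestore_min_alloc_size_ssd = 4096

    # To see what a running OSD's configuration currently says:
    #   ceph daemon osd.0 config get bluestore_min_alloc_size_ssd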
What I'm getting at here is that you shouldn't necessarily trust any
single set of tests (including mine!). This is especially true when
multiple configuration parameters are changed at the same time and it's
not clear how each parameter is affecting the results. I would
encourage folks to look at multiple sets of results, look especially at
tests that change a single parameter at a time, and also give higher
credence to results that provide evidence for why performance changed.
This might include profiling data, examples where specific code is shown
to be sub-optimal, or corroborating data from tests run by other users.
And Paul's advice below to run your own benchmarks that are relevant to
your use case is spot on as well.
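For example, a quick way to compare a single change (pool name, block
size, and thread count below are arbitrary examples; fio against rbd or
whatever your actual clients use is even more representative):

    # Establish a baseline with defaults, change exactly one option,
    # then rerun the same test.
    ceph osd pool create testbench 64 64
    rados bench -p testbench 60 write -b 4096 -t 16 --no-cleanup
    rados bench -p testbench 60 rand -t 16
    rados -p testbench cleanup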
Thanks,
Mark
On 11/28/19 10:46 AM, Paul Emmerich wrote:
Please don't run this config in production.
Disabling checksumming is a bad idea, disabling authentication is also
pretty bad.
There are also a few options in there that no longer exist (osd op
threads) or are no longer relevant (max open files). In general, you
should not blindly copy config files you find on the Internet; only
set an option to a non-default value after carefully checking what
it does and whether it applies to your use case.
Also, run benchmarks yourself. Use benchmarks that are relevant to
your use case.
Paul
_______________________________________________
Dev mailing list -- dev@xxxxxxx
To unsubscribe send an email to dev-leave@xxxxxxx