To enable automated performance testing with teuthology, we integrated a cbt task[1] into it. This task lets teuthology run benchmarks like radosbench and librbdfio, using the workloads and settings defined in the performance suite[2]. That suite runs as part of the rados suite, and the test results are stored in the teuthology archives in JSON format.

The final aim is to pass/fail tests based on performance results, but we have faced a few challenges in this process.

Determining reasonable baseline values for tests is difficult:
- Teuthology applies different combinations of configuration settings each time it runs these workloads, which creates a large sample space of configurations to track baselines for.
- Variability of the hardware on which the performance tests are run in the lab.
- Repeatability of tests under the same conditions.

Storing performance results:
- Currently, the test results are stored in the teuthology archives. We have figured out a way to keep these results longer than the usual ~2 weeks, but in the long run this may not be an ideal location.
- +1 to Greg's idea of some kind of database system to feed these results into and do better analysis.

We had a discussion at Cephalocon regarding the above, and based on the ideas that came up, we have attempted to solve some of these issues. Last week, we merged a minimal performance suite[3], which runs 4 basic jobs (a subset of the perf suite) outside of the rados suite. This suite now runs as part of the nightly teuthology runs on a specific set of machines (smithi) in the sepia lab, on the ceph master branch. Our aim here is to reduce the sample space of tests, and the variability around them, so that we can establish baselines for this smaller subset.

We already have a simple result analysis tool, which can be integrated with the cbt task to do the analysis and pass/fail tests based on configurable thresholds (see the sketch at the end of this mail). We are also planning to expand the cbt task to cover rgw workloads.

Something that would be very useful in the short term is a way to easily view/compare the data collected in these nightly runs.

[1] https://github.com/ceph/ceph/pull/17583
[2] https://github.com/ceph/ceph/tree/master/qa/suites/rados/perf
[3] https://github.com/ceph/ceph/tree/master/qa/suites/perf-basic

On Wed, Apr 4, 2018 at 12:55 AM, Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:
> Performance testing is an area that teuthology does not currently
> address. Neha is doing some work around integrating cbt (ceph
> benchmark tool, from Mark and other performance-interested people)
> into teuthology so we can run some performance jobs. But there’s a lot
> more work if we want to make long-term use of these to quantify our
> changes in performance, rather than micro-targeting specific patches
> in the lab. We’re concerned about random noise, machine variation, and
> reproducibility of results; and we have no way to identify trends. In
> short, we need some kind of database system to feed these results into
> and do analysis. This would be a whole new competency for the
> teuthology system and we’re not sure how best to go about it. But it’s
> becoming a major concern.
>
> PROBLEM TOPIC: how do we do performance testing and reliable analysis
> in the lab?
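
For the pass/fail piece mentioned above, here is a minimal sketch (plain Python) of the kind of threshold check the analysis tool could perform against a stored baseline. The metric names, file layout, and threshold values below are illustrative assumptions on my part, not the actual cbt/teuthology output schema; the real integration would read whatever result JSON the cbt task writes into the teuthology archive.

#!/usr/bin/env python
"""Sketch: compare one result file against a stored baseline and
exit non-zero if any metric regresses beyond a configurable threshold.
Metric names, file layout and thresholds are illustrative assumptions,
not the actual cbt output schema."""
import json
import sys

# Hypothetical per-metric thresholds: maximum allowed relative regression.
THRESHOLDS = {
    'bandwidth_mb_sec': 0.10,   # fail if >10% below baseline
    'latency_ms_avg': 0.15,     # fail if >15% above baseline
}

def load(path):
    with open(path) as f:
        return json.load(f)

def check(result, baseline):
    failures = []
    for metric, allowed in THRESHOLDS.items():
        if metric not in result or metric not in baseline:
            continue
        cur, base = float(result[metric]), float(baseline[metric])
        if metric.startswith('latency'):
            regression = (cur - base) / base    # higher latency is worse
        else:
            regression = (base - cur) / base    # lower throughput is worse
        if regression > allowed:
            failures.append('%s regressed %.1f%% (baseline %.2f, got %.2f)'
                            % (metric, regression * 100, base, cur))
    return failures

if __name__ == '__main__':
    result_path, baseline_path = sys.argv[1], sys.argv[2]
    problems = check(load(result_path), load(baseline_path))
    for msg in problems:
        print(msg)
    sys.exit(1 if problems else 0)

Something along these lines could be invoked by the cbt task after a job completes, with the thresholds coming from the job YAML and the job marked failed if the check exits non-zero.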