To enable automated performance testing with teuthology, we integrated a cbt task[1] into it. This task lets teuthology run benchmarks like radosbench and librbdfio, using the workloads and settings defined in the performance suite[2]. That suite runs as part of the rados suite, and the test results are stored in the teuthology archives in JSON format.

The final aim is to pass/fail tests based on performance results, but we have faced a few challenges in this process.

Determining reasonable baseline values for tests is difficult:
- Teuthology applies different combinations of configuration settings each time it runs these workloads, which creates a large sample space of configurations to track baselines for.
- Variability of the hardware on which the performance tests are run in the lab.
- Repeatability of tests under the same conditions.

Storing performance results:
- Currently, the test results are stored in the teuthology archives. We have figured out a way to keep these results longer than the usual ~2 weeks, but in the long run this may not be an ideal location.
- +1 to Greg's idea of some kind of database system to feed these results into and do better analysis.

We had a discussion at Cephalocon regarding the above, and based on the ideas that came up, we have attempted to solve some of these issues. Last week, we merged a minimal performance suite[3], which runs 4 basic jobs (a subset of the perf suite) outside of the rados suite. This suite now runs as part of the nightly teuthology runs on a specific set of machines (smithi) in the sepia lab, on the ceph master branch. Our aim here is to reduce the sample space of tests, and the variability around them, so that we can establish baselines for this smaller subset.

We already have a simple result analysis tool, which can be integrated with the cbt task to do the analysis and pass/fail tests based on configurable thresholds (see the sketch at the end of this mail). We are also planning to expand the cbt task to cover rgw workloads.

Something that would be very useful in the short term is a way to easily view/compare the data collected in these nightly runs.

[1] https://github.com/ceph/ceph/pull/17583
[2] https://github.com/ceph/ceph/tree/master/qa/suites/rados/perf
[3] https://github.com/ceph/ceph/tree/master/qa/suites/perf-basic

On Wed, Apr 4, 2018 at 12:55 AM, Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:
> Performance testing is an area that teuthology does not currently
> address. Neha is doing some work around integrating cbt (ceph
> benchmark tool, from Mark and other performance-interested people)
> into teuthology so we can run some performance jobs. But there’s a lot
> more work if we want to make long-term use of these to quantify our
> changes in performance, rather than micro-targeting specific patches
> in the lab. We’re concerned about random noise, machine variation, and
> reproducibility of results; and we have no way to identify trends. In
> short, we need some kind of database system to feed these results into
> and do analysis. This would be a whole new competency for the
> teuthology system and we’re not sure how best to go about it. But it’s
> becoming a major concern.
>
> PROBLEM TOPIC: how do we do performance testing and reliable analysis
> in the lab?
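
For the pass/fail piece mentioned above, here is a minimal sketch (plain Python) of the kind of threshold check the analysis tool could perform against a stored baseline. The metric names, file layout, and threshold values below are illustrative assumptions on my part, not the actual cbt/teuthology output schema; the real integration would read whatever result JSON the cbt task writes into the teuthology archive.

#!/usr/bin/env python
"""Sketch: compare one result file against a stored baseline and
exit non-zero if any metric regresses beyond a configurable threshold.
Metric names, file layout and thresholds are illustrative assumptions,
not the actual cbt output schema."""
import json
import sys

# Hypothetical per-metric thresholds: maximum allowed relative regression.
THRESHOLDS = {
    'bandwidth_mb_sec': 0.10,   # fail if >10% below baseline
    'latency_ms_avg': 0.15,     # fail if >15% above baseline
}

def load(path):
    with open(path) as f:
        return json.load(f)

def check(result, baseline):
    failures = []
    for metric, allowed in THRESHOLDS.items():
        if metric not in result or metric not in baseline:
            continue
        cur, base = float(result[metric]), float(baseline[metric])
        if metric.startswith('latency'):
            regression = (cur - base) / base    # higher latency is worse
        else:
            regression = (base - cur) / base    # lower throughput is worse
        if regression > allowed:
            failures.append('%s regressed %.1f%% (baseline %.2f, got %.2f)'
                            % (metric, regression * 100, base, cur))
    return failures

if __name__ == '__main__':
    result_path, baseline_path = sys.argv[1], sys.argv[2]
    problems = check(load(result_path), load(baseline_path))
    for msg in problems:
        print(msg)
    sys.exit(1 if problems else 0)

Something along these lines could be invoked by the cbt task after a job completes, with the thresholds coming from the job YAML and the job marked failed if the check exits non-zero.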