Thanks for starting this thread. This would be very useful to have. I
had talked about this with a few people; here's what we had in mind:

- Instead of having a pass/fail against a baseline, track performance
  over time. We can have the suite run periodically, and trigger runs
  after a significant code change and at specific milestones. (A toy
  version of such a check is the last sketch below.)
- Store the performance values reported by the client; specifically,
  store the percentiles for a better understanding of how the
  performance changes (first sketch below).
- Store the performance values reported by Ceph itself (from the perf
  dump?); these might be less volatile than the ones reported by the
  client (third sketch below).
- Use a database that makes it easy to query those metrics. Jan had
  suggested InfluxDB, which is a time-series DB and would make such
  queries quite easy (second sketch below).
- Graph the performance across versions of Ceph (through commits) so
  we can find any regressions/improvements; with the tagging in the
  second sketch this becomes a single query.

Of course, for this to be relevant we'd need a setup and hardware that
don't change. Does that fit with what's suggested here?
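To make the client-side part concrete: the percentiles are already in
the JSON that fio emits (and that cbt leaves in the teuthology
archive), so pulling them out is cheap. A rough sketch in Python; the
exact key paths depend on the fio version (clat_ns vs. clat), so treat
them as illustrative:

import json

def fio_percentiles(path, op="read"):
    # Pull completion-latency percentiles (converted to ms) out of an
    # fio JSON result file, e.g. one from the teuthology archive.
    with open(path) as f:
        result = json.load(f)
    # fio 3.x reports completion latency in nanoseconds under clat_ns;
    # older versions use clat in microseconds.
    clat = result["jobs"][0][op]["clat_ns"]["percentile"]
    return {label: clat[key] / 1e6
            for label, key in [("p50", "50.000000"),
                               ("p95", "95.000000"),
                               ("p99", "99.000000")]}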
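Those values could then be pushed into InfluxDB, tagged by benchmark,
commit, and machine type. A minimal sketch using the influxdb Python
client; the measurement name and tag schema are just a strawman:

from influxdb import InfluxDBClient

def store_percentiles(client, benchmark, sha1, machine_type, percentiles):
    # One point per run; tags let us slice by benchmark, commit, and
    # machine type, while fields hold the percentile values in ms.
    client.write_points([{
        "measurement": "ceph_perf",
        "tags": {
            "benchmark": benchmark,        # e.g. "librbdfio"
            "sha1": sha1,                  # ceph commit under test
            "machine_type": machine_type,  # e.g. "smithi"
        },
        "fields": {k: float(v) for k, v in percentiles.items()},
    }])

client = InfluxDBClient(host="localhost", port=8086, database="ceph_perf")
store_percentiles(client, "librbdfio", "<sha1 under test>", "smithi",
                  {"p50": 1.2, "p95": 3.4, "p99": 7.8})

Graphing performance across commits then reduces to a query such as
SELECT mean("p99") FROM "ceph_perf" WHERE "benchmark" = 'librbdfio'
GROUP BY "sha1".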
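For the values reported by Ceph itself, something like this could
scrape a daemon's perf dump over the admin socket; which counters are
stable enough to be worth tracking is an open question, and op_latency
below is only an example:

import json
import subprocess

def perf_dump(daemon="osd.0"):
    # Ask a local daemon for its perf counters via the admin socket.
    out = subprocess.check_output(
        ["ceph", "daemon", daemon, "perf", "dump"])
    return json.loads(out)

counters = perf_dump("osd.0")
# e.g. average op latency as the OSD itself sees it; the counter is a
# running {"avgcount": N, "sum": seconds} pair in the dump.
op_latency = counters["osd"]["op_latency"]
avg_ms = 1000.0 * op_latency["sum"] / max(op_latency["avgcount"], 1)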
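And rather than a fixed pass/fail baseline, the analysis could flag a
run that is worse than a trailing window of previous runs by more than
a configurable threshold, along the lines of the thresholds Neha
mentions below. A toy version:

def is_regression(history, latest, threshold=0.10):
    # Compare the newest value (lower is better, e.g. p99 latency in
    # ms) against the mean of a trailing window of previous runs.
    if not history:
        return False  # nothing to compare against yet
    baseline = sum(history) / len(history)
    return latest > baseline * (1.0 + threshold)

assert not is_regression([3.1, 3.0, 3.2], 3.3)  # within 10% of the mean
assert is_regression([3.1, 3.0, 3.2], 3.6)      # ~16% above the mean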
Mohamad

On 04/04/2018 02:59 PM, Neha Ojha wrote:
> With the aim of doing automated performance testing using teuthology,
> we integrated a cbt task[1] into it. This task enables teuthology to
> run benchmarks like radosbench and librbdfio, by making use of the
> workloads and settings defined in the performance suite[2]. This suite
> runs as a part of the rados suite, and the test results are stored in
> the teuthology archives in JSON format.
>
> The final aim is to pass/fail tests based on performance results, but
> we have faced a few challenges in this process.
>
> Determining reasonable baseline values for tests is difficult:
>
> - Teuthology applies a different combination of configuration settings
> each time it runs these workloads, which creates a large sample space
> of configurations for us to track baselines for.
> - The hardware on which the performance tests are run in the lab
> varies.
> - Tests are hard to repeat under the same conditions.
>
> Storing performance results is also a challenge:
>
> - Currently, the test results are stored in the teuthology archives.
> We have figured out a way to store these results longer than usual (~2
> weeks), but in the long run this may not be an ideal location.
> - +1 to Greg's idea of some kind of database system to feed these
> results into and do better analysis.
>
> We had a discussion at Cephalocon regarding the above, and based on
> the ideas that came up, we have attempted to solve some of these
> issues.
>
> Last week, we merged a minimal performance suite[3], which runs 4
> basic jobs (a subset of the perf suite) outside of the rados suite.
> This suite now runs as a part of the nightly teuthology runs on a
> specific set of machines (smithi) in the sepia lab, on the ceph master
> branch.
> Our aim here is to reduce the sample space of tests, and the
> variability around these tests, so that we can come up with baselines
> for this smaller subset.
> We already have a simple result-analysis tool, which can be integrated
> with the cbt task to do the analysis and pass/fail tests based on
> configurable thresholds.
>
> We are also planning to expand the cbt task to cover rgw workloads.
>
> Something that would be very useful in the short term is a way to
> easily view/compare the data collected in these nightly runs.
>
> [1] https://github.com/ceph/ceph/pull/17583
> [2] https://github.com/ceph/ceph/tree/master/qa/suites/rados/perf
> [3] https://github.com/ceph/ceph/tree/master/qa/suites/perf-basic
>
> On Wed, Apr 4, 2018 at 12:55 AM, Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:
>> Performance testing is an area that teuthology does not currently
>> address. Neha is doing some work around integrating cbt (ceph
>> benchmark tool, from Mark and other performance-interested people)
>> into teuthology so we can run some performance jobs. But there’s a
>> lot more work to do if we want to make long-term use of these to
>> quantify our changes in performance, rather than micro-targeting
>> specific patches in the lab. We’re concerned about random noise,
>> machine variation, and reproducibility of results; and we have no way
>> to identify trends. In short, we need some kind of database system to
>> feed these results into and do analysis on. This would be a whole new
>> competency for the teuthology system, and we’re not sure how best to
>> go about it. But it’s becoming a major concern.
>>
>> PROBLEM TOPIC: how do we do performance testing and reliable analysis
>> in the lab?