POC: Distributed regression testing framework

Deepshikha Khandelwal <dkhandel@xxxxxxxxxx> · Mon, 25 Jun 2018 19:17:40 +0530

Hello folks,

For the last few months, I've been working on bringing distributed
regression testing to production. Our regression suite takes more than
4 hours to run on a single machine. To reduce this waiting time,
Facebook contributed a distributed test runner.

The solution supports the following:

1) Shares a worker pool across different testers.
2) Retries a failing test on 3 different machines before reporting it as a failure.
3) Supports running tests under ASAN, Valgrind, and ASAN without leak detection.
4) Stores the logs of failed tests on a centralized server[1].
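For illustration, the retry rule in item 2 might look roughly like the sketch below. This is not the code of the Facebook runner. The `run_test` callback, the worker names and the test path are hypothetical. The sketch also assumes a test counts as passed as soon as any single attempt succeeds, and that it is reported as failed only after it has failed on 3 distinct machines (or on every machine in the pool, if the pool is smaller).

```python
from typing import Callable, Iterable, List, Tuple

# Assumption: 3 attempts, each on a different machine, as described above.
MAX_ATTEMPTS = 3


def run_with_retries(
    test: str,
    workers: Iterable[str],
    run_test: Callable[[str, str], bool],  # hypothetical: True if the test passed on that machine
    max_attempts: int = MAX_ATTEMPTS,
) -> Tuple[bool, List[str]]:
    """Run `test` on up to `max_attempts` distinct workers.

    Returns (passed, machines_tried). The test is reported as failed
    only if every attempt, each on a different machine, fails.
    """
    tried: List[str] = []
    for machine in workers:
        if machine in tried:
            continue  # never retry on a machine where this test already failed
        tried.append(machine)
        if run_test(test, machine):
            return True, tried  # one passing run is enough
        if len(tried) >= max_attempts:
            break  # failed on max_attempts distinct machines
    return False, tried


if __name__ == "__main__":
    # Hypothetical pool and fake runner: the test only passes on "worker-3".
    pool = ["worker-1", "worker-2", "worker-3", "worker-4"]
    print(run_with_retries("tests/basic/example.t", pool,
                           lambda test, machine: machine == "worker-3"))
    # prints (True, ['worker-1', 'worker-2', 'worker-3'])
```

Retrying on different machines rather than the same one helps separate flaky or machine-specific (infra) failures from genuine regressions.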

distributed-regression[2] is the Jenkins job for distributed regression
testing.

There are currently a few known issues:
* We are not yet collecting the full logs (/var/log/glusterfs) from the servers.
* A few tests, such as the geo-rep tests, fail due to infrastructure-related issues.
* A run takes ~80 minutes with 7 distributed servers (we are targeting 60 minutes).
* We've only tested plain regressions. ASAN and Valgrind are currently untested.

Before bringing it into production, we'll run this job nightly and
watch it for a month to debug the other failures.

Please let us know if you find any issues.

[1] https://ci-logs.gluster.org
[2] https://build.gluster.org/job/distributed-regression

Regards,
Deepshikha Khandelwal
_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
http://lists.gluster.org/mailman/listinfo/gluster-devel