Hello folks,

For the last few months, I've been working on bringing distributed regression testing to production. Our regression framework takes more than 4 hours to run on a single machine. To reduce the waiting time, Facebook contributed a distributed test runner.

The solution supports the following:

1) Shares a worker pool across different testers.
2) Retries a failing test up to 3 times, on 3 different machines, before calling it a failure.
3) Supports running with ASAN, Valgrind, and ASAN without leak checks.
4) Stores the logs of failed tests on a centralized server [1].

distributed-regression [2] is the Jenkins job for this distributed regression testing.

There are currently a few known issues:

* The complete logs (/var/log/glusterfs) are not yet collected from the servers.
* A few tests, such as the geo-rep tests, fail due to infra-related issues.
* A run takes ~80 minutes with 7 distributed servers (we are targeting 60 minutes).
* We've only tested plain regressions. ASAN and Valgrind are currently untested.

Before bringing it into production, we'll run this job nightly and watch it for a month to debug the other failures.

Please let us know if you find any issues.

[1] https://ci-logs.gluster.org
[2] https://build.gluster.org/job/distributed-regression

Regards,
Deepshikha Khandelwal