Re: POC- Distributed regression testing framework

On Mon, Jun 25, 2018 at 7:17 PM, Deepshikha Khandelwal <dkhandel@xxxxxxxxxx> wrote:
Hello folks,

For the last few months, I've been working on bringing distributed
regression testing to production. Our regression framework takes more
than 4 hours to run on a single machine. To reduce the waiting time,
Facebook contributed a distributed test runner.

The solution supports the following:

1) Shares a worker pool across different testers.
2) Retries a failed test up to 3 times, on 3 different machines, before calling it a failure.
3) Supports running ASAN, Valgrind, and ASAN without leaks.
4) Stores the failed test logs on a centralized server[1].
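
The retry rule in item 2 can be illustrated with a minimal sketch. This is not the runner's actual code; `run_with_retries`, `run_on`, and the worker names are hypothetical, and the sketch only assumes that a test counts as failed after it has failed on 3 distinct machines.

```python
from typing import Callable, Sequence


def run_with_retries(test: str,
                     workers: Sequence[str],
                     run_on: Callable[[str, str], bool],
                     max_attempts: int = 3) -> bool:
    """Return True if `test` passes on any of up to `max_attempts` distinct workers.

    `run_on(worker, test)` is a hypothetical callback that runs one test on
    one machine and returns True on success.
    """
    # Drop duplicate worker names while keeping their order, so that every
    # attempt lands on a different machine.
    distinct_workers = list(dict.fromkeys(workers))
    for worker in distinct_workers[:max_attempts]:
        if run_on(worker, test):
            return True
    # Failed on every machine we were allowed to try: report a real failure.
    return False
```

A test that fails on one machine but passes on the next is treated as passing, which hides machine-specific flakiness from the final result but still costs the extra run time.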

The Jenkins job for this distributed regression testing is
distributed-regression[2].

There are currently a few known issues:
* The entire logs (/var/log/glusterfs) are not collected from the servers.

If I look at the activities involved with regression failures, this can wait.
 
* A few tests, such as the geo-rep tests, fail due to infra-related issues.

Please open bugs for these so we can track them and take them to closure.
 
* Takes ~80 minutes with 7 distributed servers (targeting 60 minutes)

The time can change as more tests are added, so please also plan for the number of servers to be configurable from 1 to n.
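
One way to picture "1 to n" servers is a split that works for any server count. The sketch below is only an illustration under assumptions: `partition_tests` is a hypothetical helper, the test names in the usage note are made up, and a static round-robin split is used for simplicity. The actual runner may hand out tests from a shared queue instead.

```python
from typing import List, Sequence


def partition_tests(tests: Sequence[str], n_servers: int) -> List[List[str]]:
    """Split `tests` round-robin into `n_servers` buckets (n_servers >= 1)."""
    if n_servers < 1:
        raise ValueError("need at least one server")
    # Bucket i receives tests i, i + n, i + 2n, ... so bucket sizes differ
    # by at most one test.
    return [list(tests[i::n_servers]) for i in range(n_servers)]
```

With `n_servers=1` this degenerates to the current single-machine run, for example `partition_tests(["a.t", "b.t", "c.t"], 1)` gives one bucket holding all three tests.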
 
* We've only tested plain regressions. ASAN and Valgrind are currently untested.

It would be great to have these running, not 'per patch', but nightly or weekly to start with.

Before bringing it into production, we'll run this job nightly and
watch it for a month to debug the other failures.


I would say bring it to production sooner, say in 2 weeks, and also plan to keep the current regression as is, triggered by a special command like 'run regression in-one-machine' in gerrit (or something similar) with voting rights, so we can fall back to that method if something is broken in parallel testing.

I have seen that regardless of how much time we spend testing scripts, something breaks the day we move to production. So let that happen sooner rather than later, which would help with branching out the next release. I don't want to be stuck on branching due to infra failures.

Regards,
Amar

 
Please let us know if you find any issues.

[1] https://ci-logs.gluster.org
[2] https://build.gluster.org/job/distributed-regression

Regards,
Deepshikha Khandelwal
_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
http://lists.gluster.org/mailman/listinfo/gluster-devel





--
Amar Tumballi (amarts)