On Thu, Jun 18, 2015 at 1:17 AM, Niels de Vos <ndevos@xxxxxxxxxx> wrote:
> On Wed, Jun 17, 2015 at 04:26:13PM +0530, Raghavendra Talur wrote:
>> Hi,
>>
>> MSV Bhat and I presented some ideas about improving our testing
>> infrastructure at the Gluster Design Summit.
>>
>> Here is the link to the slides: http://redhat.slides.com/rtalur/distaf#
>>
>> Here are the same suggestions:
>>
>> 1. *A .t file for a bug*
>> When a community user discovers a bug in Gluster, they contact us over
>> IRC or email and eventually end up filing a bug in Bugzilla.
>> Many times it also happens that we find a bug that we don't know the
>> fix for, or one that is not in our module, and we too end up filing a
>> bug in Bugzilla.
>>
>> If we could instead write a .t test that reproduces the bug and add it
>> to, say, a /tests/bug/yet-to-be-fixed/ folder in the gluster repo, that
>> would be more helpful. As part of bug triage we could try doing the
>> same for bugs filed by community users.
>>
>> *What do we get?*
>>
>> a. It becomes very easy for a new developer to pick up that bug and
>> fix it. If the .t passes, the bug is fixed.
>>
>> b. The regression run on daily patch sets would skip this folder, but
>> on a nightly basis we could run the tests in this folder to see if any
>> of them got fixed while we were fixing other tests. Yay!
>
> This is surely a nice addition. When do you think something like this
> could be made available?
>
>
>> 2. *New gerrit/review work flow*
>>
>> Our Gerrit setup currently averages 2 hours for a regression run.
>> Due to the long queue of commits, the turnaround time is around 4-6
>> hours.
>>
>> Kaushal has proposed how to reduce the turnaround time further in this
>> thread: http://www.spinics.net/lists/gluster-devel/msg15798.html.
>
> I'll try to respond to that email later :)
>
>
>> 3. *Make sure tests can be run in docker and in parallel*
>>
>> To reduce the time for one test run from 2 hours, we can look at
>> running tests in parallel. I did a prototype and got the test time
>> down to 40 mins on a VM with 16 GB RAM and 4 cores.
>>
>> Current blocker:
>> Some of the tests fail in docker while they pass in a VM.
>> Note that it is the .t tests that fail; Gluster itself works fine in
>> docker. I need some help on this. More on this in a mail I will send
>> to gluster-devel later today.
>
> So, this parallelisation does not help us with the speed-up on NetBSD
> (no docker there). Because it does not help to get to a quicker
> end-result, I do not see a high priority for introducing docker. NetBSD
> regressions already skip a lot of tests, mostly those involving
> snapshots, and finish quicker than the Linux regressions.
>
> The VMs we use have 2GB of RAM. RAM is expensive in the cloud, so we
> would need to upgrade the VMs we have to be able to run multiple docker
> containers. A VM with 1GB of RAM results in many spurious failures; I
> don't know how much RAM we should give a VM for docker runs.
>
> I also do not think all developers run the regression tests on their
> systems; there are regular compile errors caught in the smoke and
> regression tests... There is also a tendency to rebase changes often,
> even in cases where there is no need. These rebases add to the job
> queue in Jenkins for little advantage. Updating a commit message to
> trigger a regression results in 2 smoke jobs, 2 regression jobs and a
> number of other (rpmbuild, bug-check, ...) jobs. Educating developers
> to test before posting and to only retrigger the needed jobs would help
> a lot too.
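As an illustration of suggestion 1 above, here is a minimal sketch of what a
reproducer test under /tests/bug/yet-to-be-fixed/ could look like, assuming
the include.rc/volume.rc helpers that existing .t files already source; the
bug ID, the volume layout and the operation that triggers the bug are
placeholders, not taken from a real report:

    #!/bin/bash
    # bug-XXXXXXX.t -- reproducer skeleton (bug ID is a placeholder)

    . $(dirname $0)/../../include.rc
    . $(dirname $0)/../../volume.rc

    cleanup;

    TEST glusterd
    TEST pidof glusterd

    # Create, start and mount a small 2-brick replica volume.
    TEST $CLI volume create $V0 replica 2 $H0:$B0/${V0}0 $H0:$B0/${V0}1
    TEST $CLI volume start $V0
    TEST glusterfs --volfile-id=$V0 --volfile-server=$H0 $M0

    # Stand-in for the sequence that reproduces the bug; it asserts the
    # *correct* behaviour, so the test keeps failing until the bug is fixed.
    TEST touch $M0/reproducer-file
    EXPECT "reproducer-file" ls $M0

    cleanup;

A nightly job could then simply run prove over that one directory and report
any test that unexpectedly starts passing.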
> My strong preference would be to split the gigantic regression test
> into smaller pieces. We have already started that by placing the .t
> files in their own component directories. It should be easy to set up
> Jenkins jobs for each directory (or group of dirs) and run multiple
> tests in parallel.
>
> Going different routes (docker vs VM) for different operating systems
> does not sound like a good plan to me. I prefer to have things as equal
> as possible. Additional docker tests would be cool, but I have my
> doubts about replacing the VM tests with them.
>
> Once we have achieved parallelism for the VM tests, we could look into
> having more VMs. VMs in the cloud cost money when they are running, and
> our Jenkins slaves are online 24x7. There is a Jenkins plugin that
> makes it possible to power a VM on/off when it is (not) needed. This
> could potentially save us a lot of money, and make it possible to use
> those savings for additional VMs (that are only running when needed).
>
>
>> *What do we get?*
>> Running 4 docker containers on our laptops alone can reduce the time
>> taken by a test run to 90 mins. Running them on powerful machines
>> brings it down to 40 mins, as seen in the prototype.
>
> If developers would run docker tests, sure, it would be a nice
> improvement over the very few developers that run regression tests for
> their changes.
>
>
>> 4. *Test definitions for every .t*
>>
>> Maybe the time has come to upgrade our test infra to have tests with
>> test definitions. Every .t file could have a corresponding .def file,
>> which is a JSON/YAML/XML config that defines the requirements of the
>> test:
>>   - type of volume
>>   - whether special knowledge of brick size is required
>>   - which repo source folders should trigger this test
>>   - running time
>>   - test RUN level
>>
>> *What do we get?*
>> a. Run a partial set of tests on a commit, based on the git log and
>> the test definitions, and run the complete regression nightly.
>> b. Order the test run based on run times. Combined with the
>> fail-on-first-test setting we have, we will fail as early as possible.
>> c. Order tests based on functionality level, which means a basic
>> mount.t test should run before a complex DHT test that makes use of a
>> FUSE mount. Again, this will help us fail as early as possible in
>> failure scenarios.
>> d. With knowledge of the type of volume and the number of bricks
>> required, we can re-use volumes that were already created for
>> subsequent tests. Even the cleanup() function we have takes time.
>> DiSTAF already has a function equivalent to
>> use_existing_else_create_new.
>
> I'm not sure how well this would work with the parallel testing. But
> yes, it seems like a good suggestion, even if it forces developers to
> think about creating the needed volumes for their tests. There should
> be little need for a complex volume if it is only used for simple mount
> testing or such.
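To make suggestion 4 a little more concrete, here is a rough sketch of what
such a companion .def file could look like if YAML were the chosen format;
the file name, field names and values are invented for illustration only,
not an agreed-upon schema:

    # tests/basic/mount.def -- hypothetical companion to tests/basic/mount.t
    volume:
      type: replicate           # type of volume the test needs
      bricks: 2
      special-brick-size: no    # no special brick-size requirements
    triggers:                   # source folders that should trigger this test
      - xlators/mount/
      - libglusterfs/
    running-time: 300           # expected run time in seconds, for ordering
    run-level: basic            # basic tests run before complex ones

A small wrapper in the regression job could then read these files, pick only
the tests whose trigger folders intersect with the files touched by a commit,
and sort the remaining tests by run level and running time.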
>> 5. *Testing GFAPI*
>> We don't have a good test framework for gfapi as of today.
>>
>> However, with the recent design proposal at
>> https://docs.google.com/document/d/1yuRLRbdccx_0V0UDAxqWbz4g983q5inuINHgM1YO040/edit?usp=sharing
>
> Yes, this seems like a helpful testing tool. There is still the need
> for writing small .c files that test certain functions in libgfapi.
> Unfortunately it is not trivial to include the compilation of these
> tests while running the regression cases. I think we should provide an
> easy-to-use (build) framework and an example to get those done
> correctly. Building a test .c file against the libgfapi version under
> test, with all the correct (pkg-config) flags and paths, isn't
> straightforward.
>
>>
>> and
>>
>> Craig Cabrey from Facebook developing a set of coreutils using
>> GFAPI as mentioned here
>> http://www.spinics.net/lists/gluster-devel/msg15753.html
>
> These won't be targeting the testing of libgfapi; rather, they should
> provide easy access to Gluster volumes for users and maybe some
> applications. I think we should see Craig's tools just like the Qemu,
> Samba and NFS-Ganesha use-cases, which should get included in automated
> testing in the future too.
>
> Thanks,
> Niels
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel@xxxxxxxxxxx
> http://www.gluster.org/mailman/listinfo/gluster-devel
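As a reference point for the build problem Niels mentions, here is a minimal
libgfapi smoke test and the pkg-config based compile line that works against
an installed libgfapi; the volume name, host and file name are placeholders,
and building against the just-built tree under test would still need the
extra include/library paths he alludes to:

    /* gfapi-smoke.c -- minimal libgfapi check (names below are placeholders) */
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <glusterfs/api/glfs.h>

    int
    main (void)
    {
            glfs_t    *fs = NULL;
            glfs_fd_t *fd = NULL;
            const char msg[] = "hello gfapi\n";
            int        ret = -1;

            fs = glfs_new ("testvol");          /* placeholder volume name */
            if (!fs)
                    return EXIT_FAILURE;

            glfs_set_volfile_server (fs, "tcp", "localhost", 24007);
            glfs_set_logging (fs, "/dev/stderr", 7);

            ret = glfs_init (fs);
            if (ret)
                    goto out;

            /* create a file on the volume and write a few bytes via gfapi */
            fd = glfs_creat (fs, "gfapi-smoke.txt", O_RDWR, 0644);
            if (!fd) {
                    ret = -1;
                    goto out;
            }
            ret = (glfs_write (fd, msg, sizeof (msg) - 1, 0) < 0) ? -1 : 0;
            glfs_close (fd);
    out:
            glfs_fini (fs);
            return ret ? EXIT_FAILURE : EXIT_SUCCESS;
    }

Compiled against an installed libgfapi this is a one-liner:

    gcc gfapi-smoke.c -o gfapi-smoke $(pkg-config --cflags --libs glusterfs-api)

The hard part, as noted above, is doing the equivalent against the
not-yet-installed libgfapi from within a regression run.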