On 06/15/2015 04:19 PM, Kaushal M wrote:
> Hi all,
>
> The recent rush of reviews being sent due to the release of 3.7 was
> a cause of frustration for many of us because of the regression
> tests (gerrit troubles themselves are another thing).
>
> W.R.T. regression, the 3 main sources of frustration were,
> 1. Spurious test failures
> 2. Long wait times
> 3. Regression slave troubles
>
> We've already tackled the spurious failure issue and are quite
> stable now. The trouble with the slave vms is related to the gerrit
> issues, and is mainly due to the network issues we are having
> between the data-centers hosting the slaves and gerrit/jenkins.
> People have been looking into this, but we haven't had much
> success. This leaves the issue of the long wait times.
>
> The long wait times are because of the long queues of pending
> jobs, some of which take days to get scheduled. Two things cause
> the long queues,
> 1. Automatic regression job triggering for all submissions to gerrit
> 2. Long run time for regression (~2h)
>
> The long queues, coupled with the spurious failures and network
> problems, meant that jobs would fail for no reason after a long
> wait, and would have to be added to the back of the queue to be
> re-run. This meant that developers would have to wait days for
> their changes to get merged, and was one of the causes for the
> delay in the release of 3.7.
>
> The solution is to reduce wait times for regression runs. To reduce
> wait times we should,
> 1. Trigger runs only when required
> 2. Reduce regression run time.
>
> Raghavendra Talur (rtalur/RaSTar) will soon send out a mail with
> his findings on the regression run times, and we can continue
> discussion on it on that thread.
>
> Earlier, the regression runs used to be manually triggered by the
> maintainers once they had determined that a change was ready for
> submission. But as there were only two maintainers before (Vijay
> and Avati) auto triggering was brought in to reduce their load.
> Auto triggering worked fine when we had a lower volume of changes
> being submitted to gerrit. But now, with the large volumes we see
> during the release freeze dates, auto triggering just adds to
> problems.
>
> I propose that we move back to the old model of starting
> regression runs only once the maintainers are ready to merge. But
> instead of the maintainers manually triggering the runs, we could
> automate it.
>
> We can model our new workflow on those of OpenStack[1] and
> Wikimedia[2]. The existing Gerrit plugin for Jenkins doesn't
> provide the features necessary to enable selective triggering based
> on Gerrit flags. Both OpenStack and Wikimedia use a project gating
> tool called Zuul[3], which provides a much better integration with
> Jenkins and Gerrit and more features on top.
>
> I propose the following work flow,
>
> - Developer pushes change to Gerrit.
> - Zuul is notified by Gerrit of the new change.
> - Zuul runs pre-review checks on Jenkins. These will be the current
>   smoke tests.
> - Zuul reports back the status of the checks to Gerrit.
> - If the checks fail, the developer will need to resend the change
>   after the required fixes. The process starts once more.
> - If the checks pass, the change is now ready for review.
> - The change is now reviewed by other developers and maintainers.
>   Non-maintainers will be able to give only a +1 review.
> - On a negative review, the developer will need to rework the change
>   and resend it. The process starts once more.
> - The maintainer gives a +2 review once he/she is satisfied. The
>   maintainer's work is done here.
> - Zuul is notified of the +2 review.
> - Zuul runs the regression runs and reports back the status.
> - If the regression runs fail, the process starts over again.
> - If the runs pass, the change is ready for acceptance.
> - Zuul will pick the change into the repository.
> - If the pick fails, Zuul will report back the failure, and the
>   process starts once again.
>
+1, Good approach.
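
To check that I am reading the proposed flow correctly, here is a rough sketch of it as a small state machine in Python. This is only a model of the steps listed above, not Zuul configuration or anything Zuul itself provides; all the state and event names are ones I made up for illustration.

```python
from enum import Enum, auto


class State(Enum):
    # Hypothetical state names, one per stage of the proposed workflow.
    PUSHED = auto()              # change uploaded, smoke tests pending
    READY_FOR_REVIEW = auto()    # smoke tests passed
    REGRESSION_RUNNING = auto()  # maintainer gave +2, regression triggered
    READY_TO_MERGE = auto()      # regression passed
    MERGED = auto()              # change picked into the repository
    NEEDS_REWORK = auto()        # developer must fix and resend


class Event(Enum):
    # Hypothetical event names for what Gerrit/Jenkins would report.
    SMOKE_OK = auto()
    SMOKE_FAIL = auto()
    REVIEW_PLUS_ONE = auto()
    REVIEW_NEGATIVE = auto()
    REVIEW_PLUS_TWO = auto()
    REGRESSION_OK = auto()
    REGRESSION_FAIL = auto()
    PICK_OK = auto()
    PICK_FAIL = auto()
    RESUBMIT = auto()


# Allowed transitions, taken from the bullet list in the proposal.
TRANSITIONS = {
    (State.PUSHED, Event.SMOKE_OK): State.READY_FOR_REVIEW,
    (State.PUSHED, Event.SMOKE_FAIL): State.NEEDS_REWORK,
    (State.READY_FOR_REVIEW, Event.REVIEW_PLUS_ONE): State.READY_FOR_REVIEW,
    (State.READY_FOR_REVIEW, Event.REVIEW_NEGATIVE): State.NEEDS_REWORK,
    (State.READY_FOR_REVIEW, Event.REVIEW_PLUS_TWO): State.REGRESSION_RUNNING,
    (State.REGRESSION_RUNNING, Event.REGRESSION_OK): State.READY_TO_MERGE,
    (State.REGRESSION_RUNNING, Event.REGRESSION_FAIL): State.NEEDS_REWORK,
    (State.READY_TO_MERGE, Event.PICK_OK): State.MERGED,
    (State.READY_TO_MERGE, Event.PICK_FAIL): State.NEEDS_REWORK,
    (State.NEEDS_REWORK, Event.RESUBMIT): State.PUSHED,
}


def advance(state, event):
    """Return the next state, or raise ValueError for an invalid event."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(
            f"{event.name} is not valid in state {state.name}"
        ) from None


def replay(events, start=State.PUSHED):
    """Apply events in order; return (final state, regression runs used)."""
    state = start
    regression_runs = 0
    for event in events:
        state = advance(state, event)
        if state is State.REGRESSION_RUNNING:
            regression_runs += 1
    return state, regression_runs
```

The point the model makes is the one from the proposal: a change that fails smoke tests or gets a negative review goes back to the developer without ever consuming a regression run, since the only way into the regression state is a +2 from a maintainer.
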
> Following this flow should,
> 1. Reduce regression wait time
> 2. Improve change acceptance time
> 3. Reduce unnecessary wastage of infra resources
> 4. Improve infra stability.
>
> It also brings in the drawback that we need to maintain one other
> piece of infra (Zuul). This would be an additional maintenance
> overhead on top of Gerrit, Jenkins and the current slaves. But I
> feel the reduction in the upkeep efforts of the slaves would be
> enough to offset this.
>
> tl;dr
> Current auto-triggering of regression runs is stupid and a
> waste of time and resources. Bring in a project gating system,
> Zuul, which can do much more intelligent job triggering, and use
> it to automatically trigger regression only for changes with
> Reviewed+2 and automatically merge ones that pass.
>
> What does the community think of this?
>
> ~kaushal
>
> [1]: http://docs.openstack.org/infra/manual/developers.html#automated-testing
> [2]: https://www.mediawiki.org/wiki/Continuous_integration/Workflow
> [3]: http://docs.openstack.org/infra/zuul/
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel@xxxxxxxxxxx
> http://www.gluster.org/mailman/listinfo/gluster-devel