On Tue, Mar 13, 2018 at 7:38 AM, Alfredo Deza <adeza@xxxxxxxxxx> wrote:
> On Tue, Mar 13, 2018 at 10:03 AM, Sage Weil <sage@xxxxxxxxxxxx> wrote:
>> On Tue, 13 Mar 2018, Alfredo Deza wrote:
>>> The current "make check" job on pull requests is configured to require
>>> a "passing/OK" state to allow a merge.
>>>
>>> Looking back at the past 100 builds since March 13th, there is roughly a 20%
>>> failure rate [0]. This is a similar failure rate for ceph-volume PRs which never
>>> hit any make check paths: 6 failures out of the last 25 ceph-volume
>>> pull requests have make check failures).
>>>
>>> These failures in make check means that we must almost always ignore them, and
>>> use administrator privilege to merge. This is far from ideal, and further
>>> reduces the confidence in the tests.
>>>
>>> Some of the failures are produced by code that implies a grey area, enough to
>>> do a non-zero exit status:
>>>
>>> /home/jenkins-build/build/workspace/ceph-pull-requests/src/test/cli/osdmaptool/test-map-pgs.t:
>>> failed
>>> --- /home/jenkins-build/build/workspace/ceph-pull-requests/src/test/cli/osdmaptool/test-map-pgs.t
>>> +++ /home/jenkins-build/build/workspace/ceph-pull-requests/src/test/cli/osdmaptool/test-map-pgs.t.err
>>> @@ -40,6 +40,7 @@
>>> # it is almost impossible to get the same stats with random and crush
>>> # if they are, it most probably means something went wrong somewhere
>>> $ test "$STATS_CRUSH" != "$STATS_RANDOM"
>>> + [1]
>>> # Ran 13 tests, 0 skipped, 1 failed.
>>
>> If this is a nondeterministic test case then we should remove it!
>>
>> The harder case are the ones that are nondeterministic because of
>> environmental conditions. I think we don't understand the why well enough
>> to fix (or skip).
>
> This is kind of what I was looking for as well: the possibility of
> start pruning tests that aren't working well for us. Since there seems
> to be a strong interest in just keeping make check around as-is.
>
> I don't know enough of these tests, otherwise I would offer to start
> helping here. In the case of ceph-disk, I think in *master* they could
> be removed from make check entirely and rely on ad-hoc ceph-disk testing
> when targetted PRs show up. That would reduce a chunk of time that is
> spent on setting up the ceph-disk test environment.

I don't think anybody in the project knows much about those tests
anymore. I'd recommend just creating bugs for non-deterministic tests
when you run across them, and we can start working our way through them
as a group to make our tests more useful overall.

(Just anecdotally, I see failures from machine disconnects or whatever a
lot more often than issues in "make check". But I don't do enough with it
to run statistics.)

As for turning them off for ceph-disk... oh, I think I misunderstood what
you were proposing. Are you saying those are some of the noisy tests? And
that, as we move to ceph-volume, there's little point in testing ceph-disk
in master?
-Greg