On Tue, Mar 13, 2018 at 4:38 PM, Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:
> On Tue, Mar 13, 2018 at 7:38 AM, Alfredo Deza <adeza@xxxxxxxxxx> wrote:
>> On Tue, Mar 13, 2018 at 10:03 AM, Sage Weil <sage@xxxxxxxxxxxx> wrote:
>>> On Tue, 13 Mar 2018, Alfredo Deza wrote:
>>>> The current "make check" job on pull requests is configured to require
>>>> a "passing/OK" state to allow a merge.
>>>>
>>>> Looking back at the past 100 builds since March 13th, there is roughly
>>>> a 20% failure rate [0]. This is a similar failure rate for ceph-volume
>>>> PRs which never hit any make check paths: 6 of the last 25 ceph-volume
>>>> pull requests have make check failures.
>>>>
>>>> These failures in make check mean that we must almost always ignore
>>>> them, and use administrator privilege to merge. This is far from ideal,
>>>> and further reduces the confidence in the tests.
>>>>
>>>> Some of the failures are produced by code that implies a grey area,
>>>> enough to do a non-zero exit status:
>>>>
>>>> /home/jenkins-build/build/workspace/ceph-pull-requests/src/test/cli/osdmaptool/test-map-pgs.t: failed
>>>> --- /home/jenkins-build/build/workspace/ceph-pull-requests/src/test/cli/osdmaptool/test-map-pgs.t
>>>> +++ /home/jenkins-build/build/workspace/ceph-pull-requests/src/test/cli/osdmaptool/test-map-pgs.t.err
>>>> @@ -40,6 +40,7 @@
>>>> # it is almost impossible to get the same stats with random and crush
>>>> # if they are, it most probably means something went wrong somewhere
>>>> $ test "$STATS_CRUSH" != "$STATS_RANDOM"
>>>> + [1]
>>>> # Ran 13 tests, 0 skipped, 1 failed.
>>>
>>> If this is a nondeterministic test case then we should remove it!
>>>
>>> The harder cases are the ones that are nondeterministic because of
>>> environmental conditions. I think we don't understand the why well enough
>>> to fix (or skip).
>>
>> This is kind of what I was looking for as well: the possibility of
>> starting to prune tests that aren't working well for us.
>> Since there seems to be a strong interest in just keeping make check
>> around as-is.
>>
>> I don't know enough of these tests, otherwise I would offer to start
>> helping here. In the case of ceph-disk, I think in *master* they could
>> be removed from make check entirely, and we could rely on ad-hoc
>> ceph-disk testing when targeted PRs show up. That would reduce a chunk
>> of time that is spent on setting up the ceph-disk test environment.
>
> I don't think anybody in the project any more knows about those tests
> much

Then what is the value if no one knows about them? What is the purpose
of a test if it fails and doesn't tell us why?

>. I'd recommend just creating bugs for non-deterministic tests
> when you run across them and we can start working our way through them
> as a group to make our tests more useful overall.

That is kind of my issue here: I don't know. Some of them look
non-deterministic enough to raise the issue.

> (Just anecdotally, I see failures from machine disconnects or whatever
> a lot more often than issues in "make check". But I don't do enough
> with it to run statistics.)

Sure, like I said, environmental issues are fine; we should retrigger
at will and re-run them.

>
> As for turning them off for ceph-disk...oh, I think I misunderstood
> what you were proposing. Are you saying those are some of the noisy
> tests? And as we move to ceph-volume there's little point testing
> ceph-disk in master?

ceph-disk tests are tied into make check, and in master I don't see a
need for that, as those can be run ad-hoc as needed, not every time on
every pull request.

I guess the ceph-disk thing is more of a corollary to the more generic
comment on why I think the check is not robust enough and keeps failing
even when the code under review does not affect it.

> -Greg
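
As a rough illustration of how a suspect test could be triaged before
filing a bug, here is a minimal sketch that re-runs one command several
times with no code changes in between and labels it by how often it
fails. Nothing here exists in the Ceph tree: the function names
failure_rate and classify, and the default of 20 runs, are hypothetical
and chosen only for this example.

```python
import subprocess
import sys


def failure_rate(cmd, runs=20):
    """Run cmd `runs` times and return the fraction of non-zero exits."""
    failures = 0
    for _ in range(runs):
        # Output is discarded; only the exit status matters here.
        result = subprocess.run(
            cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
        if result.returncode != 0:
            failures += 1
    return failures / runs


def classify(rate):
    """Label a test by how often it failed across identical runs."""
    if rate == 0.0:
        return "stable"
    if rate == 1.0:
        return "consistently failing"
    # Mixed results on unchanged code: a candidate for a bug report.
    return "nondeterministic"


# Usage sketch: a stand-in command that always exits 0.
always_ok = [sys.executable, "-c", "raise SystemExit(0)"]
print(classify(failure_rate(always_ok, runs=3)))
```

A mixed result here does not distinguish a truly random test from an
environmental problem on the build machine, so it is only a first
filter for deciding what is worth a bug report.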