Re: Difference in bad_tests count in mainline vs 3.7 branch

Maintainers - can you please take stock of this and ensure sanity of your components before merging patches that do not fix a failing test?


Here is my proposal to get this fixed.


This weekend, 5th September 0400 UTC, I will start a Jenkins run on the master and 3.7 branches.
  • It will be rebased onto the latest code just before it runs, so all patches merged by 4th September will be tested.
  • It will run each test 10 times in succession. Why 10?
    • Hope to find tests that fail occasionally.
    • If a test fails only on the 1st run, it could very well be a cleanup issue left over from the previously run test.
    • Failures that follow a pattern within the 10 runs are again indicative of some cleanup/timeout error.
  • It will run all tests and not stop at the first failure.
  • I will modify the scripts to get maximum data from the logs. (The logs will still be at INFO level.)
After the run completes, for each .t test that fails in this run, I will file a bug against its component and immediately add the test to the bad tests list.
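
To make the "why 10" reasoning concrete, here is a rough sketch of how the results of repeated runs could be sorted into the cases above. It is not the script I will use on Jenkins; the function names, the category labels, and the "evenly spaced failures" check are all my own assumptions for illustration, and the command to run a test (for example `prove` on a .t file) is passed in by the caller rather than hard-coded.

```python
import subprocess
import sys
from typing import List, Sequence


def run_repeatedly(cmd: Sequence[str], runs: int = 10) -> List[bool]:
    """Run cmd `runs` times back to back; True means exit status 0."""
    results = []
    for _ in range(runs):
        completed = subprocess.run(
            cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
        results.append(completed.returncode == 0)
    return results


def classify(results: List[bool]) -> str:
    """Map a pass/fail sequence to one of the (hypothetical) labels below."""
    failures = [i for i, ok in enumerate(results) if not ok]
    if not failures:
        return "pass"
    if len(failures) == len(results):
        return "always-fails"
    if failures == [0]:
        # Only the 1st run failed: suspect leftovers from the previous test.
        return "first-run-only"
    # Crude pattern check (an assumption): three or more failures that are
    # evenly spaced hint at a cleanup/timeout problem.
    gaps = {b - a for a, b in zip(failures, failures[1:])}
    if len(failures) >= 3 and len(gaps) == 1:
        return "patterned"
    return "intermittent"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python repeat.py prove -vf <path to a .t file>
    print(classify(run_repeatedly(sys.argv[1:])))
```

A test that lands in "first-run-only" or "patterned" is a candidate for a cleanup or timeout problem; "intermittent" is the occasional failure that a single run would likely miss.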

What should the maintainers do after that?
  • If a bug is filed against your component, please spend some time on Monday and root cause the issue by Monday EOD.
  • If the root cause proves that the bug is in the .t file:
    • It would mostly be because:
      • The timeouts are not always long enough. Change the EXPECT_WITHIN values and check.
      • The test is not deterministic enough; some of the assumptions that the test makes might not always be true. For example, a SIGTERM followed by a TEST which assumes that the process is definitely killed is a wrong assumption. Use SIGKILL in such cases. (I know SIGKILL may not work either if the process is in D state, but it's a good enough example.)
    • It is easier to fix bugs in the .t file once the root cause is found. Please fix the issue and remove the test from the bad tests list. Use the bug filed against this .t file.
  • If the root cause proves that the bug is in Gluster code:
    • If the bug is in same component as the .t file:
      • In this case, you are the component owner; change the description and summary of the filed bug to indicate the actual issue.
      • If the time required to fix the issue in Gluster code is non-minimal:
        • Put a workaround in the .t file, with a comment clearly stating the number of the bug that will later fix it, and remove the test from the bad tests list.
        • If a workaround is not possible, let the test remain in the bad tests list.
    • If the bug is not in same component as the .t file:
      • Update the bug with details that prove the bug is not in the same component, and change the component accordingly.
      • It is the new owner's responsibility to provide a workaround for all .t files hit by the issue and to fix the code.
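
On the SIGTERM example mentioned above for non-deterministic tests: the point is that sending SIGTERM does not guarantee the process is gone by the time the next check runs. The sketch below shows the idea in Python rather than in our .t framework; `stop_process`, `demo`, and the grace period are names and values I made up for illustration. The child deliberately ignores SIGTERM, so the only reliable outcome comes from waiting with a timeout and then escalating to SIGKILL.

```python
import signal
import subprocess
import sys


def stop_process(proc: subprocess.Popen, grace: float = 5.0) -> int:
    """Send SIGTERM, wait up to `grace` seconds, then escalate to SIGKILL."""
    proc.send_signal(signal.SIGTERM)
    try:
        return proc.wait(timeout=grace)
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL on POSIX; cannot be caught or ignored
        return proc.wait()


def demo() -> int:
    """Start a child that ignores SIGTERM and stop it; returns its exit code."""
    child_code = (
        "import signal, time;"
        "signal.signal(signal.SIGTERM, signal.SIG_IGN);"
        "print('ready', flush=True);"
        "time.sleep(60)"
    )
    with subprocess.Popen(
        [sys.executable, "-c", child_code],
        stdout=subprocess.PIPE,
        text=True,
    ) as child:
        # Wait until the child has installed its handler; without this the
        # demo itself would be racy, which is exactly the kind of hidden
        # assumption to avoid in a test.
        child.stdout.readline()
        return stop_process(child, grace=0.5)


if __name__ == "__main__":
    # On POSIX a negative return code is the number of the killing signal.
    print(demo())
```

Waiting for the process to actually exit, with a timeout, is the same habit as polling for a condition in a test instead of assuming it already holds. As noted above, even SIGKILL will not help if the process is stuck in D state.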
Note to all maintainers:
  • I would request everyone to resist merging patches this weekend unless critically required. It would help us in debugging on Monday.

Let's hope that when we do a similar Jenkins run next weekend, September 12th, we don't find any failures.

Suggestions welcome for any changes in the above plan.

Thanks,
Raghavendra Talur
_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-devel
