Hi Victoria, [sorry for the long delay, I was being pulled into too many directions, but am incredibly grateful for your fantastic investigation.] On Thu, 24 Mar 2022, Victoria Dye wrote: > Johannes Schindelin via GitGitGadget wrote: > > Changes since v1: > > > > * In the patch that removed MAKE_TARGETS, a stale comment about that > > variable is also removed. > > * The comment about set -x has been adjusted because it no longer applies > > as-is. > > * The commit message of "ci: make it easier to find failed tests' logs in > > the GitHub workflow" has been adjusted to motivate the improvement > > better. > > > > > > Background > > ========== > > > > Recent patches intended to help readers figure out CI failures much quicker > > than before. Unfortunately, they haven't been entirely positive for me. For > > example, they broke the branch protections in Microsoft's fork of Git, where > > we require Pull Requests to pass a certain set of Checks (which are > > identified by their names) and therefore caused follow-up work. > > > > Using CI and in general making it easier for new contributors is an area I'm > > passionate about, and one I'd like to see improved. > > > > > > The current situation > > ===================== > > > > Let me walk you through the current experience when a PR build fails: I get > > a notification mail that only says that a certain job failed. There's no > > indication of which test failed (or was it the build?). I can click on a > > link at it takes me to the workflow run. Once there, all it says is "Process > > completed with exit code 1", or even "code 2". Sure, I can click on one of > > the failed jobs. It even expands the failed step's log (collapsing the other > > steps). And what do I see there? > > > > Let's look at an example of a failed linux-clang (ubuntu-latest) job > > [https://github.com/git-for-windows/git/runs/4822802185?check_suite_focus=true]: > > > > [...] > > Test Summary Report > > ------------------- > > t1092-sparse-checkout-compatibility.sh (Wstat: 256 Tests: 53 Failed: 1) > > Failed test: 49 > > Non-zero exit status: 1 > > t3701-add-interactive.sh (Wstat: 0 Tests: 71 Failed: 0) > > TODO passed: 45, 47 > > Files=957, Tests=25489, 645 wallclock secs ( 5.74 usr 1.56 sys + 866.28 cusr 364.34 csys = 1237.92 CPU) > > Result: FAIL > > make[1]: *** [Makefile:53: prove] Error 1 > > make[1]: Leaving directory '/home/runner/work/git/git/t' > > make: *** [Makefile:3018: test] Error 2 > > > > > > That's it. I count myself lucky not to be a new contributor being faced with > > something like this. > > > > Now, since I am active in the Git project for a couple of days or so, I can > > make sense of the "TODO passed" label and know that for the purpose of > > fixing the build failures, I need to ignore this, and that I need to focus > > on the "Failed test" part instead. > > > > I also know that I do not have to get myself an ubuntu-latest box just to > > reproduce the error, I do not even have to check out the code and run it > > just to learn what that "49" means. > > > > I know, and I do not expect any new contributor, not even most seasoned > > contributors to know, that I have to patiently collapse the "Run > > ci/run-build-and-tests.sh" job's log, and instead expand the "Run > > ci/print-test-failures.sh" job log (which did not fail and hence does not > > draw any attention to it). > > > > You can count me as one of the ones that, until this series, had no idea > this information existed in the CI logs. And it definitely would have been > helpful in figuring out some of the OS-specific bugs I've run into in the > past. :) Excellent. I think I have enough accounts to consider it a universal truth that our CI output _needs_ something like this here patch series. > > I know, and again: I do not expect many others to know this, that I then > > have to click into the "Search logs" box (not the regular web browser's > > search via Ctrl+F!) and type in "not ok" to find the log of the failed test > > case (and this might still be a "known broken" one that is marked via > > test_expect_failure and once again needs to be ignored). > > > > To be excessively clear: This is not a great experience! > > > > > > Improved output > > =============== > > > > Our previous Azure Pipelines-based CI builds had a much nicer UI, one that > > even showed flaky tests, and trends e.g. how long the test cases ran. When I > > ported Git's CI over to GitHub workflows (to make CI more accessible to new > > contributors), I knew fully well that we would leave this very nice UI > > behind, and I had hoped that we would get something similar back via new, > > community-contributed GitHub Actions that can be used in GitHub workflows. > > However, most likely because we use a home-grown test framework implemented > > in opinionated POSIX shells scripts, that did not happen. > > > > So I had a look at what standards exist e.g. when testing PowerShell > > modules, in the way of marking up their test output in GitHub workflows, and > > I was not disappointed: GitHub workflows support "grouping" of output lines, > > i.e. marking sections of the output as a group that is then collapsed by > > default and can be expanded. And it is this feature I decided to use in this > > patch series, along with GitHub workflows' commands to display errors or > > notices that are also shown on the summary page of the workflow run. Now, in > > addition to "Process completed with exit code" on the summary page, we also > > read something like: > > > > ⊗ linux-gcc (ubuntu-latest) > > failed: t9800.20 submit from detached head > > > > > > Even better, this message is a link, and following that, the reader is > > presented with something like this > > [https://github.com/dscho/git/runs/4840190622?check_suite_focus=true]: > > > > ⏵ Run ci/run-build-and-tests.sh > > ⏵ CI setup > > + ln -s /home/runner/none/.prove t/.prove > > + run_tests=t > > + export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main > > + group Build make > > + set +x > > ⏵ Build > > ⏵ Run tests > > === Failed test: t9800-git-p4-basic === > > ⏵ ok: t9800.1 start p4d > > ⏵ ok: t9800.2 add p4 files > > ⏵ ok: t9800.3 basic git p4 clone > > ⏵ ok: t9800.4 depot typo error > > ⏵ ok: t9800.5 git p4 clone @all > > ⏵ ok: t9800.6 git p4 sync uninitialized repo > > ⏵ ok: t9800.7 git p4 sync new branch > > ⏵ ok: t9800.8 clone two dirs > > ⏵ ok: t9800.9 clone two dirs, @all > > ⏵ ok: t9800.10 clone two dirs, @all, conflicting files > > ⏵ ok: t9800.11 clone two dirs, each edited by submit, single git commit > > ⏵ ok: t9800.12 clone using non-numeric revision ranges > > ⏵ ok: t9800.13 clone with date range, excluding some changes > > ⏵ ok: t9800.14 exit when p4 fails to produce marshaled output > > ⏵ ok: t9800.15 exit gracefully for p4 server errors > > ⏵ ok: t9800.16 clone --bare should make a bare repository > > ⏵ ok: t9800.17 initial import time from top change time > > ⏵ ok: t9800.18 unresolvable host in P4PORT should display error > > ⏵ ok: t9800.19 run hook p4-pre-submit before submit > > Error: failed: t9800.20 submit from detached head > > ⏵ failure: t9800.20 submit from detached head > > Error: failed: t9800.21 submit from worktree > > ⏵ failure: t9800.21 submit from worktree > > === Failed test: t9801-git-p4-branch === > > [...] > > > > > > The "Failed test:" lines are colored in yellow to give a better visual clue > > about the logs' structure, the "Error:" label is colored in red to draw the > > attention to the important part of the log, and the "⏵" characters indicate > > that part of the log is collapsed and can be expanded by clicking on it. > > > > To drill down, the reader merely needs to expand the (failed) test case's > > log by clicking on it, and then study the log. If needed (e.g. when the test > > case relies on side effects from previous test cases), the logs of preceding > > test cases can be expanded as well. In this example, when expanding > > t9800.20, it looks like this (for ease of reading, I cut a few chunks of > > lines, indicated by "[...]"): > > > > [...] > > ⏵ ok: t9800.19 run hook p4-pre-submit before submit > > Error: failed: t9800.20 submit from detached head > > ⏷ failure: t9800.20 submit from detached head > > test_when_finished cleanup_git && > > git p4 clone --dest="$git" //depot && > > ( > > cd "$git" && > > git checkout p4/master && > > >detached_head_test && > > git add detached_head_test && > > git commit -m "add detached_head" && > > git config git-p4.skipSubmitEdit true && > > git p4 submit && > > git p4 rebase && > > git log p4/master | grep detached_head > > ) > > [...] > > Depot paths: //depot/ > > Import destination: refs/remotes/p4/master > > > > Importing revision 9 (100%)Perforce db files in '.' will be created if missing... > > Perforce db files in '.' will be created if missing... > > > > Traceback (most recent call last): > > File "/home/runner/work/git/git/git-p4", line 4455, in <module> > > main() > > File "/home/runner/work/git/git/git-p4", line 4449, in main > > if not cmd.run(args): > > File "/home/runner/work/git/git/git-p4", line 2590, in run > > rebase.rebase() > > File "/home/runner/work/git/git/git-p4", line 4121, in rebase > > if len(read_pipe("git diff-index HEAD --")) > 0: > > File "/home/runner/work/git/git/git-p4", line 297, in read_pipe > > retcode, out, err = read_pipe_full(c, *k, **kw) > > File "/home/runner/work/git/git/git-p4", line 284, in read_pipe_full > > p = subprocess.Popen( > > File "/usr/lib/python3.8/subprocess.py", line 858, in __init__ > > self._execute_child(args, executable, preexec_fn, close_fds, > > File "/usr/lib/python3.8/subprocess.py", line 1704, in _execute_child > > raise child_exception_type(errno_num, err_msg, err_filename) > > FileNotFoundError: [Errno 2] No such file or directory: 'git diff-index HEAD --' > > error: last command exited with $?=1 > > + cleanup_git > > + retry_until_success rm -r /home/runner/work/git/git/t/trash directory.t9800-git-p4-basic/git > > + nr_tries_left=60 > > + rm -r /home/runner/work/git/git/t/trash directory.t9800-git-p4-basic/git > > + test_path_is_missing /home/runner/work/git/git/t/trash directory.t9800-git-p4-basic/git > > + test 1 -ne 1 > > + test -e /home/runner/work/git/git/t/trash directory.t9800-git-p4-basic/git > > + retry_until_success mkdir /home/runner/work/git/git/t/trash directory.t9800-git-p4-basic/git > > + nr_tries_left=60 > > + mkdir /home/runner/work/git/git/t/trash directory.t9800-git-p4-basic/git > > + exit 1 > > + eval_ret=1 > > + : > > not ok 20 - submit from detached head > > # > > # test_when_finished cleanup_git && > > # git p4 clone --dest="$git" //depot && > > # ( > > # cd "$git" && > > # git checkout p4/master && > > # >detached_head_test && > > # git add detached_head_test && > > # git commit -m "add detached_head" && > > # git config git-p4.skipSubmitEdit true && > > # git p4 submit && > > # git p4 rebase && > > # git log p4/master | grep detached_head > > # ) > > # > > Error: failed: t9800.21 submit from worktree > > [...] > > > > > > Is this the best UI we can have for test failures in CI runs? I hope we can > > do better. Having said that, this patch series presents a pretty good start, > > and offers a basis for future improvements. > > > > I think these are really valuable improvements over our current state, but I > also understand the concerns about performance elsewhere in this thread > (it's really slow to load for me as well, and scrolling/expanding the log > groups can be a bit glitchy in my browser). That said, I think there are a > couple ways you could improve the load time without sacrificing the (very > helpful) improvements you've made to error log visibility. I experimented a > bit (example result [1]) and came up with some things that may help: > > * group errors by test file, rather than by test case (to reduce > parsing/rendering time for lots of groups). I really would like to avoid that, based on my past experience with diagnosing test failures. It is definitely helpful if the structure lets the reader expand individual test cases. > * print the verbose logs only for the failed test cases (to massively cut > down on the size of the log, particularly when there's only a couple > failures in a test file with a lot of passing tests). That's an amazingly simple trick to improve the speed by a ton, indeed. Thank you for this splendid idea! > * skip printing the full text of the test in 'finalize_test_case_output' > when creating the group, i.e., use '$1' instead of '$*' (in both passing > and failing tests, this information is already printed via some other > means). > > If you wanted to make sure a user could still access the full failure logs > (i.e., including the "ok" test results), you could print a link to the > artifacts page as well - that way, all of the information we currently > provide to users can still be found somewhere. That's a good point, I added that hint to the output (the link is unfortunately not available at the time we print that advice). > > [1] https://github.com/vdye/git/runs/5666973267 > > > Johannes Schindelin (9): > > ci: fix code style > > ci/run-build-and-tests: take a more high-level view > > ci: make it easier to find failed tests' logs in the GitHub workflow > > ci/run-build-and-tests: add some structure to the GitHub workflow > > output > > tests: refactor --write-junit-xml code > > test(junit): avoid line feeds in XML attributes > > ci: optionally mark up output in the GitHub workflow > > ci: use `--github-workflow-markup` in the GitHub workflow > > ci: call `finalize_test_case_output` a little later > > > > The organization of these commits makes the series a bit confusing to read, > mainly due to the JUnit changes in the middle. Patches 5-6 don't appear to > be dependent on patches 1-4, so they could be moved to the beginning (after > patch 1). With that change, I think this series would flow more smoothly: > "Cleanup/non-functional change" -> "JUnit XML improvements" -> "Log UX > improvements". Great feedback! I changed the order as suggested. Again, thank you so much for helping me improve the user experience of Git's CI/PR builds. Ciao, Dscho