On 8/22/22 9:22 AM, Johannes Schindelin wrote:
Hi Junio,
On Sat, 20 Aug 2022, Junio C Hamano wrote:
As a first step towards enabling fsmonitor to work against
network-mounted repos, a configuration option, 'fsmonitor.allowRemote'
was introduced for Windows. Setting this option to true will override
the default behavior (erroring-out) when a network-mounted repo is
detected by fsmonitor. In order for macOS to have parity with Windows,
the same option is now introduced for macOS.
With this merged in, recent CI runs for 'seen'
e.g. https://github.com/git/git/actions/runs/2892889122
seems to break macOS jobs, letting them hog CPU forever and exceed
6hr or whatever the limit is.
As an experiment I pushed out 'seen' but without this commit (not
the entire topic is excluded, the Windows one is still included).
As there is nothing specific to macOS between 'next' and 'seen',
macOS jobs seem to pass, which is not very surprising.
https://github.com/git/git/actions/runs/2896207171
As the patch collected some review comments, I've already marked it
in the "What's cooking" draft as expecting a reroll of that step;
until that happens, let's keep it out of 'seen'.
It makes sense to keep it out of `seen`, and at the same time I would like
to encourage Eric to investigate what causes those time-outs.
When toggling timestamps (click on the wheel on the upper right) at
https://github.com/git/git/runs/7927812510?check_suite_focus=true#step:4:1774,
it can be seen that at close to 1am, t9903 finished, but then nothing
happens until twenty past 6am.
I've downloaded the raw logs (also available via the wheel on the upper
right) to find out which test timed out:
$ diff -u \
<(sed -n 's/.*\] \(t[0-9][^ ]*\).*/\1/p' <~/Downloads/17 | sort) \
<(git ls-tree upstream/seen:t | cut -c 54- | grep '^t[0-9].*-.*sh$')
--- /dev/fd/63 2022-08-22 14:56:05.510269527 +0200
+++ /dev/fd/62 2022-08-22 14:56:05.510269527 +0200
@@ -794,6 +794,7 @@
t7524-commit-summary.sh
t7525-status-rename.sh
t7526-commit-pathspec-file.sh
+t7527-builtin-fsmonitor.sh
t7528-signed-commit-ssh.sh
t7600-merge.sh
t7601-merge-pull-config.sh
@@ -945,6 +946,7 @@
t9812-git-p4-wildcards.sh
t9813-git-p4-preserve-users.sh
t9814-git-p4-rename.sh
+t9815-git-p4-submit-fail.sh
t9816-git-p4-locked.sh
t9817-git-p4-exclude.sh
t9818-git-p4-block.sh
@@ -964,5 +966,8 @@
t9832-unshelve.sh
t9833-errors.sh
t9834-git-p4-file-dir-bug.sh
+t9835-git-p4-metadata-encoding-python2.sh
+t9836-git-p4-metadata-encoding-python3.sh
t9901-git-web--browse.sh
+t9902-completion.sh
t9903-bash-prompt.sh
I have no idea what's up with t98* and t9902, but I would bet that they
were somehow "swallowed" by `prove` being terminated, and that the
actual test script that times out is t7527.
Eric: To investigate, you will want to reproduce the problem on a macOS
machine. If you have none available, you could create a temporary branch,
heavily edit the CI definition, and push it to GitHub. And by heavy edits
I mean something like this:
- Remove all non-macOS jobs from `.github/workflows/main.yml` (that means
removing all but the `regular` job, removing all but at least one
`macos` matrix entry, and removing the the `needs: ci-config` and
corresponding `if:` line.
- Edit `t/Makefile` to define `T = t7527-builtin-fsmonitor.sh` instead of
running all the tests.
- Edit `.github/workflows/main.yml` so that the step that causes the
time-out has a chance of timing out much sooner (and the subsequent
steps then have a chance to upload the relevant logs):
https://docs.github.com/en/actions/using-workflows/workflow-syntax-for-github-actions#jobsjob_idstepstimeout-minutes
I would also set GIT_TRACE_FSMONITOR and GIT_TRACE2_PERF (on both daemon
and client sides of the tests) and capture the logs and try to figure
out what is happening.
I suspect that this testing should wait until you redo the patch to
remove the tmp file stuff and just move the socket into $HOME as we
talked about earlier.
Jeff