Hi Luke, I saw rare failures in test 6 'git p4 sync uninitialized repo' in 't9800-git-p4-basic.sh' on Travis CI, because the 'cleanup_git' function failed to do its job. The (redacted) trace looks like this: + cleanup_git + retry_until_success rm -r /home/szeder/src/git/t/trash directory.t9800-git-p4-basic/git + time_in_seconds + cd / + /usr/bin/python -c import time; print(int(time.time())) + timeout=1551233042 + rm -r /home/szeder/src/git/t/trash directory.t9800-git-p4-basic/git + test_must_fail test -d /home/szeder/src/git/t/trash directory.t9800-git-p4-basic/git test_must_fail: command succeeded: test -d /home/szeder/src/git/t/trash directory.t9800-git-p4-basic/git + eval_ret=1 + : not ok 6 - git p4 sync uninitialized repo Trying to reproduce this failure with stock Git can be tricky: I've seen ./t9800-git-p4-basic.sh --stress -r 1,2,6,22 fail within the first 10 tries, but it also run overnight without a single failure... However, the following two-liner patch can reliably trigger this failure: diff --git a/fast-import.c b/fast-import.c index b7ba755c2b..54715018b3 100644 --- a/fast-import.c +++ b/fast-import.c @@ -3296,6 +3296,7 @@ int cmd_main(int argc, const char **argv) rc_free[i].next = &rc_free[i + 1]; rc_free[cmd_save - 1].next = NULL; + sleep(1); start_packfile(); set_die_routine(die_nicely); set_checkpoint_signal(); diff --git a/t/lib-git-p4.sh b/t/lib-git-p4.sh index b3be3ba011..2d2ef50bfa 100644 --- a/t/lib-git-p4.sh +++ b/t/lib-git-p4.sh @@ -190,6 +190,7 @@ kill_p4d () { cleanup_git () { retry_until_success rm -r "$git" + sleep 2 test_must_fail test -d "$git" && retry_until_success mkdir "$git" } What's going on is this: 'git p4' invokes 'git fast-import' via Python's subprocess.Popen(), and then there are two chain of events racing with each other: - In the foreground: - 'git p4' notices that there are no p4 branches and die()s as expected, but leaves its child fast-import behind - 'test_i18ngrep' quickly verifies that 'git p4' died with the expected error message - the cleanup function removes the "$TRASH_DIRECTORY/git" directory, and - checks that the directory is indeed gone. - Meanwhile in the background: - 'git fast-import' starts up, kicks off the dashed external 'git-fast-import' process, - which opens a tmp packfile in "$TRASH_DIRECTORY/git" for writing (the start_packfile() call in the patch context above), creating any leading directories if necessary, - notices that its stdin is closed (because 'git p4' died) and it has nothing left to do, so - it cleans up and exits. Note that this cleanup only removes the tmp packfile, but leaves any newly created leading directories behind. While unlikely, it does apparently happen from time to time that the test's cleanup function first removes "$TRASH_DIRECTORY/git", but then 'git fast-import' re-creates it for its packfile before the cleanup function checks the result of the removal. The two well-placed sleep()s in the patch above force such a problematic scheduling. There are 4 test cases running 'test_must_fail git p4 sync': the above patch triggers a failure in 3 of them, and with a bit of extra modding I could trigger a failure in the fourth one as well. We could work this around by waiting for 'git p4' and all its child processes in the affected tests themselves, using the same shell trickery as in ef09036cf3 (t6500: wait for detached auto gc at the end of the test script, 2017-04-13), but this feels like, well, a workaround. I think the proper solution would be to ensure that 'git p4' kills and waits for all its child processes when die()ing. Alternatively (or in addition?), it could perform all the necessary sanity-checking (and potential die()ing) before starting the 'git fast-import' process in the first place. I've glanced through all subprocess.Popen() callsites in 'git p4' and found most of them OK, in the sense that they wait for whatever child process they create. Alas, there was one exception: p4CmdList() can invoke an optional callback function before wait()ing on its 'p4' child, and the streamP4FilesCb() callback function can die() without waiting for the 'p4' process (but it does wait() for 'git fast-import'!). On a related note: this check for the just-removed directory was added in 23aee4199a (git-p4: retry kill/cleanup operations in tests with timeout, 2015-11-19), which mentions flaky cleanup operations. Perhaps this is the same flakiness?! Anyway, as I mentioned elsewhere before, I have no idea why/how 'git p4' works, so I'll leave it up to you how it's best to deal with this issue... Gábor