Hi Adam, On Thu, 4 Nov 2021, Adam Dinwoodie wrote: > On Monday 20 September 2021 at 03:36 pm +0000, Jeff Hostetler via GitGitGadget wrote: > > From: Jeff Hostetler <jeffhost@xxxxxxxxxxxxx> > > > > Convert test helper to use `start_bg_command()` when spawning a server > > daemon in the background rather than blocks of platform-specific code. > > > > Also, while here, remove _() translation around error messages since > > this is a test helper and not Git code. > > As part of testing the v2.34.0-rc0 release candidate on Cygwin, I've > found this commit -- 05881a6fc9 (t/helper/simple-ipc: convert > test-simple-ipc to use start_bg_command, 2021-09-20), according to my > bisecting -- is causing t0052.1 to fail on 32-bit Cygwin. Somehow this > is only affecting the 32-bit Cygwin build; the 64-bit build is working > as expected. > > Specifically, the failure I'm seeing is as below: > > ``` > $ GIT_TRACE=1 sh t0052-simple-ipc.sh -vix > trace: built-in: git init '/cygdrive/d/git-32/t/trash directory.t0052-simple-ipc' > Initialized empty Git repository in /cygdrive/d/git-32/t/trash directory.t0052-simple-ipc/.git/ > expecting success of 0052.1 'start simple command server': > test_atexit stop_simple_IPC_server && > test-tool simple-ipc start-daemon --threads=8 && > test-tool simple-ipc is-active > > ++ test_atexit stop_simple_IPC_server > ++ test 0 = 0 > ++ test_atexit_cleanup='{ stop_simple_IPC_server > } && (exit "$eval_ret"); eval_ret=$?; :' > ++ test-tool simple-ipc start-daemon --threads=8 > trace: run_command: test-tool simple-ipc run-daemon --name=ipc-test --threads=8 > error: daemon failed to start > error: last command exited with $?=1 > not ok 1 - start simple command server > # > # test_atexit stop_simple_IPC_server && > # test-tool simple-ipc start-daemon --threads=8 && > # test-tool simple-ipc is-active > # > ++ stop_simple_IPC_server > ++ test-tool simple-ipc stop-daemon > ++ exit 1 > ++ eval_ret=1 > ++ : > ``` > > I've had a look at the code changes, and cannot work out what might be > being handled differently in 32-bit and 64-bit Cygwin environments. > Given the Cygwin project is considering dropping support for 32-bit > Cygwin anyway, it might not be worth doing anything about this. But I > thought it worth reporting in case there's something obvious to folk > more familiar with this code. I had a look at this and could reproduce... partially. I only managed to make it fail every once in a while. Digging deeper, it turns out that the `lstat()` call in `ipc_get_active_state()` does not receive an `st_mode` indicating a socket, but rather a file (in my tests, it was usually 0100644, but sometimes even 0100755). The reason is, of course, that Cygwin _emulates_ Unix sockets. What is in the file system is just a special file, it is marked with the `system` bit (which only exists on Windows), and its contents start with the tell-tale `!<socket>`. And as you might have guessed, there is a race going on between Cygwin writing that file _and_ flipping that `system` bit, and Git trying to access the Unix socket and encountering an unexpected file. Now, why this only happens in your 32-bit setup, I have no idea. In my tests, the following patch works around the issue. Could I ask you to test it in your environment? -- snip -- diff --git a/compat/simple-ipc/ipc-unix-socket.c b/compat/simple-ipc/ipc-unix-socket.c index 4e28857a0a..1c591b2adf 100644 --- a/compat/simple-ipc/ipc-unix-socket.c +++ b/compat/simple-ipc/ipc-unix-socket.c @@ -36,6 +36,23 @@ enum ipc_active_state ipc_get_active_state(const char *path) } /* also complain if a plain file is in the way */ +#ifdef __CYGWIN__ + { + static const int delay[] = { 1, 10, 20, 40, -1 }; + int i; + + for (i = 0; S_ISREG(st.st_mode) && delay[i] > 0; i++) { + /* + * Cygwin might still be in the process of marking the + * underlying file as a system file. + */ + sleep_millisec(delay[i]); + if (lstat(path, &st) == -1) + return IPC_STATE__INVALID_PATH; + } + } +#endif + if ((st.st_mode & S_IFMT) != S_IFSOCK) return IPC_STATE__INVALID_PATH; -- snap -- FWIW it looks as if the loop might be a bit of an overkill, as I could not get the code to need more than a single one-millisecond sleep, but it's probably safer to just keep the delay loop in place as-is. Ciao, Dscho