On Tue, Oct 1, 2024 at 4:46 AM Jeff King <peff@xxxxxxxx> wrote:
>
> I did some more digging on the hangs we sometimes see when running the
> test suite on macOS. I'm cc-ing Patrick as somebody who dug into this
> before, and Johannes as the only still-active person mentioned in the
> relevant code.
>
> For those just joining, you can reproduce the issue by running t9211
> with --stress on macOS. Some earlier notes are here:
>
>   https://lore.kernel.org/git/20240517081132.GA1517321@xxxxxxxxxxxxxxxxxxxxxxx/
>
> but the gist of it is that we end up with Git processes waiting to read
> from fsmonitor, but fsmonitor hanging.

I may have found the cause. fsmonitor_run_daemon_1() starts the fsevent
listener thread before with_lock__wait_for_cookie() is called.

    /*
     * Start the fsmonitor listener thread to collect filesystem
     * events.
     */
    if (pthread_create(&state->listener_thread, NULL,
                       fsm_listen__thread_proc, state)) {
        ipc_server_stop_async(state->ipc_server_data);
        err = error(_("could not start fsmonitor listener thread"));
        goto cleanup;
    }
    listener_started = 1;

fsm_listen__thread_proc() then calls:

    fsm_listen__loop(state);

which is defined as below for darwin:

    void fsm_listen__loop(struct fsmonitor_daemon_state *state)
    {
        struct fsm_listen_data *data;

        data = state->listen_data;

        pthread_mutex_init(&data->dq_lock, NULL);
        pthread_cond_init(&data->dq_finished, NULL);
        data->dq = dispatch_queue_create("FSMonitor", NULL);

        FSEventStreamSetDispatchQueue(data->stream, data->dq);
        data->stream_scheduled = 1;

        if (!FSEventStreamStart(data->stream)) {
            error(_("Failed to start the FSEventStream"));
            goto force_error_stop_without_loop;
        }
        data->stream_started = 1;
        ...

Normally FSEventStreamStart() is called before
with_lock__wait_for_cookie() creates a cookie file, but this is not
guaranteed, because the two run on different threads with nothing
ordering them. If the cookie file is created first, the listener
presumably never sees the event for it, and the wait for the cookie
would not complete.
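
To make the ordering problem concrete, here is a tiny single-threaded
model of it. This is only a sketch, not Git code: the names
model_stream, model_stream_start(), model_create_cookie() and
cookie_observed() are hypothetical, and it assumes that an event which
occurs before the stream is started is simply not delivered to the
listener.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the FSEventStream plus the cookie state. */
struct model_stream {
	bool started;     /* set once the (modelled) stream is started */
	bool cookie_seen; /* set if the listener observed the cookie event */
};

/* Models FSEventStreamStart(): events are delivered only after this. */
void model_stream_start(struct model_stream *s)
{
	s->started = true;
}

/*
 * Models the cookie file being created. Assumption: the event is
 * dropped if the stream has not been started yet.
 */
void model_create_cookie(struct model_stream *s)
{
	if (s->started)
		s->cookie_seen = true;
}

/*
 * Run the two steps in the given order and report whether the
 * listener observed the cookie.
 */
bool cookie_observed(bool start_stream_first)
{
	struct model_stream s = { false, false };

	if (start_stream_first) {
		model_stream_start(&s);
		model_create_cookie(&s);
	} else {
		model_create_cookie(&s);
		model_stream_start(&s);
	}
	return s.cookie_seen;
}
```

In this model cookie_observed(true) corresponds to the usual ordering,
and cookie_observed(false) to the unlucky one where the cookie is
created before the stream is started and the waiter is never woken.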

We can reproduce the issue easily if we modify fsm_listen__loop() as
below:

--- a/compat/fsmonitor/fsm-listen-darwin.c
+++ b/compat/fsmonitor/fsm-listen-darwin.c
@@ -510,6 +510,7 @@ void fsm_listen__loop(struct fsmonitor_daemon_state *state)
 	FSEventStreamSetDispatchQueue(data->stream, data->dq);
 	data->stream_scheduled = 1;
 
+	sleep(1);
 	if (!FSEventStreamStart(data->stream)) {
 		error(_("Failed to start the FSEventStream"));
 		goto force_error_stop_without_loop;

Koji Nakamaru