Re: fsmonitor deadlock / macOS CI hangs

On Tue, Oct 1, 2024 at 4:46 AM Jeff King <peff@xxxxxxxx> wrote:
>
> I did some more digging on the hangs we sometimes see when running the
> test suite on macOS. I'm cc-ing Patrick as somebody who dug into this
> before, and Johannes as the only still-active person mentioned in the
> relevant code.
>
> For those just joining, you can reproduce the issue by running t9211
> with --stress on macOS. Some earlier notes are here:
>
>   https://lore.kernel.org/git/20240517081132.GA1517321@xxxxxxxxxxxxxxxxxxxxxxx/
>
> but the gist of it is that we end up with Git processes waiting to read
> from fsmonitor, but fsmonitor hanging.

I may have found the cause. fsmonitor_run_daemon_1() starts the fsevent
listener thread before with_lock__wait_for_cookie() is called:

      /*
       * Start the fsmonitor listener thread to collect filesystem
       * events.
       */
      if (pthread_create(&state->listener_thread, NULL,
                         fsm_listen__thread_proc, state)) {
              ipc_server_stop_async(state->ipc_server_data);
              err = error(_("could not start fsmonitor listener thread"));
              goto cleanup;
      }
      listener_started = 1;

fsm_listen__thread_proc() in turn calls:

      fsm_listen__loop(state);

which is defined as below for darwin:

  void fsm_listen__loop(struct fsmonitor_daemon_state *state)
  {
          struct fsm_listen_data *data;

          data = state->listen_data;

          pthread_mutex_init(&data->dq_lock, NULL);
          pthread_cond_init(&data->dq_finished, NULL);
          data->dq = dispatch_queue_create("FSMonitor", NULL);

          FSEventStreamSetDispatchQueue(data->stream, data->dq);
          data->stream_scheduled = 1;

          if (!FSEventStreamStart(data->stream)) {
                  error(_("Failed to start the FSEventStream"));
                  goto force_error_stop_without_loop;
          }
          data->stream_started = 1;

          ...

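Normally FSEventStreamStart() is called before
with_lock__wait_for_cookie() creates a cookie file, but this ordering
is not guaranteed. If the cookie file is created first, the stream
presumably never reports it, and the waiter never wakes up. We can
reproduce the issue easily by delaying the stream start in
fsm_listen__loop() as below: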

  --- a/compat/fsmonitor/fsm-listen-darwin.c
  +++ b/compat/fsmonitor/fsm-listen-darwin.c
  @@ -510,6 +510,7 @@ void fsm_listen__loop(struct fsmonitor_daemon_state *state)
          FSEventStreamSetDispatchQueue(data->stream, data->dq);
          data->stream_scheduled = 1;

  +       sleep(1);
          if (!FSEventStreamStart(data->stream)) {
                  error(_("Failed to start the FSEventStream"));
                  goto force_error_stop_without_loop;


Koji Nakamaru




