Re: [PATCH 6/7] run-command: create start_bg_command

Taylor Blau <me@xxxxxxxxxxxx> · Thu, 16 Sep 2021 00:53:07 -0400

On Wed, Sep 15, 2021 at 08:36:16PM +0000, Jeff Hostetler via GitGitGadget wrote:
> From: Jeff Hostetler <jeffhost@xxxxxxxxxxxxx>
>
> Create a variation of `run_command()` and `start_command()` to launch a command
> into the background and optionally wait for it to become "ready" before returning.
>
> Signed-off-by: Jeff Hostetler <jeffhost@xxxxxxxxxxxxx>
> ---
>  run-command.c | 123 ++++++++++++++++++++++++++++++++++++++++++++++++++
>  run-command.h |  48 ++++++++++++++++++++
>  2 files changed, 171 insertions(+)
>
> diff --git a/run-command.c b/run-command.c
> index 3e4e082e94d..fe75fd08f74 100644
> --- a/run-command.c
> +++ b/run-command.c
> @@ -1901,3 +1901,126 @@ void prepare_other_repo_env(struct strvec *env_array, const char *new_git_dir)
>  	}
>  	strvec_pushf(env_array, "%s=%s", GIT_DIR_ENVIRONMENT, new_git_dir);
>  }
> +
> +enum start_bg_result start_bg_command(struct child_process *cmd,
> +				      start_bg_wait_cb *wait_cb,
> +				      void *cb_data,
> +				      unsigned int timeout_sec)
> +{
> +	enum start_bg_result sbgr = SBGR_ERROR;
> +	int ret;
> +	int wait_status;
> +	pid_t pid_seen;
> +	time_t time_limit;
> +
> +	/*
> +	 * Silently disallow child cleanup -- even if requested.
> +	 * The child process should persist in the background
> +	 * and possibly/probably after this process exits.  That
> +	 * is, don't kill the child during our atexit routine.
> +	 */
> +	cmd->clean_on_exit = 0;
> +
> +	ret = start_command(cmd);
> +	if (ret) {
> +		/*
> +		 * We assume that if `start_command()` fails, we
> +		 * either get a complete `trace2_child_start() /
> +		 * trace2_child_exit()` pair or it fails before the
> +		 * `trace2_child_start()` is emitted, so we do not
> +		 * need to worry about it here.
> +		 *
> +		 * We also assume that `start_command()` does not add
> +		 * us to the cleanup list.  And that it calls
> +		 * calls `child_process_clear()`.
> +		 */
> +		sbgr = SBGR_ERROR;
> +		goto done;
> +	}
> +
> +	time(&time_limit);
> +	time_limit += timeout_sec;

This jumped out to me as unsafe, since POSIX guarantees time_t to be an
integral value holding a number of seconds (so += timeout_sec is safe
there), but it isn't in the C standard.

But we have lots of other examples of adding a number of seconds
directly the value filled in by time(2), so I think this is fine.

> +
> +wait:
> +	pid_seen = waitpid(cmd->pid, &wait_status, WNOHANG);
> +
> +	if (pid_seen == 0) {

Small nit, probably better to write this as if (!pid_seen), but not
worth a reroll alone.

> +		/*
> +		 * The child is currently running.  Ask the callback
> +		 * if the child is ready to do work or whether we
> +		 * should keep waiting for it to boot up.
> +		 */

This comment is simple and informative, thank you!

> +		ret = (*wait_cb)(cb_data, cmd);
> +		if (!ret) {
> +			/*
> +			 * The child is running and "ready".
> +			 *
> +			 * NEEDSWORK: As we prepare to orphan (release to
> +			 * the background) this child, it is not appropriate
> +			 * to emit a `trace2_child_exit()` event.  Should we
> +			 * create a new event for this case?

Probably. Maybe trace2_child_orphaned() or trace2_child_background()?

> +			 */
> +			sbgr = SBGR_READY;
> +			goto done;
> +		} else if (ret > 0) {
> +			time_t now;
> +
> +			time(&now);
> +			if (now < time_limit)
> +				goto wait;
> +
> +			/*
> +			 * Our timeout has expired.  We don't try to
> +			 * kill the child, but rather let it continue
> +			 * (hopefully) trying to startup.
> +			 *
> +			 * NEEDSWORK: Like the "ready" case, should we
> +			 * log a custom child-something Trace2 event here?
> +			 */
> +			sbgr = SBGR_TIMEOUT;
> +			goto done;
> +		} else {
> +			/*
> +			 * The cb gave up on this child.
> +			 *
> +			 * NEEDSWORK: Like above, should we log a custom
> +			 * Trace2 child-something event here?
> +			 */
> +			sbgr = SBGR_CB_ERROR;
> +			goto done;
> +		}

OK, so assuming that the child is running, then we ask wait_cb what to
do. Returning zero from the callback means to background it, a positive
value means to give it more time, and negative means to cause an error.
And those match the documentation below, good.

> +	if (pid_seen == cmd->pid) {

This could be an "else if", no?

> +		int child_code = -1;
> +
> +		/*
> +		 * The child started, but exited or was terminated
> +		 * before becoming "ready".
> +		 *
> +		 * We try to match the behavior of `wait_or_whine()`
> +		 * and convert the child's status to a return code for
> +		 * tracing purposes and emit the `trace2_child_exit()`
> +		 * event.
> +		 */
> +		if (WIFEXITED(wait_status))
> +			child_code = WEXITSTATUS(wait_status);
> +		else if (WIFSIGNALED(wait_status))
> +			child_code = WTERMSIG(wait_status) + 128;

Do we care about emitting the same error (when it was signaled with
something other than SIGINT/SIGQUIT/SIGPIPE) as is reported by
wait_or_whine()?

If we want that error here, too, we could probably share the same code
here from here and in wait_or_whine(). I would probably write something
like:

    static int handle_awaited_status(int status, int *code)
    {
      if (WIFSIGNALED(status)) {
        *code = WTERMSIG(status);
        if (*code != SIGINT && *code != SIGQUIT && *code != SIGPIPE)
               error("%s died of signal %d", argv0, *code);
        /*
         * This return value is chosen so that code & 0xff
         * mimics the exit code that a POSIX shell would report for
         * a program that died from this signal.
         */
        *code += 128;
        return 1;
      } else if (WIFEXITED(status)) {
        *code = WEXITSTATUS(status);
        return 1;
      }
      return 0;
    }

so that we could call it in wait_or_whine() like:

    } else if (!handle_awaited_status(status, &code)) {
      error("waitpid is confused (%s)", argv0);
    }

and similarly here in this new function. Alternatively, if we don't want
that error, then it may help future readers to add a short comment
explaining why not.

> +/**
> + * Callback used by `start_bg_command()` to ask whether the
> + * child process is ready or needs more time to become ready.
> + *
> + * Returns 1 is child needs more time (subject to the requested timeout).
> + * Returns 0 if child is ready.
> + * Returns -1 on any error and cause `start_bg_command()` to also error out.
> + */
> +typedef int(start_bg_wait_cb)(void *cb_data,
> +			      const struct child_process *cmd);

Nitpicking, but typically I would assume that the "extra" void pointer
is the last argument in a callback. It definitely does not matter,
though.

Thanks,
Taylor