Re: [PATCH v8 00/11] Git filter protocol

Jakub Narębski <jnareb@xxxxxxxxx> · Tue, 4 Oct 2016 21:04:55 +0200

W dniu 03.10.2016 o 19:13, Lars Schneider pisze: 
>> On 01 Oct 2016, at 22:48, Jakub Narębski <jnareb@xxxxxxxxx> wrote:
>> W dniu 01.10.2016 o 20:59, Lars Schneider pisze: 
>>> On 29 Sep 2016, at 23:27, Junio C Hamano <gitster@xxxxxxxxx> wrote:
>>>> Lars Schneider <larsxschneider@xxxxxxxxx> writes:
>>>>
>>>> If the filter process refuses to die forever when Git told it to
>>>> shutdown (by closing the pipe to it, for example), that filter
>>>> process is simply buggy.  I think we want users to become aware of
>>>> that, instead of Git leaving it behind, which essentially is to
>>>> sweep the problem under the rug.
>>
>> Well, it would be good to tell users _why_ Git is hanging, see below.
> 
> Agreed. Do you think it is OK to write the message to stderr?

On the other hand, this is why GIT_TRACE (and GIT_TRACE_PERFORMANCE)
was invented for.  We do not signal troubles with single-shot filters,
so I guess doing it for multi-file filters is not needed.

>>>> I agree with what Peff said elsewhere in the thread; if a filter
>>>> process wants to take time to clean things up while letting Git
>>>> proceed, it can do its own process management, but I think it is
>>>> sensible for Git to wait the filter process it directly spawned.
>>>
>>> To realize the approach above I prototyped the run-command patch below:
>>>
>>> I added an "exit_timeout" variable to the "child_process" struct.
>>> On exit, Git will close the pipe to the process and wait "exit_timeout" 
>>> seconds until it kills the child process. If "exit_timeout" is negative
>>> then Git will wait until the process is done.
>>
>> That might be good approach.  Probably the default would be to wait.
> 
> I think I would prefer a 2sec timeout or something as default. This way
> we can ensure Git would not wait indefinitely for a buggy filter by default.

Actually this waiting for multi-file filter is only about waiting for
the shutdown process of the filter.  The filter could still hang during
processing a file, and git would hang too, if I understand it correctly.

[...]
>> Also, how would one set default value of timeout for all process
>> based filters?
> 
> I think we don't need that because a timeout is always specific
> to a filter (if the 2sec default is not sufficient).

All right (assuming that timeouts are good idea). 

>>>
>>> +			while ((waitpid(p->pid, &status, 0)) < 0 && errno == EINTR)
>>> +				;	/* nothing */
>>
>> Ah, this loop is here because waiting on waitpid() can be interrupted
>> by the delivery of a signal to the calling process; though the result
>> is -1, not just any < 0.
> 
> "< 0" is also used in wait_or_whine()

O.K. (though it doesn't necessary mean that it is correct, there
is another point for using "< 0").

[...]
>> There is also another complication: there can be more than one
>> long-running filter driver used.  With this implementation we
>> wait for each of one in sequence, e.g. 10s + 10s + 10s.
> 
> Good idea, I fixed that in the version below!
> 
[...]
> [...] this function is also used with the async struct... 

Hmmm... now I wonder if it is a good idea (similar treatment for
single-file async-invoked filter, and multi-file pkt-line filters).

For single-file one-shot filter (correct me if I am wrong):

 - git sends contents to filter, signals end with EOF
   (after process is started)
 - in an async process:
   - process is started
   - git reads contents from filter, until EOF
   - if process did not end, it is killed

For multi-process pkt-line based filter (simplified):

 - process is started
 - handshake
 - for each file
   - file is send to filter process over pkt-line,
     end signalled with flush packet
   - git reads from filter from pkt-line, until flush
 - ...

See how single-shot filter is sent EOF, though in different part
of code.  We need to signal multi-file filter that no more files
will be coming.  Simplest solution is to send EOF (we could send
"command=shutdown" for example...) to filter, and wait for EOF
from filter (or for "status=finished" and EOF).

We could kill multi-file filter after sending last file and
receiving full response... but I think single-shot filter gets
killed only because it allows for very simple filters, and reusing
existing commands as filters.

[...]
> diff --git a/run-command.c b/run-command.c
> index 3269362..ca0feef 100644
> --- a/run-command.c
> +++ b/run-command.c
> @@ -21,6 +21,9 @@ void child_process_clear(struct child_process *child)
>  
>  struct child_to_clean {
>  	pid_t pid;
> +	char *name;

I guess it is here for output purposes?

Should we store full command here, or just name of <driver>?

> +	int stdin;

I guess the name `stdin` for file _descriptor_ is something
used in other parts of convert.c code, isn't it?

> +	int timeout;

Hmmm... we assume that timeout is in seconds, not millis or other
value, isn't it.  timeout_sec would perhaps be unnecessarily long.

>  	struct child_to_clean *next;
>  };
>  static struct child_to_clean *children_to_clean;
> @@ -28,12 +31,53 @@ static int installed_child_cleanup_handler;
>  
>  static void cleanup_children(int sig, int in_signal)
>  {
> +	int status;
> +	struct timeval tv;
> +	time_t secs;
> +	struct child_to_clean *p = children_to_clean;
> +
> +	// Send EOF to children as indicator that Git will exit soon
> +	while (p) {
> +		if (p->timeout != 0) {

Here we use timeout == 0 as a special case, a special indicator
(IIUC for the single-shot filter case, where it is closed already).
This is not documented.  Somebody setting timeout to "0" would
be surprised, isn't it?

> +			if (p->stdin > 0)
> +				close(p->stdin);
> +		}
> +		p = p->next;
> +	}
> +
>  	while (children_to_clean) {
> -		struct child_to_clean *p = children_to_clean;
> +		p = children_to_clean;
>  		children_to_clean = p->next;
> +
> +		if (p->timeout != 0) {
> +			fprintf(stderr, _("Waiting for '%s' to finish..."), p->name);
> +			if (p->timeout < 0) {
> +				// No timeout given - wait indefinitely
> +				while ((waitpid(p->pid, &status, 0)) < 0 && errno == EINTR)
> +					;	/* nothing */
> +			} else {
> +				// Wait until timeout
> +				gettimeofday(&tv, NULL);
> +				secs = tv.tv_sec;
> +				while (!waitpid(p->pid, &status, WNOHANG) &&
> +					   tv.tv_sec - secs < p->timeout) {
> +					fprintf(stderr, _(" \rWaiting %lds for '%s' to finish..."),
> +						p->timeout - tv.tv_sec + secs - 1, p->name);
> +					gettimeofday(&tv, NULL);
> +					sleep_millisec(10);
> +				}
> +			}

I wonder if we have some progress-printing code we can borrow
from, or just plain use (like progress report for long checkout).

> +			if (waitpid(p->pid, &status, WNOHANG))
> +				fprintf(stderr, _("done!\n"));
> +			else
> +				fprintf(stderr, _("timeout. Killing...\n"));
> +		}
> +
>  		kill(p->pid, sig);
> -		if (!in_signal)
> +		if (!in_signal) {
> +			free(p->name);
>  			free(p);
> +		}
>  	}
>  }
>  
> @@ -49,10 +93,18 @@ static void cleanup_children_on_exit(void)
>  	cleanup_children(SIGTERM, 0);
>  }
>  
> -static void mark_child_for_cleanup(pid_t pid)
> +static void mark_child_for_cleanup_with_timeout(pid_t pid, const char *name, int stdin, int timeout)
>  {
>  	struct child_to_clean *p = xmalloc(sizeof(*p));
>  	p->pid = pid;
> +	p->timeout = timeout;
> +	p->stdin = stdin;
> +	if (name) {
> +		p->name = xmalloc(strlen(name) + 1);
> +		strcpy(p->name, name);

Don't we have xstrdup() for that, or am I mistaken?

> +	} else {
> +		p->name = "process";

Hmmmm...

> +	}
>  	p->next = children_to_clean;
>  	children_to_clean = p;
>  
> @@ -63,6 +115,13 @@ static void mark_child_for_cleanup(pid_t pid)
>  	}
>  }
>  
> +#ifdef NO_PTHREADS
> +static void mark_child_for_cleanup(pid_t pid, const char *name, int timeout, int stdin)
> +{
> +	mark_child_for_cleanup_with_timeout(pid, NULL, 0, 0);
> +}
> +#endif

Uh?

> +
>  static void clear_child_for_cleanup(pid_t pid)
>  {
>  	struct child_to_clean **pp;
> @@ -422,7 +481,8 @@ int start_command(struct child_process *cmd)
>  	if (cmd->pid < 0)
>  		error_errno("cannot fork() for %s", cmd->argv[0]);
>  	else if (cmd->clean_on_exit)
> -		mark_child_for_cleanup(cmd->pid);
> +		mark_child_for_cleanup_with_timeout(
> +			cmd->pid, cmd->argv[0], cmd->in, cmd->clean_on_exit_timeout);

All right, nice abstraction.

>  
>  	/*
>  	 * Wait for child's execvp. If the execvp succeeds (or if fork()
> @@ -483,7 +543,8 @@ int start_command(struct child_process *cmd)
>  	if (cmd->pid < 0 && (!cmd->silent_exec_failure || errno != ENOENT))
>  		error_errno("cannot spawn %s", cmd->argv[0]);
>  	if (cmd->clean_on_exit && cmd->pid >= 0)
> -		mark_child_for_cleanup(cmd->pid);
> +		mark_child_for_cleanup_with_timeout(
> +			cmd->pid, cmd->argv[0], cmd->in, cmd->clean_on_exit_timeout);
>  
>  	argv_array_clear(&nargv);
>  	cmd->argv = sargv;
> diff --git a/run-command.h b/run-command.h
> index cf29a31..4c1c1f4 100644
> --- a/run-command.h
> +++ b/run-command.h
> @@ -43,6 +43,16 @@ struct child_process {
>  	unsigned stdout_to_stderr:1;
>  	unsigned use_shell:1;
>  	unsigned clean_on_exit:1;
> +	/*
> +	 * clean_on_exit_timeout is only considered if clean_on_exit is set.
> +	 * - Specify 0 to kill the child on Git exit (default)
> +	 * - Specify a negative value to close the child's stdin on Git exit
> +	 *   and wait indefinitely for the child's termination.
> +	 * - Specify a positive value to close the child's stdin on Git exit
> +	 *   and wait clean_on_exit_timeout seconds for the child's
> +	 *   termination.

All right, so here is this documentation...

> +	 */
> +	int clean_on_exit_timeout;
>  };
>  
>  #define CHILD_PROCESS_INIT { NULL, ARGV_ARRAY_INIT, ARGV_ARRAY_INIT }
> 
> 

For full patch, you would need also to add to Documentation/config.txt

Best,
-- 
Jakub Narębski