W dniu 03.10.2016 o 19:13, Lars Schneider pisze: >> On 01 Oct 2016, at 22:48, Jakub Narębski <jnareb@xxxxxxxxx> wrote: >> W dniu 01.10.2016 o 20:59, Lars Schneider pisze: >>> On 29 Sep 2016, at 23:27, Junio C Hamano <gitster@xxxxxxxxx> wrote: >>>> Lars Schneider <larsxschneider@xxxxxxxxx> writes: >>>> >>>> If the filter process refuses to die forever when Git told it to >>>> shutdown (by closing the pipe to it, for example), that filter >>>> process is simply buggy. I think we want users to become aware of >>>> that, instead of Git leaving it behind, which essentially is to >>>> sweep the problem under the rug. >> >> Well, it would be good to tell users _why_ Git is hanging, see below. > > Agreed. Do you think it is OK to write the message to stderr? On the other hand, this is why GIT_TRACE (and GIT_TRACE_PERFORMANCE) was invented for. We do not signal troubles with single-shot filters, so I guess doing it for multi-file filters is not needed. >>>> I agree with what Peff said elsewhere in the thread; if a filter >>>> process wants to take time to clean things up while letting Git >>>> proceed, it can do its own process management, but I think it is >>>> sensible for Git to wait the filter process it directly spawned. >>> >>> To realize the approach above I prototyped the run-command patch below: >>> >>> I added an "exit_timeout" variable to the "child_process" struct. >>> On exit, Git will close the pipe to the process and wait "exit_timeout" >>> seconds until it kills the child process. If "exit_timeout" is negative >>> then Git will wait until the process is done. >> >> That might be good approach. Probably the default would be to wait. > > I think I would prefer a 2sec timeout or something as default. This way > we can ensure Git would not wait indefinitely for a buggy filter by default. Actually this waiting for multi-file filter is only about waiting for the shutdown process of the filter. The filter could still hang during processing a file, and git would hang too, if I understand it correctly. [...] >> Also, how would one set default value of timeout for all process >> based filters? > > I think we don't need that because a timeout is always specific > to a filter (if the 2sec default is not sufficient). All right (assuming that timeouts are good idea). >>> >>> + while ((waitpid(p->pid, &status, 0)) < 0 && errno == EINTR) >>> + ; /* nothing */ >> >> Ah, this loop is here because waiting on waitpid() can be interrupted >> by the delivery of a signal to the calling process; though the result >> is -1, not just any < 0. > > "< 0" is also used in wait_or_whine() O.K. (though it doesn't necessary mean that it is correct, there is another point for using "< 0"). [...] >> There is also another complication: there can be more than one >> long-running filter driver used. With this implementation we >> wait for each of one in sequence, e.g. 10s + 10s + 10s. > > Good idea, I fixed that in the version below! > [...] > [...] this function is also used with the async struct... Hmmm... now I wonder if it is a good idea (similar treatment for single-file async-invoked filter, and multi-file pkt-line filters). For single-file one-shot filter (correct me if I am wrong): - git sends contents to filter, signals end with EOF (after process is started) - in an async process: - process is started - git reads contents from filter, until EOF - if process did not end, it is killed For multi-process pkt-line based filter (simplified): - process is started - handshake - for each file - file is send to filter process over pkt-line, end signalled with flush packet - git reads from filter from pkt-line, until flush - ... See how single-shot filter is sent EOF, though in different part of code. We need to signal multi-file filter that no more files will be coming. Simplest solution is to send EOF (we could send "command=shutdown" for example...) to filter, and wait for EOF from filter (or for "status=finished" and EOF). We could kill multi-file filter after sending last file and receiving full response... but I think single-shot filter gets killed only because it allows for very simple filters, and reusing existing commands as filters. [...] > diff --git a/run-command.c b/run-command.c > index 3269362..ca0feef 100644 > --- a/run-command.c > +++ b/run-command.c > @@ -21,6 +21,9 @@ void child_process_clear(struct child_process *child) > > struct child_to_clean { > pid_t pid; > + char *name; I guess it is here for output purposes? Should we store full command here, or just name of <driver>? > + int stdin; I guess the name `stdin` for file _descriptor_ is something used in other parts of convert.c code, isn't it? > + int timeout; Hmmm... we assume that timeout is in seconds, not millis or other value, isn't it. timeout_sec would perhaps be unnecessarily long. > struct child_to_clean *next; > }; > static struct child_to_clean *children_to_clean; > @@ -28,12 +31,53 @@ static int installed_child_cleanup_handler; > > static void cleanup_children(int sig, int in_signal) > { > + int status; > + struct timeval tv; > + time_t secs; > + struct child_to_clean *p = children_to_clean; > + > + // Send EOF to children as indicator that Git will exit soon > + while (p) { > + if (p->timeout != 0) { Here we use timeout == 0 as a special case, a special indicator (IIUC for the single-shot filter case, where it is closed already). This is not documented. Somebody setting timeout to "0" would be surprised, isn't it? > + if (p->stdin > 0) > + close(p->stdin); > + } > + p = p->next; > + } > + > while (children_to_clean) { > - struct child_to_clean *p = children_to_clean; > + p = children_to_clean; > children_to_clean = p->next; > + > + if (p->timeout != 0) { > + fprintf(stderr, _("Waiting for '%s' to finish..."), p->name); > + if (p->timeout < 0) { > + // No timeout given - wait indefinitely > + while ((waitpid(p->pid, &status, 0)) < 0 && errno == EINTR) > + ; /* nothing */ > + } else { > + // Wait until timeout > + gettimeofday(&tv, NULL); > + secs = tv.tv_sec; > + while (!waitpid(p->pid, &status, WNOHANG) && > + tv.tv_sec - secs < p->timeout) { > + fprintf(stderr, _(" \rWaiting %lds for '%s' to finish..."), > + p->timeout - tv.tv_sec + secs - 1, p->name); > + gettimeofday(&tv, NULL); > + sleep_millisec(10); > + } > + } I wonder if we have some progress-printing code we can borrow from, or just plain use (like progress report for long checkout). > + if (waitpid(p->pid, &status, WNOHANG)) > + fprintf(stderr, _("done!\n")); > + else > + fprintf(stderr, _("timeout. Killing...\n")); > + } > + > kill(p->pid, sig); > - if (!in_signal) > + if (!in_signal) { > + free(p->name); > free(p); > + } > } > } > > @@ -49,10 +93,18 @@ static void cleanup_children_on_exit(void) > cleanup_children(SIGTERM, 0); > } > > -static void mark_child_for_cleanup(pid_t pid) > +static void mark_child_for_cleanup_with_timeout(pid_t pid, const char *name, int stdin, int timeout) > { > struct child_to_clean *p = xmalloc(sizeof(*p)); > p->pid = pid; > + p->timeout = timeout; > + p->stdin = stdin; > + if (name) { > + p->name = xmalloc(strlen(name) + 1); > + strcpy(p->name, name); Don't we have xstrdup() for that, or am I mistaken? > + } else { > + p->name = "process"; Hmmmm... > + } > p->next = children_to_clean; > children_to_clean = p; > > @@ -63,6 +115,13 @@ static void mark_child_for_cleanup(pid_t pid) > } > } > > +#ifdef NO_PTHREADS > +static void mark_child_for_cleanup(pid_t pid, const char *name, int timeout, int stdin) > +{ > + mark_child_for_cleanup_with_timeout(pid, NULL, 0, 0); > +} > +#endif Uh? > + > static void clear_child_for_cleanup(pid_t pid) > { > struct child_to_clean **pp; > @@ -422,7 +481,8 @@ int start_command(struct child_process *cmd) > if (cmd->pid < 0) > error_errno("cannot fork() for %s", cmd->argv[0]); > else if (cmd->clean_on_exit) > - mark_child_for_cleanup(cmd->pid); > + mark_child_for_cleanup_with_timeout( > + cmd->pid, cmd->argv[0], cmd->in, cmd->clean_on_exit_timeout); All right, nice abstraction. > > /* > * Wait for child's execvp. If the execvp succeeds (or if fork() > @@ -483,7 +543,8 @@ int start_command(struct child_process *cmd) > if (cmd->pid < 0 && (!cmd->silent_exec_failure || errno != ENOENT)) > error_errno("cannot spawn %s", cmd->argv[0]); > if (cmd->clean_on_exit && cmd->pid >= 0) > - mark_child_for_cleanup(cmd->pid); > + mark_child_for_cleanup_with_timeout( > + cmd->pid, cmd->argv[0], cmd->in, cmd->clean_on_exit_timeout); > > argv_array_clear(&nargv); > cmd->argv = sargv; > diff --git a/run-command.h b/run-command.h > index cf29a31..4c1c1f4 100644 > --- a/run-command.h > +++ b/run-command.h > @@ -43,6 +43,16 @@ struct child_process { > unsigned stdout_to_stderr:1; > unsigned use_shell:1; > unsigned clean_on_exit:1; > + /* > + * clean_on_exit_timeout is only considered if clean_on_exit is set. > + * - Specify 0 to kill the child on Git exit (default) > + * - Specify a negative value to close the child's stdin on Git exit > + * and wait indefinitely for the child's termination. > + * - Specify a positive value to close the child's stdin on Git exit > + * and wait clean_on_exit_timeout seconds for the child's > + * termination. All right, so here is this documentation... > + */ > + int clean_on_exit_timeout; > }; > > #define CHILD_PROCESS_INIT { NULL, ARGV_ARRAY_INIT, ARGV_ARRAY_INIT } > > For full patch, you would need also to add to Documentation/config.txt Best, -- Jakub Narębski