Re: [PATCH] add post-fetch hook

Junio C Hamano <gitster@xxxxxxxxx> · Sun, 25 Dec 2011 23:59:46 -0800

Joey Hess <joey@xxxxxxxxxxx> writes:

> The post-fetch hook is fed lines on stdin for all refs that were fetched, and
> outputs on stdout possibly modified lines. Its output is parsed and used
> when git fetch updates the remote tracking refs, records the entries in
> FETCH_HEAD, and produces its report.
>
> ---
>
> Not quite ready to sign off on this yet, but it does work.
> Comments and code review appreciated.

Thanks. I do have a few comments.

This hook is no longer a "post" fetch hook. The mechanism lets the object
transfer phase does its work and then rewrites/tweaks the result before
fetch completes. To an outside observer, what the hook does is an integral
part of what "fetch" does, and not something that happens _after_ fetch
completes. I am bad at naming things, but something along the lines of
"tweak-fetch" that makes it clear that what happens in the hook is still
part of the fetching may be a more appropriate name, methinks.

I very much on purpose said that the hook "must read everything from its
standard input, *and* *then* return..." in my response. Your "Demo" hook
emits output as it reads its input with sed, but your main process invokes
the hook, drains everything with write_in_full() before starting to read a
single line, so I suspect that your hook will deadlock when its output
pipe buffer fills up without being read by the main process. Of course,
for this deadlock to actually happen, you need to be fetching quite a lot
of refs.

To make the life of hook writers easier, however, it would be good to
support a hook written in the style of your "Demo" hook, instead of
requiring it to read everything before emiting any output. I think you
could solve this by having a select(2) loop to avoid the deadlock on our
end (lets call the code that spawns a hook with run_command() and
interacts with it a "hook driver" in the rest of this message), but before
going in that direction, I would like to see us stepping back and bit and
think about the way hooks are called in the current code.

We seem to already have too many hook drivers, each of which hand-roll
similar logic using run-command API. At some point, we would want to have
a single "run_hook" helper function that takes:

 - the name of a hook;
 - the command line arguments to be fed;
 - a callback "generator" function to feed the standard input stream of
   the hook process;
 - a callback "consumer" function to receive the standard output stream of
   the hook process; and
 - set of environment tweaks while running the hook (e.g. run the hook
   while setting GIT_INDEX_FILE to a temporary file).

and hides away the complexity from hook drivers.  The command line
arguments, input and output callback functions are all optional depending
on the API between the hook driver and the hook (e.g. the "post-update"
hook takes arguments from its command line but does not interact with the
standard I/O stream, while the "post-receive" hook takes its input from
the standard input stream). Tweaking of the environment is also optional;
not many hooks need it.

By formalizing the hook driver API that way, any hook driver that drives a
tricky hook that may need a select(2) loop to avoid a deadlock in a way
similar to your patch do would not have to worry about the issue, as the
run_hook() helper would take care of it by reading from the hook's output
pipe and drain the pipe by calling the "consumer" callback before calling
the "generator" callback and feed more input to the hook to cause a
deadlock. We also could in the future do many other things if and when we
wanted to that the current code structure makes difficult. A few examples
that readily come to my mind are:

 - relocate where the hook scripts live, by limiting the hook driver API
   to take just the "name" of the hook. The current hook callsites know
   that the hooks live in git_path("hooks/$name") and call run_command()
   on their own, but the run_hook() helper could redirect it away to
   somewhere else, e.g. "/etc/git-core/hooks/$name".

 - run a set of hooks on the same triggering condition. You may want to
   have two "post-receive" hooks, one to feed an e-mail based notification
   system and another to drive an autobuilder, for example. For this to
   work, the "generator" and "consumer" callbacks need to have a way for
   us to tell "beginning of a new session" and "end of a new session", so
   that they can produce/consume the same set of values more than once.

 - perhaps ignore SIGPIPE if the hook chooses not to read any information
   the hook driver provides with it.

I have been wondering when would be the good time to refactor the hook
driver API.  We can add your patch, after polishing it enough to make it
ready for inclusion, independent of the hook API refactoring. But that
would mean that it would require more work when refactoring the hook API,
as we would have one more hand-rolled hook caller that is based on
run_command().
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html