Joey Hess <joey@xxxxxxxxxxx> writes: > The post-fetch hook is fed lines on stdin for all refs that were fetched, and > outputs on stdout possibly modified lines. Its output is parsed and used > when git fetch updates the remote tracking refs, records the entries in > FETCH_HEAD, and produces its report. > > --- > > Not quite ready to sign off on this yet, but it does work. > Comments and code review appreciated. Thanks. I do have a few comments. This hook is no longer a "post" fetch hook. The mechanism lets the object transfer phase does its work and then rewrites/tweaks the result before fetch completes. To an outside observer, what the hook does is an integral part of what "fetch" does, and not something that happens _after_ fetch completes. I am bad at naming things, but something along the lines of "tweak-fetch" that makes it clear that what happens in the hook is still part of the fetching may be a more appropriate name, methinks. I very much on purpose said that the hook "must read everything from its standard input, *and* *then* return..." in my response. Your "Demo" hook emits output as it reads its input with sed, but your main process invokes the hook, drains everything with write_in_full() before starting to read a single line, so I suspect that your hook will deadlock when its output pipe buffer fills up without being read by the main process. Of course, for this deadlock to actually happen, you need to be fetching quite a lot of refs. To make the life of hook writers easier, however, it would be good to support a hook written in the style of your "Demo" hook, instead of requiring it to read everything before emiting any output. I think you could solve this by having a select(2) loop to avoid the deadlock on our end (lets call the code that spawns a hook with run_command() and interacts with it a "hook driver" in the rest of this message), but before going in that direction, I would like to see us stepping back and bit and think about the way hooks are called in the current code. We seem to already have too many hook drivers, each of which hand-roll similar logic using run-command API. At some point, we would want to have a single "run_hook" helper function that takes: - the name of a hook; - the command line arguments to be fed; - a callback "generator" function to feed the standard input stream of the hook process; - a callback "consumer" function to receive the standard output stream of the hook process; and - set of environment tweaks while running the hook (e.g. run the hook while setting GIT_INDEX_FILE to a temporary file). and hides away the complexity from hook drivers. The command line arguments, input and output callback functions are all optional depending on the API between the hook driver and the hook (e.g. the "post-update" hook takes arguments from its command line but does not interact with the standard I/O stream, while the "post-receive" hook takes its input from the standard input stream). Tweaking of the environment is also optional; not many hooks need it. By formalizing the hook driver API that way, any hook driver that drives a tricky hook that may need a select(2) loop to avoid a deadlock in a way similar to your patch do would not have to worry about the issue, as the run_hook() helper would take care of it by reading from the hook's output pipe and drain the pipe by calling the "consumer" callback before calling the "generator" callback and feed more input to the hook to cause a deadlock. We also could in the future do many other things if and when we wanted to that the current code structure makes difficult. A few examples that readily come to my mind are: - relocate where the hook scripts live, by limiting the hook driver API to take just the "name" of the hook. The current hook callsites know that the hooks live in git_path("hooks/$name") and call run_command() on their own, but the run_hook() helper could redirect it away to somewhere else, e.g. "/etc/git-core/hooks/$name". - run a set of hooks on the same triggering condition. You may want to have two "post-receive" hooks, one to feed an e-mail based notification system and another to drive an autobuilder, for example. For this to work, the "generator" and "consumer" callbacks need to have a way for us to tell "beginning of a new session" and "end of a new session", so that they can produce/consume the same set of values more than once. - perhaps ignore SIGPIPE if the hook chooses not to read any information the hook driver provides with it. I have been wondering when would be the good time to refactor the hook driver API. We can add your patch, after polishing it enough to make it ready for inclusion, independent of the hook API refactoring. But that would mean that it would require more work when refactoring the hook API, as we would have one more hand-rolled hook caller that is based on run_command(). -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html