Re: [PATCH] add post-fetch hook

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Junio C Hamano wrote:
> Thanks. I do have a few comments.
> 
> This hook is no longer a "post" fetch hook. The mechanism lets the object
> transfer phase does its work and then rewrites/tweaks the result before
> fetch completes. To an outside observer, what the hook does is an integral
> part of what "fetch" does, and not something that happens _after_ fetch
> completes. I am bad at naming things, but something along the lines of
> "tweak-fetch" that makes it clear that what happens in the hook is still
> part of the fetching may be a more appropriate name, methinks.

I'd be happy to call it tweak-fetch.

> I very much on purpose said that the hook "must read everything from its
> standard input, *and* *then* return..." in my response. Your "Demo" hook
> emits output as it reads its input with sed, but your main process invokes
> the hook, drains everything with write_in_full() before starting to read a
> single line, so I suspect that your hook will deadlock when its output
> pipe buffer fills up without being read by the main process. Of course,
> for this deadlock to actually happen, you need to be fetching quite a lot
> of refs.

Doesn't having the hook be fed by the async process/thread avoid this
problem? That's exactly why I did that, being familiar with the problem
from stupid perl scripts like this one:

#!/usr/bin/perl
use FileHandle;
use IPC::Open2;
$pid = open2(*Reader, *Writer, "cat");
print Writer "stuff $_\n" for 1..100000; # blocks forever
close Writer;
print "got: $_" while <Reader>;

I've tested feeding large quantities of data through my demo hook
(just feed it all the lines repeated 100 thousand times) and have
not seen it block. And other code in git uses an async feeder similarly,
see for example convert.c's apply_filter(). So I think this is ok..?

> We seem to already have too many hook drivers, each of which hand-roll
> similar logic using run-command API. At some point, we would want to have
> a single "run_hook" helper function that takes:

This was also my feeling as I looked around the code.

> By formalizing the hook driver API that way, any hook driver that drives a
> tricky hook that may need a select(2) loop to avoid a deadlock in a way
> similar to your patch do would not have to worry about the issue, as the
> run_hook() helper would take care of it by reading from the hook's output
> pipe and drain the pipe by calling the "consumer" callback before calling
> the "generator" callback and feed more input to the hook to cause a
> deadlock.

Actually, run_hook would need to either have a select loop, or an async
feeder like I've used. The method you describe is itself prone to deadlock!
Consider a hook that *does* consume all its input before outputting anything.
The first call to the "consumer" callback would block.

>  - run a set of hooks on the same triggering condition. You may want to
>    have two "post-receive" hooks, one to feed an e-mail based notification
>    system and another to drive an autobuilder

Something I've always wanted, in fact..

> I have been wondering when would be the good time to refactor the hook
> driver API.  We can add your patch, after polishing it enough to make it
> ready for inclusion, independent of the hook API refactoring. But that
> would mean that it would require more work when refactoring the hook API,
> as we would have one more hand-rolled hook caller that is based on
> run_command().

On the other hand, I have uses for this hook that are blocked on it
getting into git, so would rather not see it hung up in a general
refactoring. If it helps, I'd be happy to help with a hook refactoring
later.

The latest version of my patch (attached) already lifts reading from the
hook out into a helper function, so should need not many changes in such
a refactoring -- most of my run_tweak_fetch_hook would simply go away
then. I've also cleaned it up a lot and and pretty happy with it now.

-- 
see shy jo
From a06b5ef9908f56692a07fb37f5794c4122f10491 Mon Sep 17 00:00:00 2001
From: Joey Hess <joey@xxxxxxxxxxx>
Date: Mon, 26 Dec 2011 10:44:55 -0400
Subject: [PATCH 1/2] preparations for tweak-fetch hook

No behavior changes yet, only some groundwork for the next
change.

The refs_result structure combines a status code with a ref map,
which can be NULL even on success. This will be needed when
there's a tweak-fetch hook, because it can filter out all refs,
while still succeeding.

fetch_refs returns a refs_result, so that it can modify the ref_map.
---
 builtin/fetch.c |   54 +++++++++++++++++++++++++++++++++++-------------------
 1 files changed, 35 insertions(+), 19 deletions(-)

diff --git a/builtin/fetch.c b/builtin/fetch.c
index 33ad3aa..70b9f89 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -29,6 +29,11 @@ enum {
 	TAGS_SET = 2
 };
 
+struct refs_result {
+	struct ref *new_refs;
+	int status;
+};
+
 static int all, append, dry_run, force, keep, multiple, prune, update_head_ok, verbosity;
 static int progress, recurse_submodules = RECURSE_SUBMODULES_DEFAULT;
 static int tags = TAGS_DEFAULT;
@@ -89,6 +94,15 @@ static struct option builtin_fetch_options[] = {
 	OPT_END()
 };
 
+static int add_existing(const char *refname, const unsigned char *sha1,
+			int flag, void *cbdata)
+{
+	struct string_list *list = (struct string_list *)cbdata;
+	struct string_list_item *item = string_list_insert(list, refname);
+	item->util = (void *)sha1;
+	return 0;
+}
+
 static void unlock_pack(void)
 {
 	if (transport)
@@ -507,17 +521,24 @@ static int quickfetch(struct ref *ref_map)
 	return check_everything_connected(iterate_ref_map, 1, &rm);
 }
 
-static int fetch_refs(struct transport *transport, struct ref *ref_map)
+static struct refs_result fetch_refs(struct transport *transport,
+		struct ref *ref_map)
 {
-	int ret = quickfetch(ref_map);
-	if (ret)
-		ret = transport_fetch_refs(transport, ref_map);
-	if (!ret)
-		ret |= store_updated_refs(transport->url,
+	struct refs_result res;
+	res.status = quickfetch(ref_map);
+	if (res.status)
+		res.status = transport_fetch_refs(transport, ref_map);
+	if (!res.status) {
+		res.new_refs = ref_map
+		res.status |= store_updated_refs(transport->url,
 				transport->remote->name,
-				ref_map);
+				res.new_refs);
+	}
+	else {
+		res.new_refs = ref_map;
+	}
 	transport_unlock_pack(transport);
-	return ret;
+	return res;
 }
 
 static int prune_refs(struct refspec *refs, int ref_count, struct ref *ref_map)
@@ -542,15 +563,6 @@ static int prune_refs(struct refspec *refs, int ref_count, struct ref *ref_map)
 	return result;
 }
 
-static int add_existing(const char *refname, const unsigned char *sha1,
-			int flag, void *cbdata)
-{
-	struct string_list *list = (struct string_list *)cbdata;
-	struct string_list_item *item = string_list_insert(list, refname);
-	item->util = (void *)sha1;
-	return 0;
-}
-
 static int will_fetch(struct ref **head, const unsigned char *sha1)
 {
 	struct ref *rm = *head;
@@ -673,6 +685,7 @@ static int do_fetch(struct transport *transport,
 	struct string_list_item *peer_item = NULL;
 	struct ref *ref_map;
 	struct ref *rm;
+	struct refs_result res;
 	int autotags = (transport->remote->fetch_tags == 1);
 
 	for_each_ref(add_existing, &existing_refs);
@@ -710,7 +723,9 @@ static int do_fetch(struct transport *transport,
 
 	if (tags == TAGS_DEFAULT && autotags)
 		transport_set_option(transport, TRANS_OPT_FOLLOWTAGS, "1");
-	if (fetch_refs(transport, ref_map)) {
+	res = fetch_refs(transport, ref_map);
+	ref_map = res.new_refs;
+	if (res.status) {
 		free_refs(ref_map);
 		return 1;
 	}
@@ -750,7 +765,8 @@ static int do_fetch(struct transport *transport,
 		if (ref_map) {
 			transport_set_option(transport, TRANS_OPT_FOLLOWTAGS, NULL);
 			transport_set_option(transport, TRANS_OPT_DEPTH, "0");
-			fetch_refs(transport, ref_map);
+			res = fetch_refs(transport, ref_map);
+			ref_map = res.new_refs;
 		}
 		free_refs(ref_map);
 	}
-- 
1.7.7.3

From 073b0921bb5988628e7af423924c410f522f403a Mon Sep 17 00:00:00 2001
From: Joey Hess <joey@xxxxxxxxxxx>
Date: Mon, 26 Dec 2011 10:53:27 -0400
Subject: [PATCH 2/2] add tweak-fetch hook

The tweak-fetch hook is fed lines on stdin for all refs that were fetched,
and outputs on stdout possibly modified lines. Its output is parsed and
used when git fetch updates the remote tracking refs, records the entries
in FETCH_HEAD, and produces its report.
---
 Documentation/githooks.txt |   29 +++++++
 builtin/fetch.c            |  191 +++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 219 insertions(+), 1 deletions(-)

diff --git a/Documentation/githooks.txt b/Documentation/githooks.txt
index 28edefa..be2624c 100644
--- a/Documentation/githooks.txt
+++ b/Documentation/githooks.txt
@@ -162,6 +162,35 @@ This hook can be used to perform repository validity checks, auto-display
 differences from the previous HEAD if different, or set working dir metadata
 properties.
 
+tweak-fetch
+~~~~~~~~~~
+
+This hook is invoked by 'git fetch' (commonly called by 'git pull'), after
+refs have been fetched from the remote repository. It is not executed, if
+nothing was fetched.
+
+The output of the hook is used to update the remote-tracking branches, and
+`.git/FETCH_HEAD`, in preparation for for a later merge operation done by
+'git merge'.
+
+It takes no arguments, but is fed a line of the following format on
+its standard input for each ref that was fetched.
+
+  <sha1> SP not-for-merge|merge SP <remote-refname> SP <local-refname> LF
+
+Where the "not-for-merge" flag indicates the ref is not to be merged into the
+current branch, and the "merge" flag indicates that 'git merge' should
+later merge it. The `<remote-refname>` is the remote's name for the ref
+that was pulled, and `<local-refname>` is a name of a remote-tracking branch,
+like "refs/remotes/origin/master", or can be empty if the fetched ref is not
+being stored in a local refname.
+
+The hook must consume all of its standard input, and output back lines
+of the same format. It can modify its input as desired, including
+adding or removing lines, updating the sha1 (i.e. re-point the
+remote-tracking branch), changing the merge flag, and changing the
+`<local-refname>` (i.e. use different remote-tracking branch).
+
 post-merge
 ~~~~~~~~~~
 
diff --git a/builtin/fetch.c b/builtin/fetch.c
index 70b9f89..5434b6f 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -103,6 +103,194 @@ static int add_existing(const char *refname, const unsigned char *sha1,
 	return 0;
 }
 
+static const char tweak_fetch_hook[] = "tweak-fetch";
+
+int feed_tweak_fetch_hook (int in, int out, void *data)
+{
+	struct ref *ref;
+	struct strbuf buf = STRBUF_INIT;
+	int ret;
+
+	for (ref = data; ref; ref = ref->next) {
+		strbuf_addstr(&buf, sha1_to_hex(ref->old_sha1));
+		strbuf_addch(&buf, ' ');
+		strbuf_addstr(&buf, ref->merge ? "merge" : "not-for-merge");
+		strbuf_addch(&buf, ' ');
+		if (ref->name)
+			strbuf_addstr(&buf, ref->name);
+		strbuf_addch(&buf, ' ');
+		if (ref->peer_ref && ref->peer_ref->name)
+			strbuf_addstr(&buf, ref->peer_ref->name);
+		strbuf_addch(&buf, '\n');
+	}
+
+	ret = write_in_full(out, buf.buf, buf.len) != buf.len;
+	if (ret)
+		warning("%s hook failed to consume all its input",
+				tweak_fetch_hook);
+	close(out);
+	strbuf_release(&buf);
+	return ret;
+}
+	
+struct ref *parse_tweak_fetch_hook_line (char *l, 
+		struct string_list *existing_refs)
+{
+	struct ref *ref = NULL, *peer_ref = NULL;
+	struct string_list_item *peer_item = NULL;
+	char *words[4];
+	int i, word=0;
+	char *problem;
+
+	for (i=0; l[i]; i++) {
+		if (isspace(l[i])) {
+			l[i]='\0';
+			words[word]=l;
+			l+=i+1;
+			i=0;
+			word++;
+			if (word > 3) {
+				problem="too many words";
+				goto unparsable;
+			}
+		}
+	}
+	if (word < 3) {
+		problem="not enough words";
+		goto unparsable;
+	}
+	
+	ref = alloc_ref(words[2]);
+	peer_ref = ref->peer_ref = alloc_ref(l);
+	ref->peer_ref->force=1;
+
+	if (get_sha1_hex(words[0], ref->old_sha1)) {
+		problem="bad sha1";
+		goto unparsable;
+	}
+
+	if (strcmp(words[1], "merge") == 0) {
+		ref->merge=1;
+	}
+	else if (strcmp(words[1], "not-for-merge") != 0) {
+		problem="bad merge flag";
+		goto unparsable;
+	}
+
+	peer_item = string_list_lookup(existing_refs, peer_ref->name);
+	if (peer_item)
+		hashcpy(peer_ref->old_sha1, peer_item->util);
+
+	return ref;
+
+ unparsable:
+	warning("%s hook output a wrongly formed line: %s",
+			tweak_fetch_hook, problem);
+	free(ref);
+	free(peer_ref);
+	return NULL;
+}
+
+struct refs_result read_tweak_fetch_hook (int in) {
+	struct refs_result res;
+	FILE *f;
+	struct strbuf buf;
+	struct string_list existing_refs = STRING_LIST_INIT_NODUP;
+	struct ref *ref, *prevref=NULL;
+
+	res.status = 0;
+	res.new_refs = NULL;
+
+	f = fdopen(in, "r");
+	if (f == NULL) {
+		res.status = 1;
+		return res;
+	}
+
+	strbuf_init(&buf, 128);
+	for_each_ref(add_existing, &existing_refs);
+
+	while (strbuf_getline(&buf, f, '\n') != EOF) {
+		char *l = strbuf_detach(&buf, NULL);
+		ref = parse_tweak_fetch_hook_line(l, &existing_refs);
+		if (ref) {
+			if (prevref) {
+				prevref->next=ref;
+				prevref=ref;
+			}
+			else {
+				res.new_refs = prevref = ref;
+			}
+		}
+		else {
+			res.status = 1;
+		}
+		free(l);
+	}
+
+	string_list_clear(&existing_refs, 0);
+	strbuf_release(&buf);
+	fclose(f);
+	return res;
+}
+
+/* The hook is fed lines of the form:
+ * <sha1> SP <not-for-merge|merge> SP <remote-refname> SP <local-refname> LF
+ * And should output rewritten lines of the same form.
+ */
+struct ref *run_tweak_fetch_hook (struct ref *fetched_refs)
+{
+	struct child_process hook;
+	const char *argv[2];
+	struct async async;
+	struct refs_result res;
+
+	if (! fetched_refs)
+		return fetched_refs;
+
+	argv[0] = git_path("hooks/%s", tweak_fetch_hook);
+	if (access(argv[0], X_OK) < 0)
+		return fetched_refs;
+	argv[1] = NULL;
+
+	memset(&hook, 0, sizeof(hook));
+	hook.argv = argv;
+	hook.in = -1;
+	hook.out = -1;
+	if (start_command(&hook))
+		return fetched_refs;
+
+	/* Use an async writer to feed the hook process.
+	 * This allows the hook to read and write a line at
+	 * a time without blocking. */
+	memset(&async, 0, sizeof(async));
+	async.proc = feed_tweak_fetch_hook;
+	async.data = fetched_refs;
+	async.out = hook.in;
+	if (start_async(&async)) {
+		close(hook.in);
+		close(hook.out);
+		finish_command(&hook);
+		return fetched_refs;
+	}
+	res = read_tweak_fetch_hook(hook.out);
+	res.status |= finish_async(&async);
+	res.status |= finish_command(&hook);
+
+	if (res.status) {
+		warning("%s hook failed, ignoring its output", tweak_fetch_hook);
+		free(res.new_refs);
+		return fetched_refs;
+	}
+	else {
+		/* The new_refs are returned, to be used in place of
+		 * fetched_refs, so it is not needed anymore and can
+		 * be freed here. */
+		free_refs(fetched_refs);
+		return res.new_refs;
+	}
+}
+
 static void unlock_pack(void)
 {
 	if (transport)
@@ -529,7 +717,8 @@ static struct refs_result fetch_refs(struct transport *transport,
 	if (res.status)
 		res.status = transport_fetch_refs(transport, ref_map);
 	if (!res.status) {
-		res.new_refs = ref_map
+		res.new_refs = run_tweak_fetch_hook(ref_map);
+
 		res.status |= store_updated_refs(transport->url,
 				transport->remote->name,
 				res.new_refs);
-- 
1.7.7.3

Attachment: signature.asc
Description: Digital signature


[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]