Re: [PATCH] {fetch,receive}-pack: skip sha-1 integrity test on objects from new pack

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Junio C Hamano <gitster@xxxxxxxxx> writes:

> After thinking about this a bit more, I am beginning to think that the
> validation of object contents is unnecessary in _all_ cases that involve
> "git fetch". ...
> So the right solution is probably use --objects for connectivity checks as
> before; we could add a fetch.paranoid configuration to allow people to
> still use it (with this patch to remove the over-pessimism from the code)
> but only if they want to.

Following the above train of thought, the patch to revert what 6d4bb38
(fetch: verify we have everything we need before updating our ref,
2011-09-01) did which caused this performance regression should look like
this.  "tag --contains 6d4bb38" tells me that this affects everything
since v1.7.8-rc0 so we may want to issue maintenance updates once this
proves good in 1.7.10.

Nguyễn, sorry for being dense.  I think this ended up being very close to
your initial patch.

-- >8 --
Subject: fetch/receive: remove over-pessimistic connectivity check

Git 1.7.8 introduced an object and history re-validation step after
"fetch" or "push" causes new history to be added to a receiving
repository, in order to protect a malicious server or pushing client to
corrupt it by taking advantage of an existing corrupt object that is
unconnected to existing history of the receiving repository.

But this is way over-pessimistic, and often adds 5x to 8x runtime overhead
for a little gain.  During "fetch" or "receive-pack" (the server side of
"push"), unpack-objects and index-pack already validate individual objects
that are received, and the only thing we would want to catch are objects
that already happened to exist in our repository but were not referenced
from our refs.  But such objects must have been written by an earlier run
of our codepaths that write out loose objects or packfiles, and they must
have done the validation of individual objects when they did so.  The only
thing left to worry about is the connectivity integrity, which can be done
with much cheaper "rev-list --objects".

Remove this over-pessimistic check, while leaving the door open for later
changes, hinting the possibility in an in-code comment.

Signed-off-by: Junio C Hamano <gitster@xxxxxxxxx>
---
 builtin/fetch.c |    2 +-
 connected.c     |   14 +++++++++-----
 connected.h     |    3 ++-
 3 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/builtin/fetch.c b/builtin/fetch.c
index 8761a33..b2f7d5b 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -505,7 +505,7 @@ static int quickfetch(struct ref *ref_map)
 	 */
 	if (depth)
 		return -1;
-	return check_everything_connected(iterate_ref_map, 1, &rm);
+	return check_everything_connected(iterate_ref_map, CONN_CHECK_QUIET, &rm);
 }
 
 static int fetch_refs(struct transport *transport, struct ref *ref_map)
diff --git a/connected.c b/connected.c
index d762423..3564a9f 100644
--- a/connected.c
+++ b/connected.c
@@ -6,22 +6,26 @@
 /*
  * If we feed all the commits we want to verify to this command
  *
- *  $ git rev-list --verify-objects --stdin --not --all
+ *  $ git rev-list --objects --stdin --not --all
  *
  * and if it does not error out, that means everything reachable from
- * these commits locally exists and is connected to some of our
- * existing refs.
+ * these commits locally exists and is connected to our existing refs.
  *
  * Returns 0 if everything is connected, non-zero otherwise.
+ *
+ * We may want to introduce CONN_CHECK_PARANOIA option the caller can
+ * pass to use --verify-objects instead of --objects to optionally have
+ * individual objects re-validated.
  */
-int check_everything_connected(sha1_iterate_fn fn, int quiet, void *cb_data)
+int check_everything_connected(sha1_iterate_fn fn, unsigned flags, void *cb_data)
 {
 	struct child_process rev_list;
-	const char *argv[] = {"rev-list", "--verify-objects",
+	const char *argv[] = {"rev-list", "--objects",
 			      "--stdin", "--not", "--all", NULL, NULL};
 	char commit[41];
 	unsigned char sha1[20];
 	int err = 0;
+	int quiet = !!(flags & CONN_CHECK_QUIET);
 
 	if (fn(cb_data, sha1))
 		return err;
diff --git a/connected.h b/connected.h
index 7e4585a..6f17e07 100644
--- a/connected.h
+++ b/connected.h
@@ -15,6 +15,7 @@ typedef int (*sha1_iterate_fn)(void *, unsigned char [20]);
  *
  * Return 0 if Ok, non zero otherwise (i.e. some missing objects)
  */
-extern int check_everything_connected(sha1_iterate_fn, int quiet, void *cb_data);
+extern int check_everything_connected(sha1_iterate_fn, unsigned flags, void *cb_data);
+#define CONN_CHECK_QUIET 01
 
 #endif /* CONNECTED_H */
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]