Re: Funnies with "git fetch"

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Junio C Hamano <gitster@xxxxxxxxx> writes:

> Jeff King <peff@xxxxxxxx> writes:
>
>> If I understand correctly, your series is just about checking that we
>> have newly-referenced blobs. We were already checking commits and trees,
>> and we should already be hashing individual objects when we index the
>> pack. Right?
>
> You may be slightly misunderstanding the series.
>
> We let unpack-objects or index-pack consume the pack stream, either by
> exploding them into loose objects, or computing the object name for each
> object to create the mapping from object name to the offset. During this
> process, we deflate to read the contents and resolve the delta to come up
> with the object name for individual objects, so we would notice corruption
> at the individual object level. As pack stream does not say what name each
> object is (the recipient is expected to compute it), there is no "stream
> says it is object X but the data is actually for object Y" problem. The
> recipient does not even see "X"---all it sees is Y.

One thing I forgot to mention.

If you have receive.fsckobjects set, unpack-objects and index-pack are
called with the --strict option and causes fsck_objects() to run, so we
will also catch malformed commits and trees that way. But even then,
calling fsck_object() on a blob object always succeeds, so it is not a
real check.

We probably would want to flip the default for receive_fsck_objects to
true one of these days.

> The current code does not try to make sure we really have the objects
> necessary to connect the updated tips to our original refs at all.  Not
> just blobs but neither commits nor trees are traversed. The new check in
> store_updated_refs() is about that. So in that sense, the series is not
> about "just blobs".
>
> The "rev-list --verify-objects" patch is about "blob vs everything else".
> It is used in the existing quickfetch() check, and also the additional
> check in store_updated_refs(). The existing check we run with "--objects"
> is capable of detecting corruptions of commits and trees (as we had to be
> able to read them to discover objects they refer to), but that is not a
> sufficient check if we worry about missing blobs.

I am debating myself if we want to also add calls to fsck_object() to the
new codepath. An additional patch on top of [2/3] would look like this
(totally untested).

It would make "rev-list --verify-objects" useful independently from its
use in the context of "git fetch", and more importantly, what it checks in
the context of "git fetch", while it is related, is more or less
orthogonal to what receive.fsckobjects checks. The check done during the
transfer is to check the set of objects the other side threw at us. The
check done in the check_everything_connected() by calling "rev-list
--verify-objects" is to check the set of objects we are actually going to
use. The former _should_ be superset of the latter, but [3/3] is about
making sure that the former indeed is the superset of the latter.

 builtin/rev-list.c |   13 +++++++++++--
 1 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index ab3be7c..bd49615 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -7,6 +7,7 @@
 #include "log-tree.h"
 #include "graph.h"
 #include "bisect.h"
+#include "fsck.h"
 
 static const char rev_list_usage[] =
 "git rev-list [OPTION] <commit-id>... [ -- paths... ]\n"
@@ -53,6 +54,11 @@ static void show_commit(struct commit *commit, void *data)
 	struct rev_info *revs = info->revs;
 
 	graph_show_commit(revs->graph);
+	if (revs->verify_objects) {
+		if (!commit->object.parsed)
+			parse_object(commit->object.sha1);
+		fsck_object(&commit->object, 1, fsck_error_function);
+	}
 
 	if (revs->count) {
 		if (commit->object.flags & PATCHSAME)
@@ -183,8 +189,11 @@ static void show_object(struct object *obj,
 	struct rev_info *info = cb_data;
 
 	finish_object(obj, path, component, cb_data);
-	if (info->verify_objects && !obj->parsed && obj->type != OBJ_COMMIT)
-		parse_object(obj->sha1);
+	if (info->verify_objects && obj->type != OBJ_COMMIT) {
+		if (!obj->parsed)
+			parse_object(obj->sha1);
+		fsck_object(obj, 1, fsck_error_function);
+	}
 	show_object_with_name(stdout, obj, path, component);
 }
 


--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]