Mike Hommey <mh@xxxxxxxxxxxx> writes: > On Mon, Feb 01, 2016 at 07:10:04PM -0800, Junio C Hamano wrote: >> Mike Hommey <mh@xxxxxxxxxxxx> writes: >> >> > Shouldn't parse_object_buffer also do check_sha1_signature? >> >> In general, it shouldn't; its callers are supposed to do it as >> additional check when/if needed. Callers like the one in fsck.c >> does not want to die after seeing one bad one. We want to report >> and keep checking other things. > > Shouldn't some things like, at least, `git checkout`, still check > the sha1s, though? That is a different issue--my answer was about the quoted question regarding parse_object_buffer(). Its callers are supposed to do additional check when/if needed, and there may be codepaths that currently use parse_object_buffer() that may want to do their own check, or call a different function that does the check for them. Having said that, I do not necessarily think "git checkout" should revalidate the object name. The repository that you use for your daily work would have the same error/corruption rate as your working tree files, and I do not think you would constantly "validate" what is in your working tree by comparing their contents with what you think ought to be there. If you are working on extremely poor quality disks and SSDs, it might make sense to constantly revalidating the object data to catch corruption early, as that is what we can do (as opposed to the working tree files, corruption to which you probably do not have anything to catch bitflipping on). Unless you are placing your working tree on a filesystem with checksums, but your object data would also be protected against corruption in that case, so an extra revalidation at "git checkout" time would not buy you much if anything at all. http://article.gmane.org/gmane.comp.version-control.git/283380 (not necessarily the entire thread, but that exact article) is a reasonable summary that illustrates the way how we view the object integrity. So "index-pack" is the enforcement point, and the rest of the git commands generally assume that we can trust what is on disk (as it is has either been generated by us, or checked by index-pack). The rest of the commands do not spend time checking that the on-disk contents are sane (though you can run git-fsck if you want to do that). If anything, we may want to reduce the number of codepaths that calls check_sha1_signature() on data that we know we have read from our own repository. Even though I am not opposed to an idea to have a "paranoid" mode that revalidates the object name every time (and if "git checkout" does not currently, allow it to revalidate when we are operating under the "paranoid" mode), I do not think it should be on by default. In fact, I have this suspicion that the original justification to have the call to check_sha1_signature() in parse_object() might have been invalidated by the restructuring of the code over the past 10 years. e9eefa67 ([PATCH] Add function to parse an object of unspecified type (take 2), 2005-04-28) says It also checks the hash of the file, due to its heritage as part of fsck-cache. I.e. we do not need this call here, as long as we make sure that fsck codepath does not depend on the fact that parse_object() calls check_sha1_signature() to validate the consistency between the data and the object name. In fact, we do this, which is quite suboptimal: static int fsck_sha1(const unsigned char *sha1) { struct object *obj = parse_object(sha1); if (!obj) { errors_found |= ERROR_OBJECT; return error("%s: object corrupt or missing", sha1_to_hex(sha1)); } obj->flags |= HAS_OBJ; return fsck_obj(obj); } This function is called for each loose object file we find in fsck_object_dir(), and there are a few problems: * The function parse_object() called from here would issue an error message and returns NULL; then you get another "corrupt or missing" error message, because this code cannot tell from the NULL which is the case. * The intent of the callchain to fsck_sha1() is to iterate over loose object files xx/x{38} and validate what is contained in them, but that behaviour is not guaranteed because it calls parse_object(), which may get the object data from a packfile if the loose object is also in the packfile. This function should instead take "path" (the only caller of this function fsck_loose() has it), read the data in the file, hash the data to validate that it matches "sha1" and then create the object out of that data it read by calling parse_object_buffer(). I didn't check other callers of parse_object(), but I doubt that they need or want a check_sha1_signature() call in the function. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html