Hi! Dmitry Ivankov wrote: > fast-import keeps track of some delta-base for tree objects. When it is > time to compute the delta, base object is constructed from in-memory > tree representation using children's delta bases sha1. But these can be > unrelated due to several bugs, and it leads to object with wrong sha1 > being delta-written to the packfile. This seems like as good a starting point as any. Small language nitpicks: - "some delta-base" is somewhat vague - missing article (probably "the") before "base object" and "children's" - does "children's delta bases sha1" mean "the pre-computed object names and modes in versions[0].{sha1,mode} for each tree entry" or something else? - when you say "unrelated": unrelated to what? > We have the base sha1 and what we think it's data is. Verify sha1 and if > it doesn't match, report it to stderr and don't use delta for this tree. At any rate, if I understand correctly, the idea is that when store_tree() is being called, root->versions[0].sha1 does not match the tree object implied by root->tree->entries[*]->{name,versions[0].{mode,sha1}}. This patch adds a quick check to notice when that happens. Makes a lot of sense, especially as a demonstration of the problem. store_object() returns early in many cases so this probably does make the bug easier to find (good). I wonder if it would make more sense to perform this sanity check in store_object() so blobs could benefit, too. How much does the check slow fast-import down (if at all)? > We could also die() here when bugs are fixed. If the check is fast, sure, why not? :) The only reason I can think of is that computing a SHA-1 is an O(size of object) cost which it would be nice to avoid if profiling shows it is noticeable. > Or we can see if the data > we've got is from our pack file and so still try to use it as a base. Can you elaborate? When would versions[0].sha1 not match the data but the data still be in the pack file? > --- a/fast-import.c > +++ b/fast-import.c [...] > @@ -1486,10 +1487,21 @@ static void store_tree(struct tree_entry *root) > > le = find_object(root->versions[0].sha1); > if (S_ISDIR(root->versions[0].mode) && le && le->pack_id == pack_id) { > + unsigned char sh[20]; What does "sh" stand for? > mktree(t, 0, &old_tree); > lo.data = old_tree; > lo.offset = le->idx.offset; > lo.depth = t->delta_depth; > + > + prepare_object_hash(OBJ_TREE, &old_tree, NULL, NULL, sh); > + if (hashcmp(sh, root->versions[0].sha1)) { > + fprintf(stderr, "internal sha1 delta base mismatch," > + " won't use delta for that tree\n"); > + lo.data = empty; To avoid a memory leak and introducing an unnecessary variable: strbuf_reset(&lo.data); Thanks, that was interesting. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html