Hi, Dmitry Ivankov wrote: > fast-import is able to write imported tree objects in delta format. > It holds a tree structure in memory where each tree entry may have > a delta base sha1 assigned. When delta base data is needed it is > reconstructed from this in-memory structure. Though sometimes the > delta base data doesn't match the delta base sha1 so wrong or even > corrupt pack is produced. I'm having trouble parsing this; not sure why. Some guesses: - dropping the word "imported" could help, since it is the content of trees that comes from the user, not the tree objects - it's not clear to me what the second sentence is saying. Do you mean that git looks at the versions[0].sha1 fields of item in t->entries to construct an in-memory tree object to delta against, instead of finding the object named by versions[0].sha1, inflating it, and using it directly? - the third sentence seems to be describing a problem, but I'm not sure what the relationship is to this patch: will this patch fix that problem, or does it just add a test illustrating it? > To create a small easily reproducible test, add an excessive check > for delta base sha1. It's not likely that computing sha1 for each > tree delta base costs us much. Since the first word in fast-import is "fast", I would be much happier with some measurements with a typical import (i.e., one that doesn't use --cat-blob-fd) than the statement "It's not likely". :) If this tweak will continue to be useful after the fix, perhaps it could be made optional. I haven't thought carefully about this, though. On to the patch. [...] > +++ b/fast-import.c > @@ -1455,12 +1455,22 @@ static void store_tree(struct tree_entry *root) > store_tree(t->entries[i]); > } > > + if (!is_null_sha1(root->versions[0].sha1) > + && S_ISDIR(root->versions[0].mode)) { > + unsigned char old_tree_sha1[20]; > + mktree(t, 0, &old_tree); > + prepare_object_hash(OBJ_TREE, &old_tree, > + NULL, NULL, old_tree_sha1); > + > + if (hashcmp(old_tree_sha1, root->versions[0].sha1)) > + die("internal tree delta base sha1 mismatch"); This is the heart of the patch; it involves several changes. 1. construct the base object tree whether the base object is in the current pack or not 2. calculate its hash and compare to ->versions[0].sha1 as a sanity check. For large trees, I fear it could be an important slowdown. > + > - le = find_object(root->versions[0].sha1); > - if (S_ISDIR(root->versions[0].mode) && le && le->pack_id == pack_id) { > - mktree(t, 0, &old_tree); > + le = find_object(root->versions[0].sha1); > + if (le && le->pack_id == pack_id) { > lo.data = old_tree; > lo.offset = le->idx.offset; > lo.depth = t->delta_depth; > } > + } [...] > --- a/t/t9300-fast-import.sh > +++ b/t/t9300-fast-import.sh > @@ -734,6 +734,44 @@ test_expect_success \ [...] > +cat >input2 <<INPUT_END > +commit refs/heads/L2 > +committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE > +data <<COMMIT > +update L2 > +COMMIT > +from refs/heads/L2^0 > +M 040000 @A g > +M 040000 @E g/b > +M 040000 @E g/b/h > +INPUT_END > + > +test_expect_failure \ > + 'L: verify internal tree delta base' \ > + 'git fast-import <input && > + A=$(git ls-tree L2 a | tr " " "\t" | cut -f 3) && > + E=$(git ls-tree L2 a/e | tr " " "\t" | cut -f 3) && > + cat input2 | sed -e "s/@A/$A/" -e "s/@E/$E/" >input && > + git fast-import <input' The description ("L: verify internal tree delta base" here) should describe something we want to work --- a facility or a statement --- and should leave out words like "verify" unless it is a test of verification facilities. In this example, I guess it is testing something like "delta base is not corrupted when replacing one directory by another"? (That's a random, wild guess and not meant as an example to be used verbatim.) I suppose I would be happier if we can find a way to reproduce this without modifying the behavior in such an invasive way. Which should be easier while thinking about the fix, so I'll move on to that. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html