Re: [PATCH 2/3] fast-import: add a check for tree delta base sha1

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi,

Dmitry Ivankov wrote:

> fast-import is able to write imported tree objects in delta format.
> It holds a tree structure in memory where each tree entry may have
> a delta base sha1 assigned. When delta base data is needed it is
> reconstructed from this in-memory structure. Though sometimes the
> delta base data doesn't match the delta base sha1 so wrong or even
> corrupt pack is produced.

I'm having trouble parsing this; not sure why.  Some guesses:

 - dropping the word "imported" could help, since it is the
   content of trees that comes from the user, not the tree objects

 - it's not clear to me what the second sentence is saying.  Do you
   mean that git looks at the versions[0].sha1 fields of item in
   t->entries to construct an in-memory tree object to delta against,
   instead of finding the object named by versions[0].sha1, inflating
   it, and using it directly?

 - the third sentence seems to be describing a problem, but I'm not
   sure what the relationship is to this patch: will this patch fix
   that problem, or does it just add a test illustrating it?

> To create a small easily reproducible test, add an excessive check
> for delta base sha1. It's not likely that computing sha1 for each
> tree delta base costs us much.

Since the first word in fast-import is "fast", I would be much
happier with some measurements with a typical import (i.e., one that
doesn't use --cat-blob-fd) than the statement "It's not likely". :)

If this tweak will continue to be useful after the fix, perhaps it
could be made optional.  I haven't thought carefully about this,
though.

On to the patch.

[...]
> +++ b/fast-import.c
> @@ -1455,12 +1455,22 @@ static void store_tree(struct tree_entry *root)
>  			store_tree(t->entries[i]);
>  	}
>  
> +	if (!is_null_sha1(root->versions[0].sha1)
> +					&& S_ISDIR(root->versions[0].mode)) {
> +		unsigned char old_tree_sha1[20];
> +		mktree(t, 0, &old_tree);
> +		prepare_object_hash(OBJ_TREE, &old_tree,
> +						NULL, NULL, old_tree_sha1);
> +
> +		if (hashcmp(old_tree_sha1, root->versions[0].sha1))
> +			die("internal tree delta base sha1 mismatch");

This is the heart of the patch; it involves several changes.

 1. construct the base object tree whether the base object is in the
    current pack or not

 2. calculate its hash and compare to ->versions[0].sha1 as a sanity
    check.

For large trees, I fear it could be an important slowdown.

> +
> -		le = find_object(root->versions[0].sha1);
> -		if (S_ISDIR(root->versions[0].mode) && le && le->pack_id == pack_id) {
> -			mktree(t, 0, &old_tree);
> +		le = find_object(root->versions[0].sha1);
> +		if (le && le->pack_id == pack_id) {
>  			lo.data = old_tree;
>  			lo.offset = le->idx.offset;
>  			lo.depth = t->delta_depth;
>  		}
> +	}
[...]
> --- a/t/t9300-fast-import.sh
> +++ b/t/t9300-fast-import.sh
> @@ -734,6 +734,44 @@ test_expect_success \
[...]
> +cat >input2 <<INPUT_END
> +commit refs/heads/L2
> +committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
> +data <<COMMIT
> +update L2
> +COMMIT
> +from refs/heads/L2^0
> +M 040000 @A g
> +M 040000 @E g/b
> +M 040000 @E g/b/h
> +INPUT_END
> +
> +test_expect_failure \
> +    'L: verify internal tree delta base' \
> +	'git fast-import <input &&
> +	A=$(git ls-tree L2 a | tr " " "\t" | cut -f 3) &&
> +	E=$(git ls-tree L2 a/e | tr " " "\t" | cut -f 3) &&
> +	cat input2 | sed -e "s/@A/$A/" -e "s/@E/$E/" >input &&
> +	git fast-import <input'

The description ("L: verify internal tree delta base" here) should
describe something we want to work --- a facility or a statement ---
and should leave out words like "verify" unless it is a test of
verification facilities.

In this example, I guess it is testing something like "delta base is
not corrupted when replacing one directory by another"?  (That's a
random, wild guess and not meant as an example to be used verbatim.)

I suppose I would be happier if we can find a way to reproduce this
without modifying the behavior in such an invasive way.  Which should
be easier while thinking about the fix, so I'll move on to that.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]