Re: [PATCH/WIP 6/7] fast-import: workaround data corruption

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi!

Dmitry Ivankov wrote:

> fast-import keeps track of some delta-base for tree objects. When it is
> time to compute the delta, base object is constructed from in-memory
> tree representation using children's delta bases sha1. But these can be
> unrelated due to several bugs, and it leads to object with wrong sha1
> being delta-written to the packfile.

This seems like as good a starting point as any.  Small language
nitpicks:

 - "some delta-base" is somewhat vague

 - missing article (probably "the") before "base object" and
   "children's"

 - does "children's delta bases sha1" mean "the pre-computed object
   names and modes in versions[0].{sha1,mode} for each tree entry" or
   something else?

 - when you say "unrelated": unrelated to what?

> We have the base sha1 and what we think it's data is. Verify sha1 and if
> it doesn't match, report it to stderr and don't use delta for this tree.

At any rate, if I understand correctly, the idea is that when store_tree()
is being called, root->versions[0].sha1 does not match the tree object
implied by root->tree->entries[*]->{name,versions[0].{mode,sha1}}.  This
patch adds a quick check to notice when that happens.

Makes a lot of sense, especially as a demonstration of the problem.
store_object() returns early in many cases so this probably does make
the bug easier to find (good).

I wonder if it would make more sense to perform this sanity check in
store_object() so blobs could benefit, too.  How much does the check slow
fast-import down (if at all)?

> We could also die() here when bugs are fixed.

If the check is fast, sure, why not? :)  The only reason I can think
of is that computing a SHA-1 is an O(size of object) cost which it
would be nice to avoid if profiling shows it is noticeable.

> Or we can see if the data
> we've got is from our pack file and so still try to use it as a base.

Can you elaborate?  When would versions[0].sha1 not match the data
but the data still be in the pack file?

> --- a/fast-import.c
> +++ b/fast-import.c
[...]
> @@ -1486,10 +1487,21 @@ static void store_tree(struct tree_entry *root)
>  
>  	le = find_object(root->versions[0].sha1);
>  	if (S_ISDIR(root->versions[0].mode) && le && le->pack_id == pack_id) {
> +		unsigned char sh[20];

What does "sh" stand for?

>  		mktree(t, 0, &old_tree);
>  		lo.data = old_tree;
>  		lo.offset = le->idx.offset;
>  		lo.depth = t->delta_depth;
> +
> +		prepare_object_hash(OBJ_TREE, &old_tree, NULL, NULL, sh);
> +		if (hashcmp(sh, root->versions[0].sha1)) {
> +			fprintf(stderr, "internal sha1 delta base mismatch,"
> +					" won't use delta for that tree\n");
> +			lo.data = empty;

To avoid a memory leak and introducing an unnecessary variable:

			strbuf_reset(&lo.data);

Thanks, that was interesting.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]