Re: long fast-import errors out "failed to apply delta"

Dmitry Ivankov <divanorama@xxxxxxxxx> · Wed, 27 Jul 2011 00:22:26 +0600

Hi,

On Tue, Jul 26, 2011 at 10:58 PM, Jonathan Nieder <jrnieder@xxxxxxxxx> wrote:
> Hi Dmitry,
>
> Dmitry Ivankov wrote:
>
>> 3) Is this breakage specific to tags/v1.7.3-rc0~75^2 "Teach
>> fast-import to import subtrees named by tree id" 30 Jun 2010 (allowing
>> M 040000 <tree id> pathname)?
>
> Probably. :)
>
>> The harder fix is to try to keep e->versions[0].sha1 for trees correct.
>
> Context for those who don't remember all the details:
>
> When first introduced, "struct tree_entry" was very simple: a mode, a
> filename, and:
>
>  - for regular objects and symlinks, a blob object name representing
>   its content;
>
>  - for subdirectories, a pointer to "struct tree_content" listing its
>   contents, along with an _optional_ cached tree object name.
>
> When modifying a tree entry, fast-import would walk through the path
> to it and invalidate the cached tree names at each step.  Shawn wrote:
>
>        Currently trees and commits aren't being deltafied when written to
>        the pack and branch reloading from the current pack doesn't work,
>        so at most 5 branches can be worked with at any one time.
>
> but the advantage was that it was very simple.  Later, delta
> compression arrived:
>
>        commit 4cabf858
>        Author: Shawn O. Pearce <spearce@xxxxxxxxxxx>
>        Date:   Mon Aug 28 12:22:50 2006 -0400
>
>            Implemented tree delta compression in fast-import.
>
>            We now store for every tree entry two modes and two sha1 values;
>            the base (aka "version 0") and the current/new (aka "version 1").
>            When we generate a tree object we also regenerate the prior version
>            object and use that as our base object for a delta.  This strategy
>            saves a significant amount of memory as we can continue to use the
>            atom pool for file/directory names and only increases each tree
>            entry by an additional 24 bytes of memory.
>
> In other words, this commit introduced a "prior" mode and tree or blob
> name to give the pack-writing machinery a hint about what to delta
> against.
>
> With that in mind, it seems very weird that the version 0 tree would
> ever be changed and need to have its object name invalidated.

Well, if we'd read the actual prior contents via sha1-lookup, no
invalidation would be needed,
we can create delta with any base after all.
But we use mktree(v=0) to collect versions[0].sha1 of entries with
(mode != 0) as a content.
I think I almost got it now. When we tree_content_set by sha1 (we use
subtree = NULL) and
that tree had some versions[0].sha1, we reset it's ->tree to NULL
loosing the old sha1's of
the entries. Later we traverse there and load_tree that NULL, setting
the entries with
versions[0].sha1 = versions[1].sha1 = sha1's of new/current tree entries.

So looks like load_tree should restore versions[0].sha1's IF the prior
object was a tree.
I've tried this approach with merging two trees, but my rough patch is
bloated and crashes.

A bit slower one is to do sha1 lookups for the delta base trees.

>  Perhaps
> we are releasing some memory too early and it is getting clobbered?
> Unless I am missing something, the patch
>
>> --- a/fast-import.c
>> +++ b/fast-import.c
>> @@ -1514,6 +1514,9 @@ static int tree_content_set(
>>                                 if (e->tree)
>>                                         release_tree_content_recursive(e->tree);
>>                                 e->tree = subtree;
>> +                               if (S_ISDIR(mode)) {
>> +                                       hashclr(e->versions[0].sha1);
>> +                               }
>>                                 hashclr(root->versions[1].sha1);
>>                                 return 1;
>
> just disables deltas for trees that have been modified by a "M 040000"
> command, so it feels more like a workaround than a fundamental fix.
Right, I can't even say I understand in full what does it change.

> Could you save the svn-fe output (e.g., by introducing "tee" in the
> middle of the "svn-fe | fast-import" pipeline) and put it up
> somewhere online?  This would also be a good starting point for coming
> up with a reduced testcase.
It's 1.9G uncompressed, 0.7G lzo-compressed. Will setup a ftp or
torrent seed a bit later.

>
> Hope that helps,
> Jonathan
>
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html