Hi, On Tue, Jul 26, 2011 at 10:58 PM, Jonathan Nieder <jrnieder@xxxxxxxxx> wrote: > Hi Dmitry, > > Dmitry Ivankov wrote: > >> 3) Is this breakage specific to tags/v1.7.3-rc0~75^2 "Teach >> fast-import to import subtrees named by tree id" 30 Jun 2010 (allowing >> M 040000 <tree id> pathname)? > > Probably. :) > >> The harder fix is to try to keep e->versions[0].sha1 for trees correct. > > Context for those who don't remember all the details: > > When first introduced, "struct tree_entry" was very simple: a mode, a > filename, and: > > - for regular objects and symlinks, a blob object name representing > its content; > > - for subdirectories, a pointer to "struct tree_content" listing its > contents, along with an _optional_ cached tree object name. > > When modifying a tree entry, fast-import would walk through the path > to it and invalidate the cached tree names at each step. Shawn wrote: > > Currently trees and commits aren't being deltafied when written to > the pack and branch reloading from the current pack doesn't work, > so at most 5 branches can be worked with at any one time. > > but the advantage was that it was very simple. Later, delta > compression arrived: > > commit 4cabf858 > Author: Shawn O. Pearce <spearce@xxxxxxxxxxx> > Date: Mon Aug 28 12:22:50 2006 -0400 > > Implemented tree delta compression in fast-import. > > We now store for every tree entry two modes and two sha1 values; > the base (aka "version 0") and the current/new (aka "version 1"). > When we generate a tree object we also regenerate the prior version > object and use that as our base object for a delta. This strategy > saves a significant amount of memory as we can continue to use the > atom pool for file/directory names and only increases each tree > entry by an additional 24 bytes of memory. > > In other words, this commit introduced a "prior" mode and tree or blob > name to give the pack-writing machinery a hint about what to delta > against. > > With that in mind, it seems very weird that the version 0 tree would > ever be changed and need to have its object name invalidated. Well, if we'd read the actual prior contents via sha1-lookup, no invalidation would be needed, we can create delta with any base after all. But we use mktree(v=0) to collect versions[0].sha1 of entries with (mode != 0) as a content. I think I almost got it now. When we tree_content_set by sha1 (we use subtree = NULL) and that tree had some versions[0].sha1, we reset it's ->tree to NULL loosing the old sha1's of the entries. Later we traverse there and load_tree that NULL, setting the entries with versions[0].sha1 = versions[1].sha1 = sha1's of new/current tree entries. So looks like load_tree should restore versions[0].sha1's IF the prior object was a tree. I've tried this approach with merging two trees, but my rough patch is bloated and crashes. A bit slower one is to do sha1 lookups for the delta base trees. > Perhaps > we are releasing some memory too early and it is getting clobbered? > Unless I am missing something, the patch > >> --- a/fast-import.c >> +++ b/fast-import.c >> @@ -1514,6 +1514,9 @@ static int tree_content_set( >> if (e->tree) >> release_tree_content_recursive(e->tree); >> e->tree = subtree; >> + if (S_ISDIR(mode)) { >> + hashclr(e->versions[0].sha1); >> + } >> hashclr(root->versions[1].sha1); >> return 1; > > just disables deltas for trees that have been modified by a "M 040000" > command, so it feels more like a workaround than a fundamental fix. Right, I can't even say I understand in full what does it change. > Could you save the svn-fe output (e.g., by introducing "tee" in the > middle of the "svn-fe | fast-import" pipeline) and put it up > somewhere online? This would also be a good starting point for coming > up with a reduced testcase. It's 1.9G uncompressed, 0.7G lzo-compressed. Will setup a ftp or torrent seed a bit later. > > Hope that helps, > Jonathan > -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html