Re: fast-import and unique objects.

Jon Smirl <jonsmirl@xxxxxxxxx> wrote:
> On 8/6/06, Shawn Pearce <spearce@xxxxxxxxxxx> wrote:
> >So the new version should take about 20 MB of memory and should
> >produce a valid pack and index in the same time as it does only
> >the pack now.  Plus it won't generate duplicates.
> 
> I did a run with this and it works great.

Good.  :-) On my drive in to work this afternoon I realized
that making you specify the size of the object table is stupid;
I could easily allocate a thousand objects at a time rather than
preallocating the whole thing.  Oh well.  fast-import thus far
hasn't been meant as production code for inclusion in core GIT,
but maybe it will get cleaned up and submitted as such if your
conversion efforts go well and produce a better CVS importer.
 
> I'm staring at the cvs2svn code now trying to figure out how to modify
> it without rewriting everything. I may just leave it all alone and
> build a table with cvs_file:rev to sha-1 mappings. It would be much
> more efficient to carry sha-1 throughout the stages but that may
> require significant rework.

Does it matter?  How long does the cvs2svn processing take,
excluding the GIT blob processing that's now known to take 2 hours?
What's your target for an acceptable conversion time on the system
you are working on?


Any thoughts yet on how you might want to feed trees and commits
to a fast pack writer?  I was thinking about doing a stream into
fast-import such as:

	<4 byte length of commit><commit><treeent>*<null>

where <commit> is the raw commit minus the first "tree nnn\n" line, and
<treeent> is:

	<type><sp><sha1><sp><path><null>

where <type> is one of 'B' (normal blob), 'L' (symlink), or 'X'
(executable blob), <sha1> is the 40-byte hex SHA-1, <path> is the
file's path from the root of the repository ("src/module/foo.c"),
and <sp> and <null> are the obvious values.  You would feed all
tree entries and the pack writer would split the stream up into
the individual tree objects.
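
On the Python side writing that record would be nearly trivial,
something like this (completely untested sketch; the big-endian
4-byte length and the helper name are just what I'd pick here,
nothing is decided):

	import struct

	def write_commit_record(out, commit, entries):
	    # commit:  raw commit bytes, minus the leading "tree nnn\n" line
	    # entries: (type, sha1, path) tuples, type one of b'B', b'L',
	    #          b'X', sha1 being the 40 hex bytes
	    out.write(struct.pack('>I', len(commit)))  # assumed big-endian
	    out.write(commit)
	    for etype, sha1, path in entries:
	        out.write(etype + b' ' + sha1 + b' ' + path + b'\0')
	    out.write(b'\0')  # empty treeent closes the record

The trailing <null> (an empty treeent) is how the reader knows the
record is complete.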

fast-import would generate the tree(s), delta'ing them against the
prior tree of the same path, prefix "tree nnn\n" to the commit
blob you supplied, generate the commit, and print out its ID.
By working from the first commit up to the most recent, each tree
delta would use the older tree as its base, which may not be ideal
if a large number of items get added to a tree, but should be
effective enough to generate a reasonably sized initial pack.
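
To be concrete about the splitting, it's just fanning the flat
entry list out by directory, roughly like this (sketched in Python
for readability, though fast-import itself is C):

	import posixpath
	from collections import defaultdict

	def split_into_trees(entries):
	    # Fan the flat (type, sha1, path) list out into one entry
	    # list per directory; the subtree entries get their SHA-1s
	    # filled in bottom-up as each tree object is written.
	    trees = defaultdict(list)   # directory -> its direct entries
	    seen = set()
	    for etype, sha1, path in entries:
	        d, name = posixpath.split(path)
	        trees[d].append((etype, sha1, name))
	        while d and d not in seen:  # link each dir into its parent
	            seen.add(d)
	            parent, base = posixpath.split(d)
	            trees[parent].append(('T', None, base))  # sha1 later
	            d = parent
	    return trees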

It would however mean you need to monitor the output pipe from
fast-import to get back the commit ID so you can use it to prep
the next commit's parent(s), since you can't produce that ID
yourself in Python.
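
So the driving loop on your end would look something like the
following, assuming a one-ID-per-line reply and a git-fast-import
command name, neither of which exists yet:

	import subprocess

	def import_history(build_commit, changesets):
	    # build_commit(change, parent) is the cvs2svn-side hook; it
	    # returns (commit, entries) with any parent lines already in
	    # the commit.  Hypothetical interface, not the current tool.
	    p = subprocess.Popen(['git-fast-import'],
	                         stdin=subprocess.PIPE,
	                         stdout=subprocess.PIPE)
	    parent = None
	    for change in changesets:
	        commit, entries = build_commit(change, parent)
	        write_commit_record(p.stdin, commit, entries)  # see above
	        p.stdin.flush()
	        parent = p.stdout.readline().strip()  # ID fast-import printed
	    p.stdin.close()
	    p.wait()
	    return parent   # ID of the last commit, for updating the ref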

-- 
Shawn.