Re: Diff format in packs

Jon Smirl <jonsmirl@xxxxxxxxx> wrote:
> On 7/31/06, Jakub Narebski <jnareb@xxxxxxxxx> wrote:
> >Jon Smirl wrote:
> >
> >> I'm trying to build a small app that takes a CVS ,v and writes out a
> >> pack corresponding to the versions. Suggestions on the most efficient
> >> strategy for doing this by calling straight into the git C code?
> >> Forking off git commands is not very efficient when done a million
> >> times.
> >
> >Something akin to parsecvs by Keith Packard?
> 
> I see the error in my thoughts now, I need the fully expanded delta to
> compute the sha-1 so I might as well use the parsecvs code.
> 
> I am working on combining cvs2svn, parsecvs and cvsps into something
> that can handle Mozilla CVS.

I think you sort of have the right idea.  Creating a pack file
from scratch without deltas is a trivial operation.  The pack
format is documented in Documentation/technical/pack-format.txt.
The actual delta format isn't documented there and generating a delta
would be somewhat difficult, but creating a pack with no deltas
and only zlib compression is pretty simple.  And no, GIT doesn't
use the same (horrible) delta format as RCS so you definitely are
right, you have to expand it before you can compress it.
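
To give a feel for how little is involved, here is a rough sketch of
a delta-free pack writer.  It is in Python rather than C only to keep
it short, and it assumes the version 2 layout described in
pack-format.txt: a "PACK" header, one type/size header plus a zlib
stream per object, and a trailing SHA1 over everything before it.
The function names are made up for illustration and are not part of
GIT.

```python
import hashlib
import struct
import zlib

# Object type codes used in pack entry headers.
OBJ_COMMIT, OBJ_TREE, OBJ_BLOB = 1, 2, 3


def encode_entry_header(obj_type, size):
    """Encode the per-object header: 3 type bits plus a variable-length size."""
    byte = (obj_type << 4) | (size & 0x0F)  # low 4 bits of size share the first byte
    size >>= 4
    out = bytearray()
    while size:
        out.append(byte | 0x80)  # high bit set: more size bytes follow
        byte = size & 0x7F
        size >>= 7
    out.append(byte)
    return bytes(out)


def build_pack(objects):
    """objects: list of (type_code, uncompressed_bytes); returns the pack bytes."""
    pack = bytearray(struct.pack(">4sII", b"PACK", 2, len(objects)))
    for obj_type, data in objects:
        pack += encode_entry_header(obj_type, len(data))
        pack += zlib.compress(data)  # undeltified entry: just the deflated body
    pack += hashlib.sha1(pack).digest()  # trailer: SHA1 of all preceding bytes
    return bytes(pack)
```

Note that the object count lives in the header, so a real importer
that streams objects to disk would have to go back and patch the
count (and then compute the trailer) once it knows the total.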

Creating trees and commits from scratch is also really easy.  Calling
zlib and a SHA1 routine to create the checksum is the hard part.
I think I wrote the tree and commit construction part of jgit in
a few hours, and that was while I was also being distracted by
someone speaking in the front of the room.  :-)


It should be reasonably simple to extract each revision from a
single ,v file into its full undeltified form, compute its SHA1,
compress it with zlib, and append it into a pack file.  Do that
for every file and toss the SHA1 values, file names and revision
numbers off into a table somewhere.
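
A rough sketch of the hashing and bookkeeping step, again in Python
for brevity.  The SHA1 is taken over a "blob <size>" header, a NUL
byte and then the file content.  Parsing the ,v file is left out
entirely: the `revisions` argument is assumed to be whatever your RCS
parser yields as (revision number, fully expanded content) pairs, and
the table layout is only one possible choice.

```python
import hashlib


def blob_id(data):
    """Return the hex SHA1 GIT assigns to a blob with this content."""
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()


def record_revisions(path, revisions, table):
    """Hash every expanded revision of one file and remember where it went.

    revisions: iterable of (revision_number, content_bytes), assumed to come
    from an RCS parser that is not shown here.
    table: dict mapping (path, revision_number) -> hex SHA1.
    Returns the (sha1, content) pairs that still need appending to a pack.
    """
    pending = []
    for revision, data in revisions:
        sha = blob_id(data)
        table[(path, revision)] = sha
        pending.append((sha, data))
    return pending
```

Identical content in two revisions (or two files) hashes to the same
SHA1, so the importer only needs to write each distinct blob once.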

Then loop back through and generate trees while playing around only
with the RCS file paths, timestamps and SHA1 pointers.  Again tree
generation is extremely simple; it would be trivial to generate
tree objects and append them into the same (or another) pack.
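
For illustration, a tree body is just a sorted run of
"<mode> <name>\0<20 raw SHA1 bytes>" records.  The sketch below (in
Python, with hypothetical helper names) builds one from entries you
would look up in the table.  The one subtlety is ordering:
subdirectories sort as if their name ended in "/".

```python
import hashlib


def tree_body(entries):
    """Build the raw body of a tree object.

    entries: list of (mode, name, sha1_hex), where mode is an ASCII octal
    string such as "100644" for a regular file or "40000" for a subtree.
    """
    def sort_key(entry):
        mode, name, _ = entry
        # Subtrees are ordered as though their name had a trailing slash.
        return name.encode("utf-8") + (b"/" if mode == "40000" else b"")

    out = bytearray()
    for mode, name, sha_hex in sorted(entries, key=sort_key):
        out += mode.encode("ascii") + b" " + name.encode("utf-8") + b"\0"
        out += bytes.fromhex(sha_hex)  # SHA1 is stored raw, not as hex text
    return bytes(out)


def object_id(kind, body):
    """Hex SHA1 of an object of the given kind (b"tree", b"blob", b"commit")."""
    header = kind + b" " + str(len(body)).encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()
```

Trees have to be built bottom-up, since a directory's entry in its
parent needs the SHA1 of the already finished subtree.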

Finally writing commit objects pointing at the trees is also easy,
without calling git-commit.
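
A commit is plain text: a "tree" line, zero or more "parent" lines,
"author" and "committer" lines carrying a Unix timestamp and timezone
offset, a blank line, then the message.  A minimal sketch in Python,
with made-up function names and a placeholder identity:

```python
import hashlib


def commit_body(tree_hex, parent_hexes, author, committer, message):
    """Build the raw body of a commit object.

    author / committer: strings of the form
    "Name <email> <unix-timestamp> <+hhmm>", e.g. taken from the RCS metadata.
    """
    lines = ["tree " + tree_hex]
    lines += ["parent " + parent for parent in parent_hexes]
    lines.append("author " + author)
    lines.append("committer " + committer)
    # A blank line separates the headers from the log message.
    return ("\n".join(lines) + "\n\n" + message).encode("utf-8")


def commit_id(body):
    """Hex SHA1 GIT would assign to a commit with this body."""
    header = b"commit " + str(len(body)).encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()
```

Because each commit names its parents by SHA1, commits have to be
generated oldest first.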

When you are all done run a `git-repack -a -d -f` and let the delta
code compress everything down.  That first compression might take
a little while but it should do a reasonably good job despite the
input pack(s) being highly unorganized.

So I think I'm suggesting you find a way to generate the base objects
yourself right into a pack file, rather than using the higher level
GIT executables to do it.  You may be able to reuse some of the
code in GIT but I know its writer code is organized for writing
loose objects, not for appending new objects into a new pack file,
so some surgery would probably be required.

-- 
Shawn.