Jon Smirl <jonsmirl@xxxxxxxxx> wrote:
> On 7/31/06, Jakub Narebski <jnareb@xxxxxxxxx> wrote:
> >Jon Smirl wrote:
> >
> >> I'm trying to build a small app that takes a CVS ,v and writes out a
> >> pack corresponding to the versions. Suggestions on the most efficient
> >> strategy for doing this by calling straight into the git C code?
> >> Forking off git commands is not very efficient when done a million
> >> times.
> >
> >Something akin to parsecvs by Keith Packard?
>
> I see the error in my thoughts now, I need the fully expanded delta to
> compute the sha-1 so I might as well use the parsecvs code.
>
> I am working on combining cvs2svn, parsecvs and cvsps into something
> that can handle Mozilla CVS.

I think you sort of have the right idea.

Creating a pack file from scratch without deltas is a very trivial
operation. The pack format is documented in
Documentation/technical/pack-format.txt. The actual delta format isn't
documented there, and generating a delta would be somewhat difficult,
but creating a pack with no deltas and only zlib compression is pretty
simple.

And no, GIT doesn't use the same (horrible) delta format as RCS, so you
definitely are right: you have to expand it before you can compress it.

Creating trees and commits from scratch is also really easy. Calling
zlib and a SHA1 routine to create the checksum is the hard part. I
think I wrote the tree and commit construction part of jgit in a few
hours, and that was while I was also being distracted by someone
speaking at the front of the room. :-)

It should be reasonably simple to extract each revision from a single
,v file into its full undeltafied form, compute its SHA1, compress it
with zlib, and append it to a pack file. Do that for every file and
toss the SHA1 values, file names and revision numbers off into a table
somewhere. Then loop back through and generate trees while playing
around only with the RCS file paths, timestamps and SHA1 pointers.

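As a rough illustration of the byte framing involved in the steps
above, here is a minimal C sketch, assuming a version 2 pack with no
deltas. It shows the 12-byte pack header, the variable-length
per-object header (3-bit type plus the uncompressed size), the
"<type> <size>\0" prefix that is hashed together with an object's body
to get its SHA1 name, and one tree entry ("<mode> <name>\0" followed by
the raw 20-byte SHA1). The function and enum names are hypothetical,
not GIT's own, and the actual zlib deflate and SHA1 calls are left out
so it builds without extra libraries.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Object type codes as listed in pack-format.txt (hypothetical names). */
enum pack_obj_type { PACK_COMMIT = 1, PACK_TREE = 2, PACK_BLOB = 3, PACK_TAG = 4 };

/* Store a 32-bit value in network (big-endian) byte order. */
static void put_be32(unsigned char *p, uint32_t v)
{
	p[0] = (unsigned char)(v >> 24);
	p[1] = (unsigned char)(v >> 16);
	p[2] = (unsigned char)(v >> 8);
	p[3] = (unsigned char)v;
}

/* Pack header: "PACK", version 2, object count, all 32-bit big-endian. */
static void pack_header(unsigned char out[12], uint32_t nr_objects)
{
	memcpy(out, "PACK", 4);
	put_be32(out + 4, 2);
	put_be32(out + 8, nr_objects);
}

/*
 * Per-object header. The first byte carries the type in bits 6-4 and the
 * low 4 bits of the *uncompressed* size; each following byte carries 7 more
 * size bits, least significant first. The high bit means "more bytes follow".
 * Returns the number of bytes written (at most 10 for a 64-bit size).
 */
static size_t pack_obj_header(unsigned char *out, enum pack_obj_type type, uint64_t size)
{
	size_t n = 0;
	unsigned char c;

	assert(type >= PACK_COMMIT && type <= PACK_TAG); /* no delta types here */
	c = (unsigned char)((type << 4) | (size & 0x0f));
	size >>= 4;
	while (size) {
		out[n++] = c | 0x80;
		c = (unsigned char)(size & 0x7f);
		size >>= 7;
	}
	out[n++] = c;
	return n;
}

/*
 * Prefix hashed before the body to compute an object's SHA1 name,
 * e.g. "blob 12" plus its NUL. Returns its length including the NUL,
 * or 0 if it does not fit in cap bytes.
 */
static size_t object_hash_prefix(char *out, size_t cap, const char *type, size_t size)
{
	int len = snprintf(out, cap, "%s %zu", type, size);
	if (len < 0 || (size_t)len >= cap)
		return 0;
	return (size_t)len + 1; /* snprintf already wrote the NUL */
}

/*
 * One tree entry: "<octal mode> <name>\0" followed by the raw 20-byte SHA1.
 * Returns bytes written, or 0 if it does not fit in cap bytes.
 */
static size_t tree_entry(unsigned char *out, size_t cap, const char *mode,
			 const char *name, const unsigned char sha1[20])
{
	int len = snprintf((char *)out, cap, "%s %s", mode, name);
	if (len < 0 || (size_t)len + 1 + 20 > cap)
		return 0;
	memcpy(out + len + 1, sha1, 20); /* placed right after the NUL */
	return (size_t)len + 1 + 20;
}
```

With these pieces a delta-free pack would be the pack header, then for
each object its header followed by the zlib-deflated body (without the
"<type> <size>\0" prefix, which only feeds the SHA1), then a trailing
20-byte SHA1 over everything before it. A tree body is just its entries
concatenated in GIT's sort order, which compares directory names as if
they ended in '/'; directories use mode 40000 and regular files 100644.
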
Again, tree generation is extremely simple; it would be trivial to
generate tree objects and append them to the same (or another) pack.
Finally, writing commit objects pointing at the trees is also easy, and
doesn't require calling git-commit.

When you are all done, run `git-repack -a -d -f` and let the delta code
compress everything down. That first compression might take a little
while, but it should do a reasonably good job despite the input pack(s)
being highly disorganized.

So I think I'm suggesting you find a way to generate the base objects
yourself, directly into a pack file, rather than using the higher-level
GIT executables to do it. You may be able to reuse some of the code in
GIT, but I know its writer code is organized for writing loose objects,
not for appending new objects to a new pack file, so some surgery would
probably be required.

-- 
Shawn.