On Fri, 4 Aug 2006, Jon Smirl wrote:
>
> I am converting all of the revisions from each CVS file into git
> objects the first time the file is parsed. The plan was to run repack
> after each file is finished. That way it should be easy to figure out
> the deltas since everything will be a variation on the same file.

Sure. In that case, just list the object IDs in the exact same order you
created them. Basically, as you create them, just keep a list of all IDs
you've created, and every (say) 50,000 objects, just do a

	echo all objects you've created | git-pack-objects new-pack

and then move the new pack into place, and remove all the loose objects
(don't even bother using "git prune" - just basically do something like
"rm -rf .git/objects/??" to get rid of them).

> So what's the best way to pack these objects, append them to the
> existing pack and then clean everything up for the next file? I am
> parsing 120K CVS files containing over 1M revs.

You'll want to repack every once in a while just to not ever have _tons_
of those loose objects around, but if you do it every 50,000 objects,
you'll have just twenty nice pack-files once you're done, containing all
one million objects, and you'll never have had more than ~200 files in
any of the loose object subdirectories.

Of course, you might want to make that "every 50,000 objects" thing
tunable, so that if you don't have a lot of memory for caching, you can
do it a bit more often just to make each repack go faster and not have
tons of IO.

You can then do a _full_ repack to get one big pack, by just listing
every object you ever created (in creation order) to git-pack-objects,
and then you can replace all the twenty (smaller) pack-files with the
resulting single bigger one.
In fact, at that point you no longer even need to worry about "creation
order", since you've basically created all the deltas in the first
phase, and regardless of ordering, when you then repack everything at
the end, it will re-use all earlier delta information.

		Linus
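
For illustration only, the "keep a list and pack every N objects" idea above could be sketched roughly like this. It is a sketch under assumptions, not the importer's actual code: `BatchPacker` and `run_pack_objects` are made-up names, the command is spelled `git pack-objects` (the current form of the `git-pack-objects` used above), and moving the pack into place and removing the loose objects are left to the caller.

```python
import subprocess
from typing import Callable, List


def run_pack_objects(object_ids: List[str], base_name: str) -> str:
    """Feed object IDs, one per line and in the given order, to
    `git pack-objects <base_name>` and return whatever it prints on
    stdout (the name of the pack it wrote)."""
    result = subprocess.run(
        ["git", "pack-objects", base_name],
        input="".join(oid + "\n" for oid in object_ids),
        text=True,
        check=True,
        stdout=subprocess.PIPE,
    )
    return result.stdout.strip()


class BatchPacker:
    """Hypothetical helper: remembers object IDs in creation order and
    packs them every `batch_size` objects (the tunable "every 50,000")."""

    def __init__(
        self,
        batch_size: int = 50_000,
        pack: Callable[[List[str], str], object] = run_pack_objects,
        base_name: str = "new-pack",
    ) -> None:
        self.batch_size = batch_size
        self.pack = pack
        self.base_name = base_name
        self.pending: List[str] = []  # IDs created since the last pack
        self.packs_written = 0

    def add(self, object_id: str) -> None:
        """Record a newly created object; pack once the batch is full."""
        self.pending.append(object_id)
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        """Pack whatever is pending. The caller would then move the new
        pack into place and delete the loose objects."""
        if not self.pending:
            return
        self.pack(self.pending, self.base_name)
        self.packs_written += 1
        self.pending = []
```

The importer would call `add()` for each object as it is created and `flush()` once at the end for the leftover partial batch. With the default batch size, one million objects come out as 1,000,000 / 50,000 = 20 packs, and a full batch of loose objects spread over the 256 object subdirectories is roughly 50,000 / 256, or about 195 files per directory.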