On Fri, 4 Aug 2006, Jon Smirl wrote:
>
> How about forking off a pack-objects and handing it one file name at a
> time over a pipe. When I hand it the next file name I delete the first
> file. Does pack-objects make multiple passes over the files? This
> model would let me hand it all 1M files.

pack-objects does actually make several (well, two) passes over the
objects right now, because it first does all the sorting based on object
size/type, and then does the actual deltifying pass.

But doing things one file-name at a time would certainly be fine. You can
even do it with git-pack-objects running in parallel, ie you can do a

	for_each_filename() {
		cvs-generate-objects filename | git-pack-objects filename
		rm -rf .git/objects/??/
	}

and then "cvs-generate-objects" should just make sure that it writes the
git object _before_ it actually outputs the object name on stdout.

And if you do it this way, you won't even have to pass any filenames,
since git-pack-objects will only get objects for the same file, and will
do the right thing just sorting them by size.

So in the above kind of setting, the _only_ thing that
cvs-generate-objects needs to do is:

	for_each_rev(file) {
		unsigned char sha1[20];
		unsigned long len;
		void *buf;

		/* unpack the revision into memory */
		buf = cvs_unpack_revision(&len);

		/* Write it out as a git blob file */
		write_sha1_file(buf, len, "blob", sha1);

		/* Free the memory image */
		free(buf);

		/* Tell git-pack-objects the name of the git blob */
		printf("%s\n", sha1_to_hex(sha1));
	}

and you're basically all done. The above would turn each *,v file into a
*-<sha>.pack/*-<sha>.idx file pair, so you'd have exactly as many
pack-files as you have *,v files.

		Linus
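
For anyone who wants to try the per-file loop with stock git commands, here is a rough sketch of it, not a tested importer. It makes several assumptions that are not in the mail above: the revisions of one file are already unpacked as plain files in a directory, "git hash-object -w" stands in for the hypothetical cvs-generate-objects, the function names generate_objects and pack_one_file are made up for illustration, and the pack is written into the repository's objects/pack directory so that "git prune-packed" can replace the blunt "rm -rf" of the loose object directories.

```shell
#!/bin/sh
# Sketch only: one pack per source file, built from loose blobs.

# generate_objects DIR
# Write every file in DIR as a git blob and print its object name.
# hash-object -w stores the object before it prints the name, which is
# the ordering the consumer on the other end of the pipe relies on.
generate_objects() {
    for rev in "$1"/*; do
        git hash-object -w "$rev"
    done
}

# pack_one_file DIR NAME
# Pack all blobs generated from DIR into pack-NAME-<sha>.pack/.idx,
# then remove the loose copies that are now available from the pack.
# NAME is assumed to contain no slashes.
pack_one_file() {
    pack_base="$(git rev-parse --git-dir)/objects/pack/pack-$2"
    generate_objects "$1" | git pack-objects -q "$pack_base" >/dev/null
    git prune-packed -q
}
```

Run inside a git repository, something like "pack_one_file /tmp/revs/foo.c foo.c" should leave a single pack-foo.c-<sha>.pack/.idx pair and no loose objects. Since every object fed to one pack-objects run comes from the same file, no names are passed along with the object ids, matching the point about sorting by size alone.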