On 8/4/06, A Large Angry SCM <gitzilla@xxxxxxxxx> wrote:
> Jon Smirl wrote:
> > On 8/4/06, Linus Torvalds <torvalds@xxxxxxxx> wrote:
> >> I'd suggest against it, but you can (and should) just repack often enough
> >> that you shouldn't ever have gigabytes of objects "in flight". I'd have
> >> expected that with a repack every few ten thousand files, and most files
> >> being on the order of a few kB, you'd have been more than ok, but
> >> especially if you have large files, you may want to make things "every
> >> <n> bytes" rather than "every <n> files".
> >
> > How about forking off a pack-objects and handing it one file name at a
> > time over a pipe. When I hand it the next file name I delete the first
> > file. Does pack-objects make multiple passes over the files? This
> > model would let me hand it all 1M files.
> >
>
> Why don't you just write the pack file directly? Pack files without
> deltas have a very simple structure, and git-index-pack will create a
> pack index file for the pack file you give it.
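To make the "very simple structure" of an undeltified pack concrete, here is a minimal Python sketch, assuming the version-2 pack layout: a 12-byte header (`PACK`, version 2, object count), then for each object a variable-length type/size header followed by the zlib-compressed object body, then a SHA-1 over everything before it. The function names are hypothetical, and the sketch assumes the caller supplies each object's type code and raw body (for a blob, the file contents without the `blob <size>\0` prefix).

```python
import hashlib
import io
import struct
import zlib

# Pack-format type codes for whole objects (delta types 6 and 7 are not used).
OBJ_COMMIT, OBJ_TREE, OBJ_BLOB, OBJ_TAG = 1, 2, 3, 4


def encode_entry_header(obj_type: int, size: int) -> bytes:
    """Encode the variable-length type/size header of one pack entry.

    First byte: bit 7 = "more bytes follow", bits 6-4 = type, bits 3-0 = low
    4 bits of size. Each later byte: bit 7 = "more bytes follow", bits 6-0 =
    the next 7 bits of size (least significant group first).
    """
    byte = (obj_type << 4) | (size & 0x0F)
    size >>= 4
    out = bytearray()
    while size:
        out.append(byte | 0x80)  # continuation bit: more size bits remain
        byte = size & 0x7F
        size >>= 7
    out.append(byte)
    return bytes(out)


def write_undeltified_pack(out: io.BufferedIOBase,
                           objects: list[tuple[int, bytes]]) -> bytes:
    """Stream a version-2 pack with no deltas to `out`; return its SHA-1 trailer.

    Hypothetical helper: `objects` holds (type code, raw object body) pairs.
    The object count must be known up front because it goes in the header.
    """
    sha = hashlib.sha1()

    def emit(chunk: bytes) -> None:
        sha.update(chunk)  # the trailer covers every byte written before it
        out.write(chunk)

    # 12-byte header: signature, version 2, object count (big-endian).
    emit(b"PACK" + struct.pack(">II", 2, len(objects)))
    for obj_type, body in objects:
        # The entry header records the *uncompressed* body length.
        emit(encode_entry_header(obj_type, len(body)))
        emit(zlib.compress(body))
    trailer = sha.digest()
    out.write(trailer)
    return trailer
```

For example, opening a hypothetical `import.pack` in `"wb"` mode, passing it to `write_undeltified_pack(f, [(OBJ_BLOB, b"hello\n")])`, and then running `git index-pack import.pack` should produce the matching `.idx`. Each entry is independent, so the pack streams out in one pass, but it is only as small as zlib alone makes it.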
That is under consideration, but the undeltafied pack is about 12GB and it
takes forever (about a day) to deltafy it. I'm not convinced yet that an
undeltafied pack is any faster than just having the objects loose in the
object directories. The same data in a deltafied pack is 700MB. That is a
tremendous difference in the amount of IO needed (roughly 17 times less data).
The strategy has to be to avoid IO; nothing I am doing is ever CPU-bound.

--
Jon Smirl
jonsmirl@xxxxxxxxx
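Linus's suggestion to repack "every <n> bytes" rather than "every <n> files" can be sketched as a generator that groups incoming files by cumulative size instead of by count. That keeps the amount of loose-object data written between repacks bounded, however large individual files are. This is a hypothetical helper, not part of git. The caller is assumed to write each batch's objects and then run a repack (for example `git repack -d`) before requesting the next batch.

```python
from typing import Iterable, Iterator


def batches_by_bytes(files: Iterable[tuple[str, int]],
                     max_bytes: int) -> Iterator[list[str]]:
    """Group (name, size) pairs into batches of at most max_bytes total.

    Hypothetical helper. A single file larger than max_bytes still gets a
    batch of its own rather than being dropped.
    """
    batch: list[str] = []
    total = 0
    for name, size in files:
        # Close the current batch if this file would push it over the limit.
        if batch and total + size > max_bytes:
            yield batch
            batch, total = [], 0
        batch.append(name)
        total += size
    if batch:
        yield batch  # final partial batch
```

Because the generator is lazy, the 1M file names never need to be held in memory at once; only the current batch is.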