On 8/4/06, Rogan Dawes <discard@xxxxxxxxxxxx> wrote:
Jon Smirl wrote:
> On 8/4/06, Linus Torvalds <torvalds@xxxxxxxx> wrote:
>> I'd suggest against it, but you can (and should) just repack often enough
>> that you shouldn't ever have gigabytes of objects "in flight". I'd have
>> expected that with a repack every few ten thousand files, and most files
>> being on the order of a few kB, you'd have been more than ok, but
>> especially if you have large files, you may want to make things "every
>> <n> bytes" rather than "every <n> files".
>
> How about forking off a pack-objects and handing it one file name at a
> time over a pipe. When I hand it the next file name I delete the first
> file. Does pack-objects make multiple passes over the files? This
> model would let me hand it all 1M files.

I'd imagine that this would not necessarily save you a lot, if you have to write it to disk, and then read it back again. Your only chance here is if you stay in the buffer, and avoid actually writing to disk at all.
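As an illustration of the "every <n> bytes" rather than "every <n> files" idea quoted above, here is a minimal sketch in Python. It is not taken from any importer in this thread: the class name RepackTrigger, its methods, and the 64 MB threshold are all hypothetical, and the repack action is passed in as a plain callable so that nothing is assumed about how the repack itself is run.

```python
from typing import Callable


class RepackTrigger:
    """Fire a repack callback once enough bytes of new objects accumulate.

    Hypothetical helper: counts bytes instead of files, so a handful of
    large files triggers a repack as readily as many small ones.
    """

    def __init__(self, max_bytes: int, repack: Callable[[], None]) -> None:
        if max_bytes <= 0:
            raise ValueError("max_bytes must be positive")
        self.max_bytes = max_bytes
        self.repack = repack
        self.pending_bytes = 0  # bytes written since the last repack
        self.repack_count = 0   # number of repacks triggered so far

    def wrote(self, size: int) -> bool:
        """Record one newly written object; return True if a repack ran."""
        self.pending_bytes += size
        if self.pending_bytes < self.max_bytes:
            return False
        self.repack()
        self.repack_count += 1
        self.pending_bytes = 0
        return True


if __name__ == "__main__":
    # Assumed threshold of 64 MB; a real importer would replace the lambda
    # with whatever actually performs the repack.
    trigger = RepackTrigger(64 * 1024 * 1024, repack=lambda: print("repack"))
    for size in (40 * 1024 * 1024, 30 * 1024 * 1024, 10 * 1024):
        trigger.wrote(size)
```

The importer would call wrote() with the size of each object it creates. The threshold bounds how many bytes of unpacked objects are "in flight" at once, regardless of how many files that represents.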
If I keep creating files, reading them, and then deleting them, it is likely that the same blocks are being used over and over. Since the blocks are reused, it will stop the cache thrashing. Some disk writes will still happen, but that is way better than doing 12GB of unique writes followed by 12GB of reads. The 24GB of IO is all reads on small files, so it is seek-time limited, since repack does writes in the middle of the reads.
Of course, using a ramdisk/tmpfs for your object directories might be enough to save you. Just use a symlink to tmpfs for the objects directory, and leave the pack files on persistent storage.
The unpacked set of objects is way too big to fit into RAM. Any scheme using the unpacked objects will spill to disk.
That doesn't answer your question about how many passes pack-objects does. Nicolas Pitre should be able to answer that.

Rogan
--
Jon Smirl
jonsmirl@xxxxxxxxx