Jon Smirl <jonsmirl@xxxxxxxxx> wrote:
> On 8/5/06, Martin Langhoff <martin.langhoff@xxxxxxxxx> wrote:
> >On 8/5/06, Jon Smirl <jonsmirl@xxxxxxxxx> wrote:
> >> On 8/4/06, Linus Torvalds <torvalds@xxxxxxxx> wrote:
> >> > and you're basically all done. The above would turn each *,v file into a
> >> > *-<sha>.pack/*-<sha>.idx file pair, so you'd have exactly as many
> >> > pack-files as you have *,v files.
> >>
> >> I'll end up with 110,000 pack files.
> >
> >Then just do it every 100 files, and you'll only have 1,100 pack
> >files, and it'll be fine.
>
> This is something that has to be tuned. If you wait too long
> everything spills out of RAM and you go totally IO bound for days. If
> you do it too often you end up with too many packs and it takes a day
> to repack them.
>
> If I had a way to pipe the all of the objects into repack one at a
> time without repack doing multiple passes none of this tuning would be
> necessary. In this model the standalone objects never get created in
> the first place. The fastest IO is IO that has been eliminated.

I'm almost done with what I'm calling `git-fast-import`. It takes
a stream of blobs on STDIN and writes the pack to a file, printing
SHA1s in hex format to STDOUT. The basic format for STDIN is a 4
byte length (native format) followed by that many bytes of blob data.
It prints the SHA1 for that blob to STDOUT, then waits for another
length.

It naively deltas each object against the prior object, thus it
would be best to feed it one ,v file at a time working from the most
recent revision back to the oldest revision. This works well for
an RCS file as that's the natural order to process the file in. :-)

When done you close STDIN and it'll rip through and update the pack
object count and the trailing checksum. This should let you pack
the entire repository in delta format using only two passes over
the data: one to write out the pack file and one to compute its
checksum.

I'll post the code in a couple of hours.

--
Shawn.
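
To make the STDIN framing described above concrete, here is a rough
sketch in Python of what a feeder might do. This is not the
git-fast-import code; the function names (frame_blob, write_revisions,
read_frames, git_blob_sha1) are made up for illustration. It assumes
"native format" means a 4-byte unsigned integer in host byte order,
and that the printed SHA1 is the usual git blob ID (SHA-1 over
"blob <size>\0" followed by the data). Both are assumptions, not
something confirmed by the post.

```python
import hashlib
import io
import struct


def frame_blob(data: bytes) -> bytes:
    """Prefix a blob with its length as a 4-byte unsigned int.

    "=I" means host byte order with a fixed 4-byte size (assumed to
    match the "native format" length mentioned in the post).
    """
    return struct.pack("=I", len(data)) + data


def write_revisions(out, revisions):
    """Write blobs in the order given.

    The caller is expected to pass one ,v file's revisions from the
    most recent to the oldest, so each blob can be deltaed against
    the one written just before it.
    """
    for data in revisions:
        out.write(frame_blob(data))


def read_frames(inp):
    """Yield blobs back from a framed stream (what the reader side sees)."""
    while True:
        header = inp.read(4)
        if not header:
            return  # clean end of stream: the writer closed its end
        if len(header) != 4:
            raise ValueError("truncated length header")
        (length,) = struct.unpack("=I", header)
        data = inp.read(length)
        if len(data) != length:
            raise ValueError("truncated blob data")
        yield data


def git_blob_sha1(data: bytes) -> str:
    """Hex SHA-1 of a git blob: header "blob <size>\\0" plus content."""
    return hashlib.sha1(b"blob %d\0" % len(data) + data).hexdigest()


if __name__ == "__main__":
    buf = io.BytesIO()
    write_revisions(buf, [b"revision 1.2\n", b"revision 1.1\n"])
    buf.seek(0)
    for blob in read_frames(buf):
        print(git_blob_sha1(blob))
```

In real use the output would go to the importer's STDIN (for example
sys.stdout.buffer in a pipeline) rather than an in-memory buffer, and
the feeder would read back one hex SHA1 line per blob.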