On 8/24/07, Jakub Narebski <jnareb@xxxxxxxxx> wrote:
> There was an idea to special-case clone (just concatenate the packs; the
> receiving side, as someone said, can detect pack boundaries; do not forget
> to pack loose objects first), instead of using a generic fetch --all for
> clone, but no code. Code speaks louder than words (although if someone
> would provide details of pack boundary detection...)

A related concept: an initial clone of a repository does the equivalent of
repack -a on the repo before transmitting it. Why aren't we saving that
result by switching the repository over to the new pack file? Then the next
clone that comes along won't have to do anything but send the file.

But this logic can be flipped around: if the remote needs any object from
the pack file, just send it the whole pack file and let the remote sort it
out. Using this logic you can still minimize the IO statistically. When a
remote does a fetch, you have to pack all of the loose objects. When the
loose-object pile reaches 20MB or so, the fetch can trigger a repack of the
oldest half into a pack that is kept by the repository and replaces those
older loose objects. For future fetches, simply apply the rule of sending
the whole pack if any object in it is needed.

The repack of the 10MB of older objects can be handed off to another
process and copied into the repository when it is finished. At that point
the loose objects can be deleted. The git object database can tolerate one
process copying in a new pack file and deleting the old loose objects while
other processes are using the database, right?

Statistically, this model shouldn't change the amount of data transferred
very much. If you haven't synced your tree in a month, a few too many
objects may get sent to you. However, it should dramatically reduce the IO
load on the server caused by git-protocol initial clones.

-- 
Jon Smirl
jonsmirl@xxxxxxxxx
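
A rough shell sketch of the repack trigger described above (the threshold
value, running "git repack -d" on the whole loose pile rather than only the
oldest half, and the .keep marking are assumptions for illustration, not an
existing git feature):

#!/bin/sh
# Sketch only: once the loose-object pile passes a size threshold, repack it
# and mark the new pack as kept, so later repacks leave it alone and the
# server can answer fetches by streaming the whole pack.

LIMIT_KB=20480                        # the ~20MB trigger mentioned above

loose_kb=$(git count-objects -v | sed -n 's/^size: //p')
if [ "${loose_kb:-0}" -ge "$LIMIT_KB" ]; then
    # Pack the loose objects into a new pack and drop the loose copies.
    # (The scheme above would pick only the older half, e.g. by mtime.)
    git repack -d

    # Mark the newest pack as kept.
    packdir="$(git rev-parse --git-dir)/objects/pack"
    newest=$(ls -t "$packdir"/pack-*.pack | head -n 1)
    touch "${newest%.pack}.keep"
fi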