Ted, I think your "canonical pack" idea has value, but I'd be inclined to try to optimize more for the "common case" of developing on a fast local network with many local checkouts, where you occasionally push/fetch external sources via a slow link. Specifically, let's look at the very reasonable scenario of a developer working over a slow DSL or dialup connection. He's probably got many copies of various GIT repositories cloned all over the place (hey, disk is cheap!), but right now he just wants a fresh clean copy of somebody else's new tree with whatever its 3 feature branches are. Furthermore, he's probably even got 80% of the commit objects from that tree archived in his last clone from linux-next. In theory he could very carefully arrange his repositories with judicious use of alternate object directories. From personal experience, though, such arrangements are *VERY* prone to accidentally purging wanted objects; unless you *never* ever delete a branch in the "reference" repository. So I think the real problem to solve would be: Given a collection of local computers each with many local repositories, what is the best way to optimize a clone of a "new" remote repository (over a slow link) by copying most of the data from other local repositories accessible via a fast link? The goal would be to design a P2P protocol capable of rapidly and efficiently building distributed searchable indexes of ordered commits that identify which peer(s) contain that each commit. When you attempt to perform a "git fetch --peer" from a repository, it would quickly connect to a few of the metadata index nodes in the P2P network and use them to negotiate "have"s with the upstream server. The client would then sequentially perform the local "fetch" operations necessary to obtain all the objects it used to minimize the commit range with the server. Once all of those "fetch" operations completed, it could proceed to fetch objects from the server normally. Some amount of design and benchmarking would need to be done in order to figure out the most efficient indexing algorithm for finding a minimal set of "have"s of potentially thousands of refs, many with independent root commits. For example if the index was grouped according to "root commit" (of which there may be more than one), you *should* be able to quickly ask the server about a small list of root commits and then only continue asking about commits whose roots are all known to the server. The actual P2P software would probably involve 2 different daemon processes. The first would communicate with each other and with the repositories, maintaining the ref and commit indexes. These daemons would advertise themselves with Avahi, or alternatively in an enterprise environment they would be managed by your sysadmins and be automatically discovered using DNS-SD. Clients looking to perform a P2P fetch would first ask these. The second daemon would be a modified git-daemon that connects to the advertised "index" daemons and advertises its own refs and commit lists, as well as its IP address and port. My apologies if there are any blatant typos or thinkos, it's a bit later here than I would normally be writing about technical topics. Cheers, Kyle Moffett -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html