On Sun, 26 Aug 2007, Jon Smirl wrote:
>
> Changing git-daemon only for the initial clone case also means that
> people don't need to change the way they manage packs.

I do agree that we might want to do some special-case handling for the
initial clone (because it *is* kind of special), but it's not necessarily
as easy as just re-using an existing pack.

At a minimum, we'd need to have something that knows how to make a single
pack out of several packs and some loose objects. That shouldn't be
*hard*, but it's certainly nontrivial, especially in the presence of the
same objects possibly being available more than once in different packs.

[ The "duplicate object" thing does actually happen: even if you use only
  "git native" protocols, you can get duplicate objects because a file was
  changed back to an earlier version. The incremental packs you get from
  push/pull'ing between two repositories try to send the minimal
  incremental changes, but the keyword here is _try_: they will
  potentially send objects that the receiver already has, if it's not
  obvious that the receiver has them from the "commit boundary" cases ]

Maybe the client side will handle a pack with duplicate objects perfectly
fine, and it's not an issue. Maybe. It might even be likely (I can't think
of anything that would obviously break). But at a minimum, it would be
something that needs some code on the sending side, and a lot of
verification that the end result works ok on the receiving side.

And there's actually a deeper problem: the current native protocol
guarantees that the objects sent over are only those that are reachable.
That matters. It matters for subtle security issues (maybe you are
exporting some repository that was rebased, and has objects that you
didn't *intend* to make public!), but it also matters for issues like git
"alternates" files.

If you only ever look at a single repo, you'll never see the alternates
issue, but if you're seriously looking at serving git repositories, I
don't really see the "single repo" case as being at all the most common
or interesting case. And if you look at something like kernel.org, the
"alternates" thing is *much* more important than how much memory
git-daemon uses!

Yes, kernel.org would probably be much happier if git-daemon wasn't such
a memory pig occasionally, but on the other hand, the win from using
alternates and being able to share 99% of all objects in all the various
related kernel repositories is actually likely to be a *bigger* memory
win than any git-daemon memory usage, because now the disk caching works
a hell of a lot better!

So it's not actually clear how the initial clone thing can be optimized
on the server side. It's easier to optimize on the *client* side: just do
the initial clone with rsync/http (and "git gc" it on the client
afterwards), and then change it to the git native protocol after the
clone.

That may not sound very user-friendly, but let's face it, I think there
is exactly one person in the whole universe that tries to use an NSLU2 as
a git server. So the "client-side workaround" is likely to affect a very
limited number of clients ;)

		Linus
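For reference, this is roughly what the "alternates" sharing described
above looks like in practice (the paths and URL here are only
illustrative): a repository lists another local object directory in
objects/info/alternates and borrows objects from it instead of keeping
its own copies, which is also what "git clone --reference" sets up.

    # Clone a tree while borrowing objects from a local repository that
    # already has the shared history (illustrative paths/URL):
    git clone --reference /pub/scm/linux/kernel/git/torvalds/linux-2.6.git \
        git://git.kernel.org/pub/scm/linux/kernel/git/example/tree.git mytree

    # Equivalently, the shared object directory can be listed by hand:
    echo /pub/scm/linux/kernel/git/torvalds/linux-2.6.git/objects \
        > mytree/.git/objects/info/alternates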
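And a minimal sketch of the client-side workaround, assuming the
repository is exported over both http and the native protocol (URLs are
illustrative):

    # Do the expensive initial clone over the dumb http transport:
    git clone http://www.kernel.org/pub/scm/git/git.git
    cd git

    # Repack locally, since the dumb transports don't deliver one tidy pack:
    git gc

    # Switch the origin URL to the native protocol for all later fetches:
    git config remote.origin.url git://git.kernel.org/pub/scm/git/git.git
    git fetch origin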