On Tue, 18 Aug 2009, Tomasz Kontusz wrote:

> Ok, so it looks like it's not implementable without some kind of cache
> server-side, so the server would know what the pack it was sending
> looked like.
> But here's my idea: make server send objects in different order (the
> newest commit + whatever it points to first, then next one, then
> another...). Then it would be possible to look at what we got, tell
> server we have nothing, and want [the newest commit that was not
> complete]. I know the reason why it is sorted the way it is, but I think
> that the way data is stored after clone is clients problem, so the
> client should reorganize packs the way it wants.

That won't buy you much.  You should realize that a pack is made of:

1) Commit objects.  Yes, they're all put together at the front of the
   pack, but they are roughly the equivalent of:

        git log --pretty=raw | gzip | wc -c

   For the Linux repo as of now that is around 32 MB.

2) Tree and blob objects.  Those are the bulk of the content for the top
   commit.  The top commit is usually not delta compressed because we
   want fast access to the top commit, and that is used as the base for
   further delta compression for older commits.  So the top commit's
   trees and blobs are stored whole at the front of the pack, right
   after the commit objects.  You can estimate the size of this data
   with:

        git archive --format=tar HEAD | gzip | wc -c

   On the same Linux repo this is currently 75 MB.

3) Delta objects.  Those make up the rest of the pack, plus a couple of
   tree/blob objects that were not found in the top commit and are
   different enough from any object in that top commit not to be
   represented as deltas.  Still, the majority of objects for all the
   remaining commits are delta objects.

So... if we reorder objects, all that we can do is to spread commit
objects around so that the objects referenced by one commit are all seen
before another commit object is included.  That would cut down on that
initial 32 MB.
However you still have to get that 75 MB in order to at least be able to
look at _one_ commit.  So you've only reduced your critical download
size from 107 MB to 75 MB.  This is some improvement, of course, but not
worth the bother IMHO.

If we're to have restartable clone, it has to work for any size.

And that's where the real problem is.  I don't think having servers
cache pack results for every fetch request is sensible, as that would be
an immediate DoS attack vector.

And because the object order in a pack is not defined by the protocol,
we cannot expect the server to necessarily always provide the same
object order either.  For example, it is already undefined in which
order you'll receive objects, as threaded delta search is
non-deterministic and two identical fetch requests may end up with
slightly different packing.  Or load balancing may redirect your fetch
requests to different git servers which might have different versions of
zlib, or even git itself, affecting the object packing order and/or
size.

Now... What _could_ be done, though, is some extension to the
git-archive command.  One thing that is well and strictly defined in git
is the file path sort order.  So given a commit SHA1, you should always
get the same files in the same order from git-archive.  For an initial
clone, git could attempt fetching the top commit using the remote
git-archive service and locally reconstruct that top commit that way.
If the transfer is interrupted in the middle, then the remote
git-archive could be told how to resume the transfer by telling it how
many files and how many bytes in the current file to skip.  This way the
server doesn't need to perform any sort of caching and remains
stateless.

You then end up with a pretty shallow repository.  The clone process
could then fall back to the traditional native git transfer protocol to
deepen the history of that shallow repository.

And then that special packing sort order to distribute commit objects
would make sense since each commit would then have a fairly small set of
new objects, and most of them would be deltas anyway, making the data
size per commit really small and any interrupted transfer much less of
an issue.


Nicolas