Nguyen Thai Ngoc Duy <pclouds <at> gmail.com> writes:

> I've been analyzing bittorrent protocol and come up with this. The
> last idea about a similar thing [1], gittorrent, was given by Nicolas.
> This keeps close to that idea (i.e the transfer protocol must be around git
> objects, not file chunks) with a bit difference.
>
> The idea is to transfer a chain of objects (trees or blobs), including
> base object and delta chain. Objects are chained in according to
> worktree layout, e.g. all objects of path/to/any/blob will form a
> chain, from a commit tip down to the root commits. Chains can have
> gaps, and don't need to start from commit tip. The transfer is
> resumable because if a delta chain is corrupt at some point, we can
> just request another chain from where it stops. Base object is
> obviously resumable.

I may be talking nonsense, so please bear with me.

I'm not sure this works well, since chains defined this way change
over time. I may request commits A and B while declaring that I possess
commits C and D. One server may be ahead of A, so should it send me
more data, or repack the chain so that the non-requested versions get
excluded? At the same time, the server may be missing B and possess
only some ancestors of it. Should it send me only a part of the chain,
or should I rather ask a different server?

Moreover, if a directory gets renamed, the content may get transferred
needlessly. This is probably no big problem.

I haven't read the whole other thread yet, but what about going the
other way round? Use a single commit as a chain, and create deltas
assuming that all ancestors are already available. The packs may
arrive out of order, so the decompression may have to wait. The number
of commits may be one order of magnitude larger than the number of
paths (there are currently 2254 paths and 24235 commits in git.git),
so grouping consecutive commits into one larger pack may be useful.

The advantage is that the packs stay stable over time, so you may
create them using the most aggressive and time-consuming settings and
store them forever. You could create packs for single commits, packs
for non-overlapping consecutive pairs of them, for non-overlapping
pairs of pairs, etc. I mean, with commits numbered 0, 1, 2, ..., create
packs [0,1], [2,3], ..., [0,3], [4,7], etc. The reason for this is to
allow reading groups of commits from different servers so that they
fit together (similar to buddy memory allocation). Of course, there
are things like branches bringing chaos into this simple scheme, but
I'm sure this can be solved somehow.

Another problem is the client requesting commits A and B while
declaring that it possesses commits C and D. When both C and D are
ancestors of either A or B, you can ignore the declaration (as you
assume this while packing, anyway). The other case is less probable,
unless e.g. C is the master and A is a development branch. Currently,
I have no idea how to optimize this, or whether it could be important.

I see no disadvantage when compared to path-based chains, but I am
probably overlooking something obvious.
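
To make the buddy-style grouping above a bit more concrete, here is a
rough sketch of how a client might pick which precomputed packs to ask
for. It assumes a strictly linear history with commits numbered from 0
(so it ignores the branch problem entirely), and the function name
buddy_cover is made up for illustration; nothing like it exists in git.

```python
def buddy_cover(first, last):
    """Cover commits first..last (inclusive) with aligned packs.

    Hypothetical helper: each returned (lo, hi) pair is a pack whose
    size is a power of two and whose start is a multiple of that size,
    i.e. one of [0,1], [2,3], [0,3], [4,7], ... or a single commit.
    """
    packs = []
    pos = first
    while pos <= last:
        if pos == 0:
            # 0 is aligned to every power of two; start from one that
            # is certainly large enough and shrink it below.
            size = 1 << (last + 1).bit_length()
        else:
            # Largest power of two that divides pos (lowest set bit).
            size = pos & -pos
        # Shrink until the pack no longer runs past the last commit.
        while pos + size - 1 > last:
            size //= 2
        packs.append((pos, pos + size - 1))
        pos += size
    return packs


if __name__ == "__main__":
    # A client that has commits 0..2 and wants everything up to 10:
    print(buddy_cover(3, 10))
```

Since every server would cut its packs at the same boundaries, the
pieces of such a cover could be fetched from different servers and
still fit together.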