On Thu, Jan 6, 2011 at 6:28 AM, Maaartin <grajcar1@xxxxxxxxx> wrote:
> Nguyen Thai Ngoc Duy <pclouds <at> gmail.com> writes:
>
>> I've been analyzing the BitTorrent protocol and come up with this. The
>> last idea about a similar thing [1], gittorrent, was given by Nicolas.
>> This keeps close to that idea (i.e. the transfer protocol must be around git
>> objects, not file chunks) with a slight difference.
>>
>> The idea is to transfer a chain of objects (trees or blobs), including
>> base object and delta chain. Objects are chained according to
>> worktree layout, e.g. all objects of path/to/any/blob will form a
>> chain, from a commit tip down to the root commits. Chains can have
>> gaps, and don't need to start from commit tip. The transfer is
>> resumable because if a delta chain is corrupt at some point, we can
>> just request another chain from where it stops. Base object is
>> obviously resumable.
>
> I may be talking nonsense, please bear with me.
>
> I'm not sure if it works well, since chains defined this way change over time.
> I may request commits A and B while declaring to possess commits C and D. One
> server may be ahead of A, so should it send me more data or repack the chain so
> that the non-requested versions get excluded? At the same time the server may
> be missing B and possess only some ancestors of it. Should it send me only a
> part of the chain or should I better ask a different server?

I'll keep it simple: a chain is defined by one commit head, so it can't
change over time. You can still ask for just part of a chain, and
rev-list syntax can be used for that. For example, if you already have
commits C and D and there are 10 deltas in the chain (assuming linear
history for simplicity), requesting "give me A~10 ^C ^D" should give
the required commits.

> Moreover, in case a directory gets renamed, the content may get transferred
> needlessly. This is probably no big problem.

Yes, the chain constraint can backfire in these cases.
We can mix standard upload-pack/fetch-pack with this scheme if the
server can recognize these cases, by cutting commit history into
chunks. The chunks containing directory renames can be fetched with
git-fetch.

> I haven't read the whole other thread yet, but what about going the other way
> round? Use a single commit as a chain, create deltas assuming that all
> ancestors are already available. The packs may arrive out of order, so the
> decompression may have to wait. The number of commits may be one order of
> magnitude larger than the number of paths (there are currently 2254 paths
> and 24235 commits in git.git), so grouping consecutive commits into one larger
> pack may be useful.

The number of commits can grow fast; I'd rather have a number that
stays small and stable over time. Commits also depend on other commits,
so you can't verify a commit until you have all of its parents. The
same applies within a file chain, but one file chain does not interfere
with other file chains.

> The advantage is that the packs stay stable over time, you may create them
> using the most aggressive and time-consuming settings and store them forever.
> You could create packs for single commits, packs for non-overlapping
> consecutive pairs of them, for non-overlapping pairs of pairs, etc. I mean with
> commits numbered 0, 1, 2, ... create packs [0,1], [2,3], ..., [0,3], [4,7],
> etc. The reason for this is obviously to allow reading groups of commits from
> different servers so that they fit together (similar to Buddy memory
> allocation). Of course, there are things like branches bringing chaos into this
> simple scheme, but I'm sure this can be solved somehow.

Pack encoding can change. And packs can contain objects you don't want
to share (i.e. hidden from public view).

> Another problem is the client requesting commits A and B while declaring to
> possess commits C and D. When both C and D are ancestors of either A or B, you
> can ignore it (as you assume this while packing, anyway).
> The other case is
> less probable, unless e.g. C is the master and A is a developing branch.
> Currently, I've no idea how to optimize this and whether this could be
> important.

As I said, we can request just part of a chain (from A+B to C+D).
git-fetch should be used if the repo is already fairly up to date,
though. It's simply more efficient.
-- 
Duy
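
For the power-of-two pack grouping suggested in the quoted text
([0,1], [2,3], ..., [0,3], [4,7]), here is a minimal sketch of how a
client might split a wanted commit range into such aligned blocks, so
that each block could come from a different server. It assumes a linear
history with commits numbered from 0 (branches are not handled), and
the function names are hypothetical; nothing here is part of git.

```python
def buddy_packs(num_commits, max_level):
    """List the aligned packs (first, last) a server could precompute.

    Level 0 holds single commits, level 1 pairs, level 2 pairs of
    pairs, and so on. Commit numbering is an assumption of this sketch.
    """
    packs = []
    for level in range(max_level + 1):
        size = 1 << level
        for first in range(0, num_commits - size + 1, size):
            packs.append((first, first + size - 1))
    return packs


def cover(lo, hi):
    """Split the inclusive range [lo, hi] into aligned power-of-two blocks.

    Each block starts at a multiple of its own size, so it matches one
    of the packs listed by buddy_packs().
    """
    blocks = []
    while lo <= hi:
        if lo == 0:
            # Zero is aligned to every size; start above the range length.
            size = 1 << (hi - lo + 1).bit_length()
        else:
            # Lowest set bit of lo is the largest size lo is aligned to.
            size = lo & -lo
        # Shrink the block until it no longer runs past hi.
        while lo + size - 1 > hi:
            size >>= 1
        blocks.append((lo, lo + size - 1))
        lo += size
    return blocks


if __name__ == "__main__":
    print(buddy_packs(8, 2))
    print(cover(2, 9))
```

A client that already has commits 0 and 1 and wants everything up to
commit 9 would call cover(2, 9) and could then request each returned
block from whichever peer has that pack.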