On Thu, 2 Sep 2010, Luke Kenneth Casson Leighton wrote:

> nicolas, thanks for responding: you'll see this some time in the
> future when you catch up, it's not a high priority, nothing new, just
> thinking out loud, for benefit of archives.

Well, I might as well pay more attention to *you* now. :-)

> >> * is it possible to _make_ the repository guaranteed to produce
> >> identical pack objects?
> >
> > Sure, but performance will suck.
>
> that's fiiine :) as i've learned on the pyjamas project, it's rare
> that you have speed and interoperability at the same time...

Well, did you hear about this thing called Git? It appears that those
Git developers are performance freaks. :-) Yet Git is interoperable
across almost all versions ever released, because we made sure that
only fundamental things are defined and relied upon, and that excludes
the actual delta pairing and pack object ordering. That's why a pack
file may have many different byte sequences and yet still represent
the same canonical data.

> if the pack-objects are going to vary, then the VFS layer idea is
> blown completely out the water, except for the absolute basic
> meta-info such as "refs/heads/*". so i might as well just use
> "actual" bittorrent to transfer packs via
> {ref}-{commitref}-{SHA-1}.torrent.

For the benefit of the archives, here's what I think of the whole idea.

The BitTorrent model is simply inappropriate for Git. It doesn't fit
the Git model at all: BitTorrent works on stable, static data and
requires a lot of people wanting that same data. When you perform a
fetch, Git actually negotiates with the server to figure out what's
missing locally, and the server produces a custom pack for you,
optimized so that only what you need to be up to date is transferred.

Even if you try to cache a set of packs to suit the BitTorrent static
data model, you'll need an enormous number of packs to cover all the
possible gaps between a server and a random number of clients, each
with a random repository state. It is of course possible to have
bigger packs covering larger gaps, but then you lose the biggest
advantage of the smart Git protocol. And with smaller, more
fine-grained packs, you'll end up with so many of them that finding a
live torrent for the one you actually need is going to be difficult.

> ho hum, drawing board we come...

Yep. Instead of transferring packs, a BitTorrent-like transfer should
be based on the transfer of _objects_. You can then draw the
correspondence between file chunks in BitTorrent and objects in a
Git-aware system.

So, when contacting a peer, you could negotiate the set of objects
that the peer has and you don't, and vice versa. Objects in Git are
stable and immutable, and they all have a unique SHA-1 signature. To
optimize the negotiation, the pack index content can be used: first,
each pair of peers exchanges the content of the first-level fan-out
table and ignores those entries that are equal (a rough sketch of this
follows below).

Then each peer makes requests to connected peers for objects that
those peers have but that aren't available locally, just like chunks
in BitTorrent. But here's the twist to make this scale well: since
the object sender knows what objects the receiver already has, it can
choose the object encoding. That means the sender can simply *reuse*
an existing delta encoding for an object it is asked to send, provided
the requestor already has the base object for that delta. So in most
cases the object to send will be small, especially if it is a delta
object.
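To put the fan-out idea in concrete terms, here is a rough sketch in
Python. The .idx layout it reads (a magic plus version in v2, then 256
big-endian cumulative counts) is the real one; everything else -- the
single-pack assumption and the helper names -- is made up, and a real
implementation would also have to account for multiple packs and loose
objects:

    import struct

    def read_fanout(idx_path):
        # Return the 256 cumulative object counts from a pack .idx file.
        with open(idx_path, 'rb') as f:
            if f.read(4) == b'\xfftOc':   # v2 index: skip magic and version
                f.read(4)
            else:                         # v1 index: fan-out starts at offset 0
                f.seek(0)
            return struct.unpack('>256I', f.read(1024))

    def bucket_counts(fanout):
        # Cumulative counts -> objects per first-SHA-1-byte bucket.
        return [fanout[i] - (fanout[i - 1] if i else 0) for i in range(256)]

    def buckets_to_negotiate(mine, theirs):
        # Buckets whose counts differ certainly need a finer exchange.
        # Equal counts are only a cheap first-pass filter: two peers can
        # hold the same *number* of objects in a bucket without holding
        # the same objects, so a deeper check would still follow.
        a, b = bucket_counts(mine), bucket_counts(theirs)
        return [i for i in range(256) if a[i] != b[i]]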
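And a sketch of the sender-side twist. The object fields, peer_has()
and the return format are invented; only the decision logic matters,
namely that an on-disk delta is reused verbatim whenever the requestor
is known to have its base:

    def encode_for_peer(obj, peer_has):
        # obj.delta_base: SHA-1 of the base if obj is stored as a delta
        # in the local pack, else None.  peer_has(sha1): what the
        # requestor advertised during negotiation.
        if obj.delta_base is not None and peer_has(obj.delta_base):
            # Reuse the existing delta as-is: nothing to recompute, and
            # the payload is typically tiny.
            return ('delta', obj.delta_base, obj.delta_data)
        # No usable base on the other side: send the whole object.
        return ('full', obj.sha1, obj.compressed_data)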
Small objects like that should fit the chunk model nicely. But if an
object is bigger than a certain threshold, its transfer could be
chunked across multiple peers just like classic BitTorrent. In that
case the chunking would need to be done on the non-delta, uncompressed
object data, as that is the only form which is universally stable
(which doesn't mean the _transfer_ of those chunks can't be
compressed). A rough sketch of this follows at the end of this
message.

Now, this design has many open questions, such as finding out what the
latest set of refs is amongst all the peers, whether or not what we
have locally are ancestors of the remote refs, etc. And of course,
while this will make for a speedy object transfer, the resulting mess
on the receiver's end will have to be validated and repacked, so the
whole exchange might not end up being faster for the fetcher in the
end.
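Here is that chunking sketch, once more with invented names and an
arbitrary constant; the only Git-real detail is that an object's name
hashes a '<type> <size>\0' header plus the data:

    import hashlib
    import zlib

    CHUNK_SIZE = 256 * 1024      # arbitrary; anything above is chunked

    def git_object_id(obj_type, raw):
        # Git object names cover a '<type> <size>\0' header + the data.
        h = hashlib.sha1()
        h.update(b'%s %d\x00' % (obj_type.encode(), len(raw)))
        h.update(raw)
        return h.hexdigest()

    def chunk_manifest(obj_type, raw):
        # Boundaries are defined on the uncompressed, non-delta data, so
        # every peer computes the same manifest for the same object.
        chunks = [raw[i:i + CHUNK_SIZE]
                  for i in range(0, len(raw), CHUNK_SIZE)]
        return {'object': git_object_id(obj_type, raw),
                'chunks': [hashlib.sha1(c).hexdigest() for c in chunks]}

    def send_chunk(chunk):
        # The *transfer* can still be compressed even though the chunk
        # boundaries and hashes are fixed on the uncompressed data.
        return zlib.compress(chunk)

    def verify_chunk(received, manifest, n):
        data = zlib.decompress(received)
        return hashlib.sha1(data).hexdigest() == manifest['chunks'][n]

Nicolas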