On Thu, May 10, 2012 at 04:43:26PM -0500, Neal Kreitzinger wrote:

> >Yes. The on-the-wire format is a packfile. We create a new packfile on
> >the fly, so we may find new deltas (e.g., between objects that were
> >stored on disk in two different packs), but we will mostly be reusing
> >deltas from the existing packs.
> >
> >So any time you improve the on-disk representation, you are also
> >improving the network bandwidth utilization.
> >
> The git-clone manpage says you can use the rsync protocol for the
> url. If you use rsync:// as your url for your remote does that get
> you the rsync delta-transfer algorithm efficiency for the network
> bandwidth utilization part (as opposed to the on-disk representation
> part)? (I'm new to rsync.)

Well, yes. If you use the rsync transport, it literally runs rsync,
which will use the regular rsync algorithm. But it won't be better than
the git protocol (and in fact will be much worse) for a few reasons:

  1. The object db files are all named after the sha1 of their content
     (the object sha1 for loose objects, and the sha1 of the whole pack
     for packfiles). Rsync will not run its comparison algorithm between
     files with different names. It will not re-transfer existing loose
     objects, but it will delete obsolete packfiles and retransfer new
     ones in their entirety. So it's like re-cloning over again for any
     fetch after an upstream repack.

  2. Even if you could use the rsync delta algorithm, it will never be
     as efficient as git. Git understands the structure of the packfile
     and can tell the other side "Hey, I have these objects". Whereas
     rsync must guess from the bytes in the packfiles. Which is much
     less efficient to compute, and can be wrong if the representation
     has changed (e.g., something used to be a whole object, but is now
     stored as a delta).

  3. Even if you could get the exact right set of objects to transfer,
     and then use the rsync delta algorithm on them, git would still do
     better.
     Git's job is much easier: one side has both sets of objects (those
     to be sent and those not), and is generating and sending efficient
     deltas for the other side to apply to their objects. Rsync assumes
     a harder job: you have one set, and the remote side has the other
     set, and you must agree on a delta by comparing checksums. So it
     will fundamentally never do as well.

-Peff
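
To make points 1 and 2 a bit more concrete, here is a toy sketch in
Python. It is not how git is implemented; the names (toy_pack_name,
reachable, objects_to_send, parents) are made up for illustration, the
"pack" is just a set of object ids, and the pack name is assumed to be
the SHA-1 of those sorted ids rather than of real pack bytes. It only
contrasts the file-name view that rsync works from with the object view
that git works from.

```python
import hashlib


def toy_pack_name(object_ids):
    """Name a toy 'pack' after the SHA-1 of its contents (here, the sorted ids)."""
    digest = hashlib.sha1("\n".join(sorted(object_ids)).encode()).hexdigest()
    return f"pack-{digest}.pack"


def reachable(parents, tips):
    """Return every object id reachable from `tips` by following `parents`."""
    seen = set()
    stack = list(tips)
    while stack:
        oid = stack.pop()
        if oid in seen:
            continue
        seen.add(oid)
        stack.extend(parents.get(oid, ()))
    return seen


def objects_to_send(parents, wants, haves):
    """Objects the sender must transmit: reachable from wants, not from haves."""
    return reachable(parents, wants) - reachable(parents, haves)


# Hypothetical linear history: a <- b <- c <- d.
parents = {"a": [], "b": ["a"], "c": ["b"], "d": ["c"]}

# The client fetched when the tip was "b"; upstream has since added "c"
# and "d" and repacked everything into a single pack.
client_pack = toy_pack_name(reachable(parents, ["b"]))
server_pack = toy_pack_name(reachable(parents, ["d"]))

# File-name view (what rsync sees): the names differ, so the new pack is
# transferred whole even though it contains "a" and "b" again.
print(client_pack != server_pack)

# Object view (what git works from): only the missing objects are sent.
print(sorted(objects_to_send(parents, ["d"], ["b"])))
```

In this toy model the file-name comparison reports a brand-new pack,
while the object comparison reports only "c" and "d" as missing. The
real negotiation is incremental and more involved, but the asymmetry is
the same one described above.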