Re: GSoC - Some questions on the idea of

On Thu, May 10, 2012 at 04:43:26PM -0500, Neal Kreitzinger wrote:

> >Yes. The on-the-wire format is a packfile. We create a new packfile on
> >the fly, so we may find new deltas (e.g., between objects that were
> >stored on disk in two different packs), but we will mostly be reusing
> >deltas from the existing packs.
> >
> >So any time you improve the on-disk representation, you are also
> >improving the network bandwidth utilization.
> >
> The git-clone manpage says you can use the rsync protocol for the
> url.  If you use rsync:// as your url for your remote does that get
> you the rsync delta-transfer algorithm efficiency for the network
> bandwidth utilization part (as opposed to the on-disk representation
> part)?  (I'm new to rsync.)

Well, yes. If you use the rsync transport, it literally runs rsync,
which will use the regular rsync algorithm. But it won't be better than
the git protocol (and in fact will be much worse) for a few reasons:

  1. The object db files are all named after the sha1 of their content
     (the object sha1 for loose objects, and the sha1 of the whole pack
     for packfiles). Rsync will not run its comparison algorithm between
     files with different names. It will not re-transfer existing loose
     objects, but it will delete obsolete packfiles and retransfer new
     ones in their entirety. So any fetch after an upstream repack is
     like cloning all over again.

  2. Even if you could use the rsync delta algorithm, it will never be
     as efficient as git. Git understands the structure of the packfile
     and can tell the other side "Hey, I have these objects", whereas
     rsync must guess from the bytes in the packfiles. That guessing is
     much less efficient to compute, and it can be wrong if the
     representation has changed (e.g., something used to be a whole
     object, but is now stored as a delta).

  3. Even if you could get the exact right set of objects to transfer,
     and then use the rsync delta algorithm on them, git would still do
     better. Git's job is much easier: one side has both sets of
     objects (those to be sent and those not), and is generating and
     sending efficient deltas for the other side to apply to their
     objects. Rsync assumes a harder job: you have one set, and
     the remote side has the other set, and you must agree on a delta by
     comparing checksums. So it will fundamentally never do as well.
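
To make the first point concrete, here is a rough Python sketch. It is a
toy model, not how git or rsync are implemented. The blob naming follows
git's scheme (SHA-1 over a "blob <size>\0" header plus the content), but
toy_pack_name, files_to_transfer_by_name and objects_to_transfer are
hypothetical helpers invented for illustration, and the "pack name" is
just some hash that changes whenever the pack's contents change.

```python
import hashlib


def loose_object_name(content: bytes) -> str:
    """Name git gives a blob: SHA-1 of a 'blob <size>\\0' header plus the content."""
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()


def toy_pack_name(object_names) -> str:
    """Hypothetical stand-in for a pack file name derived from the pack's contents."""
    h = hashlib.sha1()
    for name in sorted(object_names):
        h.update(name.encode("ascii"))
    return "pack-" + h.hexdigest() + ".pack"


def files_to_transfer_by_name(local_files, remote_files):
    """Toy name-based sync: any file whose name is missing locally is sent whole."""
    return sorted(set(remote_files) - set(local_files))


def objects_to_transfer(local_objects, remote_objects):
    """Toy object-level sync: only objects the local side lacks are sent."""
    return sorted(set(remote_objects) - set(local_objects))


# Upstream had two objects in one pack, then added a third and repacked.
old_objects = [loose_object_name(b"a\n"), loose_object_name(b"b\n")]
new_objects = old_objects + [loose_object_name(b"c\n")]

old_pack = toy_pack_name(old_objects)
new_pack = toy_pack_name(new_objects)

# Name-based view: the repacked file has a new name, so all of it is sent.
print(files_to_transfer_by_name([old_pack], [new_pack]))

# Object-based view: only the one new object is missing.
print(objects_to_transfer(old_objects, new_objects))
```

In this model the name-based comparison sees an entirely new file after
the repack, while the object-based comparison sees exactly one missing
object, even though the two packs share most of their contents.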

-Peff