Re: git pack/unpack over bittorrent - works!

Ted Ts'o <tytso@xxxxxxx> writes:
> On Sat, Sep 04, 2010 at 03:50:29PM +0100, Luke Kenneth Casson Leighton wrote:
> > 
> > :)  the legality or illegality isn't interesting - or is a... red
> > herring, being one of the unfortunate anarchistic-word-associations
> > with the concept of "file sharing".  the robustness and convenience
> > aspects - to developers, not users - are where it gets reaaally
> > interesting.
> 
> I ask the question because I think being clear about what the goals
> might be is critically important.  If in fact the goal is to evade
> detection by spreading out responsibility for code which is illegal in
> some jurisdictions (even if they are commonly used and approved of by
> people who aren't spending millions of dollars purchasing
> congresscritters), there are many additional requirements that are
> imposed on such a system.
> 
> If the goal is speeding up git downloads, then we need to be careful
> about exactly what problem we are trying to solve.
> 
> >  i do not know of a single free software development tool - not a
> > single one - which is peer-to-peer distributed.  just... none.  what
> > does that say??  and we have people bitching about how great but
> > non-free skype is.  there seems to be a complete lack of understanding
> > of the benefits of peer-to-peer infrastructure in the free software
> > community as a whole, and a complete lack of interest in the benefits,
> > too...

Luke, you don't have to be peer-to-peer to be decentralized and
distributed.  From what I understand, people bitch most about
centralized (and closed) services.
 
> Maybe it's because the benefits don't exist for many people?  At least
> where I live, my local ISP (Comcast, which is a very common internet
> provider in the States) deliberately degrades the transfer of
> peer2peer downloads.  As a result, it doesn't make sense for me to use
> bittorrent to download the latest Ubuntu or Fedora iso image.  It's in
> fact much faster for me to download it from an ftp site or a web site.
> 
> And git is *extremely* efficient about its network usage, since it
> sends compressed deltas --- especially if you already have a base
> repository established.  For example, I took a git repository which
> I haven't touched since August 4th --- exactly one month ago --- and
> did a "git fetch" to bring it up to date by downloading from
> git.kernel.org.  How much network traffic was required, after being
> one month behind?  2.8MB of bytes received, 133k of bytes transmitted.

I think the major problem git-p2p wants to solve is the case where a
base repository is *not* established, i.e. the initial fetch / full
clone operation.
 
Note that with the --reference argument to git clone, if you have a
similar related repository, you don't have to do a full fetch when
cloning a fork of a repository you already have (e.g. you have Linus's
repo, and want to fetch linux-next).

> That's not a lot.  And it's well within the capabilities of even a
> really busy server to handle.  Remember, peer2peer only helps if the
> aggregate network bandwidth of the peers is greater than (a) your
> download pipe, or (b) a central server's upload pipe.  And if we're
> only transmitting 2.8MB, and git.kernel.org has an aggregate
> connection of over a gigabit per second to the internet --- it's not
> likely that peer2peer would in fact result in a faster download.  Nor
> is it likely that git updates would be something which
> the kernel.org folks would even notice as a sizeable percentage of
> their usable network bandwidth.  First of all, ISO image files are
> much bigger, and secondly, there are many more users downloading ISO
> files than there are developers downloading git updates, and certainly
> relatively few developers downloading full git repositories (since
> everybody generally tries really hard to only do this once).

Well, a full initial clone of the Linux kernel repository (or of any
other large project with a long history) is quite large.  Also, not
all projects have a big upload pipe.

An additional problem is that clone is currently non-resumable (at
all), so if you have a flaky network connection it might be hard to do
the initial clone.


One way of solving this problem (which, as I have heard, some projects
use) is to prepare an "initial" bundle; this bundle can be downloaded
resumably via HTTP or FTP, or shared via ordinary P2P such as
BitTorrent.
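
A rough sketch of how that could look (project names, URLs and the
branch used below are only placeholders):

    # publisher side: pack the full history of 'master' into a bundle
    $ git bundle create project-initial.bundle master
    # ... publish project-initial.bundle over HTTP / FTP / BitTorrent ...

    # client side: fetch the bundle resumably, then clone from it
    $ wget -c http://example.org/project-initial.bundle
    $ git clone -b master project-initial.bundle project
    $ cd project
    $ git remote set-url origin git://example.org/project.git
    $ git fetch origin

Unlike git clone, wget -c (or any BitTorrent client) can resume a
partial download of the bundle file.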

The initial pack could be 'kept' (not subject to repacking); with some
code it could serve as the canonical starting packfile for cloning, I
think.
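
A minimal sketch of the 'kept' part (the pack name is a placeholder):
a matching .keep file is enough to protect a pack from git repack /
git gc, e.g.

    $ touch .git/objects/pack/pack-<sha1>.keep

Serving that kept pack verbatim as the starting packfile for clones is
the part that would need the extra code.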

-- 
Jakub Narebski
Poland
ShadeHawk on #git