Re: git pack/unpack over bittorrent - works!

Luke Kenneth Casson Leighton <luke.leighton@xxxxxxxxx> wrote:
> 
>  * based on what you kindly mentioned about "git repack -f", would a
> (well-written!) patch to git pack-objects to add a
> "--single-thread-only" option be acceptable?

Probably not.  I can't think of a good reason to limit the number
of threads that get used.  We already have pack.threads as a
configuration variable to support controlling this for the system,
but that's about the only thing that really makes sense.
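
(For completeness: you can already force a single-threaded pack today,
either per repository or per invocation, without any new option:

    git config pack.threads 1
    git -c pack.threads=1 repack -a -d -f

so a --single-thread-only flag would just duplicate that.)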
 
>  * would you, or anyone else with enough knowledge of how this stuff
> reaallly works, be willing to put some low-priority back-of-mind
> thought into how to create a "canonical" pack format

We have.  We've even talked about it on the mailing list.  Multiple
times.  Most times about how to support a p2p Git transport.
During that whole Gittorrent discussion you are ignoring, we put
some effort into coming up with a pack-like format that would be
more stable, at the expense of being larger in total size.

>  questions (not necessarily for nicolas) - can anyone think of any
> good reasons _other_ than for multiple file-sharing to have a
> "canonical" pack-object?

Yes, it's called resuming a clone over git://.

Right now, if you abort a git:// transfer, you break the pack stream,
and it cannot be restarted.  If we had a more consistent encoding we
might be able to restart an aborted clone.

But we can't solve it.  It's a _very_ hard problem.

Nico, myself, and a whole lot of other very smart folks who really
understand how Git works today have failed to identify a way to do
this that we actually want to write, include in git, and maintain
long-term.  Sure, anyone can come up with a specification that says
"put this here, that there, break ties this way".  But we don't
want to bind our hands and maintain those rules.
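
If you want to see why, repack the same repository twice and compare
the bytes (a rough demonstration; assumes GNU coreutils sha1sum):

    git repack -a -d -f
    sha1sum .git/objects/pack/pack-*.pack
    git repack -a -d -f
    sha1sum .git/objects/pack/pack-*.pack

With pack.threads > 1 the delta search is split across threads, so
the chosen delta bases, and therefore the pack bytes, can differ from
one run to the next.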
 
> off the top of my head i can think of one: rsync if the transfer is
> interrupted.  if the pack-objects are large - and not guaranteed to be
> the same - then an interrupted rsync transfer would be a bit of a
> waste of bandwidth.  however if the pack-object could always be made
> the same, the partial transfer could carry on.   musing a bit
> further... mmm... i suppose the same thing applies equally to http
> and ftp.  it's a bit lame, i know: can anyone think of any better
> reasons?

We already do this with http:// and ftp:// during fetch or clone.
We try to resume with a byte range request, and validate the SHA-1
trailer at the end of the pack file after download.  If it doesn't
match, we throw the file away and restart the entire thing.
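
Done by hand it looks roughly like this (made-up pack name, and
assuming curl, GNU coreutils, and xxd; git does the same thing in C):

    # resume the download where it left off, via an HTTP byte range
    curl -C - -o pack-1234.pack \
        http://example.com/repo.git/objects/pack/pack-1234.pack

    # the last 20 bytes of a pack are the SHA-1 of everything before
    # them; recompute and compare
    head -c -20 pack-1234.pack | sha1sum
    tail -c 20 pack-1234.pack | xxd -p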

In general pack files don't change that often, so there are fairly
good odds that resuming an aborted clone only a few hours after
it aborted would succeed by simply resuming the file download.
But every week or two (or even nightly!) it's common for packs to
be completely rewritten (when the repository owner does `git gc`),
so we really cannot rely on packs being stable long-term.
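
Easy to see locally, once a repository has accumulated new objects:

    ls .git/objects/pack/
    git gc
    ls .git/objects/pack/

The old pack files are typically replaced by one freshly written
pack, and any partially downloaded copy of the old file is then
useless.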

-- 
Shawn.

