Re: Resumable clone

On Sun, Mar 6, 2016 at 2:59 PM, Johannes Schindelin
<Johannes.Schindelin@xxxxxx> wrote:
> First of all: my main gripe with the discussed approach is that it uses
> bundles. I know, I introduced bundles, but they just seem too klunky and
> too static for the resumable clone feature.

One thing Junio didn't mention in his summary is the use of pack
bitmaps [1]. Jeff talked about GitHub-specific needs, but I think it
has value even outside GitHub: if some objects in the initial pack are
secret (e.g. reachable only from refs hidden by a ref namespace, or
only from the reflog - imagine someone committed a password, did a
reset --hard, then pushed again), their owners probably do not want to
publish the initial pack as-is. With a pack bitmap we can more or less
recreate the "clean" initial pack on the fly, relatively cheaply,
because the object order in this particular pack is stable. The
resume + pack bitmap combination makes the whole thing a little less
static.

[1] http://thread.gmane.org/gmane.comp.version-control.git/288205/focus=288222
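
To make that concrete, here is a rough, untested Python sketch of how
a server might rebuild such a "clean" pack (the ref name and the
output file name are made up): enumerate only the objects reachable
from public refs, letting the bitmap speed up the traversal, then hand
that list to pack-objects:

    import subprocess

    def clean_initial_pack(public_refs, out_path):
        # List objects reachable from the public refs only; with a
        # pack bitmap present, rev-list can answer this much faster.
        objs = subprocess.run(
            ["git", "rev-list", "--objects", "--use-bitmap-index"]
            + public_refs,
            capture_output=True, check=True,
        ).stdout
        # pack-objects reads this object list on stdin and packs just
        # those objects, leaving the secret ones out.
        pack = subprocess.run(
            ["git", "pack-objects", "--stdout"],
            input=objs, capture_output=True, check=True,
        ).stdout
        with open(out_path, "wb") as f:
            f.write(pack)

    clean_initial_pack(["refs/heads/master"], "clean-initial.pack")

The stable object order of the bitmapped pack is what should make
regenerating essentially the same pack cheap enough to do on the fly.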

> So I wonder whether it would be possible to come up with a subset of the
> revs with a stable order, with associated thin packs (using prior revs as
> negative revs in the commit range) such that each thin pack weighs roughly
> 1MB (or whatever granularity you desire). My thinking was that it should
> be possible to follow a similar strategy as bisect to come up with said
> list.
>
> The client could then state that it was interrupted at downloading a given
> rev's pack, with a specific offset, and the (thin) pack could be
> regenerated on the fly (or cached), serving only the desired chunk. The
> server would then also automatically know where in the list of
> stable-ordered revs the clone was interrupted and continue with the next
> one.
>
> Oh, and if regenerating the thin pack instead of caching it, we need to
> ensure a stable packing (i.e. no threads!). That is, given a commit range,
> we need to (re-)generate bytewise-identical thin packs.

The bytewise-identical idea has already been shot down. But I like
the splitting into multiple thin packs (and re-downloading a whole
thin pack when its transfer fails). Multiple thin packs give us
resume capability. Chaining the thin packs saves bandwidth. And
pack-objects still has the freedom to do anything it likes inside
each thin pack. For gigantic repos on good-enough connections this
could work (even for fetch/pull).
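
Something like this untested sketch could drive that on the server
side (file names are made up, and it assumes the list of split revs
has already been chosen): each segment is packed with the previous
split rev as a negative rev, so every pack after the first is thin:

    import subprocess

    def write_thin_packs(split_revs, tip):
        # Segment boundaries: None, split1, split2, ..., tip
        bounds = [None] + split_revs + [tip]
        for i, (base, top) in enumerate(zip(bounds, bounds[1:])):
            # pack-objects --revs reads rev-list arguments from stdin;
            # "^base" excludes the previous segment, and --thin lets
            # deltas refer to objects the client already has from
            # earlier segments.
            revs = top + "\n" + ("^" + base + "\n" if base else "")
            pack = subprocess.run(
                ["git", "pack-objects", "--revs", "--thin", "--stdout"],
                input=revs.encode(), capture_output=True, check=True,
            ).stdout
            with open("segment-%04d.pack" % i, "wb") as f:
                f.write(pack)

On the receiving end each downloaded segment would be completed with
something like "git index-pack --fix-thin --stdin <segment-0000.pack",
which appends the missing delta bases to the pack.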

The biggest problem I see is that it's hard for rev-list to split the
thin packs by pack size, because we do not know the size until
pack-objects has consumed all the revs and produced the pack. An
approximation based on the number of objects should probably be OK
unless there are very large blobs. But those should probably be
addressed separately, by resurrecting Junio's split-blob series.
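
The approximation could look roughly like this (untested sketch; the
stride and the objects-per-pack threshold are arbitrary numbers):
sample every Nth commit in oldest-first order and cut a segment
whenever the object count accumulated since the last split gets big
enough:

    import subprocess

    def git_lines(*args):
        out = subprocess.run(["git"] + list(args),
                             capture_output=True, text=True, check=True)
        return out.stdout.splitlines()

    def pick_split_revs(tip, objects_per_pack=100000, stride=256):
        commits = git_lines("rev-list", "--reverse", tip)
        splits, prev = [], None
        for c in commits[stride - 1 :: stride]:
            rng = [c] if prev is None else [c, "^" + prev]
            # Objects the segment would add so far; this counts
            # commits, trees and blobs alike, so it only approximates
            # the eventual pack size.
            if len(git_lines("rev-list", "--objects", *rng)) \
                    >= objects_per_pack:
                splits.append(c)
                prev = c
        return splits

Sampling only every few hundred commits keeps the number of rev-list
invocations down; the range prev..c simply keeps growing until it
crosses the threshold, at which point c becomes the next split rev.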
-- 
Duy