Re: GTP/0.1 terminology 101: commit reels and references

Hi,

On Tue, 29 Jul 2008, Sam Vilain wrote:

> On Mon, 2008-07-28 at 14:01 +0200, Johannes Schindelin wrote:
> > >   - the reel has a defined object order (which as I hoped to 
> > >     demonstrate in the test cases, is just a refinement of rev-list 
> > >     --date-order)
> > 
> > Do you mean that the commit reel is a list pointing to bundles that 
> > can be sorted topologically by their contained commits?
> 
> Yes, but it is more defined than that.  There are still ambiguities with 
> topological sort, so the gittorrent spec specified exactly how all ties 
> are broken.  They happen to be a further refinement of --date-order, 
> with respect to the ordering of commits.
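[For reference, the --date-order baseline that the reel ordering refines can be reproduced directly. This is a minimal sketch in a throwaway repository; the commit messages and identity are made up:]

```shell
set -e
dir=$(mktemp -d)
cd "$dir"
git init -q repo && cd repo
git -c user.name=a -c user.email=a@example.com commit -q --allow-empty -m one
git -c user.name=a -c user.email=a@example.com commit -q --allow-empty -m two
# --date-order lists commits newest-first by commit date, but never shows a
# parent before all of its children; gittorrent fixes the remaining ties
git rev-list --date-order HEAD
```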

But does that not mean that any new ref branching off of an ancient commit 
changes all the pack boundaries?

I'd rather have an intelligent incremental updater, and keep most of the 
existing bundles immutable.  That way, a new ref, or a changed one, can be 
mostly served from peers, not exclusively from the seeders.

> > >   - deltas always point in one direction, to objects "earlier" on 
> > >     the reel, so that slices of the reel sent on the network can be 
> > >     made thin without resulting in unresolvable deltas (which should 
> > >     be possible to do on commit boundaries using rev-list 
> > >     --objects-edge)
> >
> > That is exactly what bundles do.  They are thin, as they assume that a 
> > few "preconditions", i.e. refs, are present.
> 
> Ok.  I think there are also some other trivial differences such as 
> bundles containing refs (which in the context of gittorrent will be 
> useless).

Yeah, I think that bundles themselves are pretty useless in gittorrent.  
But what they _contain_ is pretty much what you need as blocks.
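[To illustrate what such a thin slice looks like in practice, here is a sketch using a throwaway repository; the tag name "base" and the file name "slice.bundle" are made up:]

```shell
set -e
dir=$(mktemp -d); cd "$dir"
git init -q repo && cd repo
git -c user.name=a -c user.email=a@example.com commit -q --allow-empty -m base
git tag base
git -c user.name=a -c user.email=a@example.com commit -q --allow-empty -m tip
# a bundle over base..HEAD is thin: it assumes the objects reachable from
# "base" are already present on the receiving side (the "preconditions")
git bundle create slice.bundle base..HEAD
git bundle list-heads slice.bundle
```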

> > >   - the behaviour at the beginning of the reel is precisely defined 
> > >     (although as I said, I think that the decision might be worth 
> > >     revisiting - perhaps getting just the latest reel is a useful 
> > >     'shallow clone')
> > 
> > If you want to allow shallow clones, you must make the bundles 
> > non-thin.  That would be a major bandwidth penalty.
> > 
> > I'd rather not allow shallow clones with GitTorrent.
> 
> By "Shallow" I think I mean a different thing to you.  I mean something 
> akin to just the last pack's worth of commits.

That _is_ a shallow clone.  And that is exactly what I meant.  If you want 
to have all objects of the commits in the same pack, then you are 
basically making fat packs, which come with a hefty bandwidth penalty.

That is why I would suggest not allowing shallow clones; if you want to 
allow them, I have to ask myself why bother with a torrent at all...  It 
is not as if the shallow clones are large, or as if the people fetching 
them will stay around long enough to seed anything; and the packs would 
change frequently, making the whole torrent business pretty inefficient.
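[The kind of clone being discussed is what git itself already offers as a depth-limited clone; the commits would be served complete, i.e. as a self-contained fat pack.  A sketch with a throwaway repository, names made up:]

```shell
set -e
dir=$(mktemp -d); cd "$dir"
git init -q origin && cd origin
git -c user.name=a -c user.email=a@example.com commit -q --allow-empty -m one
git -c user.name=a -c user.email=a@example.com commit -q --allow-empty -m two
cd ..
# --depth 1 truncates history to the most recent commit; the pack sent for
# it must be self-contained, since the client has nothing to resolve
# deltas against
git clone -q --depth 1 "file://$dir/origin" shallow
cd shallow
git rev-list --count HEAD
```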

> > > It's the lack of guarantees which is the issue, really.
> > 
> > It should not be too difficult to provide a rev-list option (which is 
> > inherited by git-bundle, then) to pay an extra time to make sure that 
> > the bundle is minimal.
> 
> Ok.  But from the current implementation's perspective, this is not yet 
> needed, we are just using the existing API.

Why make it hard?  We have a lively community with brilliant people, and 
they frequently have fun solving puzzles like this: what is the best 
strategy to make equally sized, rarely (or maybe never?) changing packs 
from a set of given refs.

> Actually what would be useful would be for the thin pack generation to 
> also allow any object to be specified as its input list, not just 
> commits... then we wouldn't have to break blocks on commit boundaries 
> (see http://gittorrent.utsl.gen.nz/rfc.html#org-blocks).

That should be easy, but I think that it would be _even better_ if we ask 
pack-objects to generate several packs from the needed objects.  Ooops.  
That already exists: 

	$ git pack-objects --max-pack-size=<n>

Storing the packs in a second GIT_OBJECT_DIRECTORY that has the 
original as an alternate, together with the --local flag, should help even 
further: you can mark the last pack (which most likely does not reach 
max-pack-size), remove it, and just rerun the packing.

Of course, this needs some thought for the case where large chunks of the 
object database become stale because a long branch was just deleted.  Not 
a major obstacle, though.
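[The basic --max-pack-size step above looks like this in practice; a sketch with a throwaway repository, where the output directory "../packs" and the prefix "pack" are made up:]

```shell
set -e
dir=$(mktemp -d); cd "$dir"
git init -q repo && cd repo
for i in 1 2 3; do
  echo "content $i" > "f$i"
  git add "f$i"
  git -c user.name=a -c user.email=a@example.com commit -q -m "c$i"
done
mkdir ../packs
# feed the full object list to pack-objects; it splits the result into
# one or more <prefix>-<hash>.pack files, each at most --max-pack-size
# (1 MiB is the minimum git accepts)
git rev-list --objects --all |
  git pack-objects -q --max-pack-size=1m ../packs/pack
ls ../packs
```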

Ciao,
Dscho

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html
