Re: Alternates and push

Jan Hudec <bulb@xxxxxx> · Mon, 8 Sep 2008 19:56:06 +0200

On Sun, Sep 07, 2008 at 12:18:02 -0700, Junio C Hamano wrote:
> Junio C Hamano <gitster@xxxxxxxxx> writes:
> 
> > Jan Hudec <bulb@xxxxxx> writes:
> > ...
> >> Why is this a *mis*design? Couldn't push be fixed by redesigning its
> >> protocol along the lines of:
> >>  - client sends a list of sha1s it wants to push, from the tip down
> >>  - server stops it when it sees an object it has -- this check can be done
> >>    against the object store without having a ref for it.
> >
> > Because your second step is *BROKEN*.
> >
> > Think of a case where an earlier commit walker started fetching into that
> > "server" end, which got newer commits and their associated objects first
> > and then older ones, and then got killed before reaching to the objects it
> > already had.  In such a case, the commit walker will *not* update the refs
> > on the server end (and for a very good reason).
> >
> > After that, the server end would have:
> >
> >  * refs that point at some older commits, all objects from whom are
> 
> s/from whom/reachable from which/;
> 
> >    guaranteed to be in the repository (that's the "ref" guarantee);
> >
> >  * newer commits and their objects, but if you follow them you will hit
> >    some objects that are *NOT* in the repository.
> 
> To visualize, the server object store and refs would be like this:
> 
>     ---o---o---A...x...x...x...x...o---o---X
>                ^ ref
> 
> Commits 'x' are all missing because the commit walker fetched commit X,
> inspected its tree and got the necessary tree and blob objects, went back
> to get X's parent, did the same, then its parent, attempted to do the same
> but got killed before connecting the history fully to A.

The problem was that I didn't realize how this could happen. Now that you
have said /walker/, it's obvious which way of adding objects causes it. I
would, however, argue that this behaviour of the walker, rather than having
the object store independent in the first place, is the misdesign.
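
A toy model of that state, purely as a sketch: the dict-based "store", the
commit names and the max_objects cut-off are all made up for illustration
and have nothing to do with how git actually stores objects. It shows that
"the server has X" can be true while the history below X is incomplete.

```python
# Toy model: the object store maps a commit name to its parent (None = root).
# Every name here is hypothetical; this is not git's real data model.

def walker_fetch(remote, store, tip, max_objects):
    """Fetch tip-first like a commit walker; stop early to simulate a kill."""
    cur, fetched = tip, 0
    while cur is not None and cur not in store and fetched < max_objects:
        store[cur] = remote[cur]      # object lands in the store right away
        cur = remote[cur]
        fetched += 1

def fully_connected(store, tip):
    """True only if every ancestor of tip is present in the store."""
    cur = tip
    while cur is not None:
        if cur not in store:
            return False
        cur = store[cur]
    return True

# Shortened version of the picture above: r---A...x1...x2...o1---X
remote = {"r": None, "A": "r", "x1": "A", "x2": "x1", "o1": "x2", "X": "o1"}
store = {"r": None, "A": "r"}         # everything reachable from the ref
refs = {"master": "A"}

walker_fetch(remote, store, "X", max_objects=2)   # killed after X and o1

has_x = "X" in store                          # presence check alone passes
x_complete = fully_connected(store, "X")      # but x1 and x2 are missing
ref_complete = fully_connected(store, refs["master"])
```

The ref is never updated by the interrupted fetch, so the guarantee still
holds for refs["master"], while it does not hold for X.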

> Accepting history on top of X in this round of push, before guaranteeing
> that you already have everything reachable from X, will give you this:
> 
> 
>     ---o---o---A...x...x...x...x...o---o---X---o---o---Y
>                ^ ref =========== (wrong) ============> ^ ref
> 
> and if you update the ref to point at Y, then you cannot satisfy requests
> from other people who want the history that leads to Y, because somewhere
> between A and X there are commits that you do not even have to begin with.
> 
> So you may even be able to accept objects between X..Y, but you cannot
> update the ref from A to Y after accepting such a push, which is pointless.
> 
> You could try a variant of it to unbreak your trick, though.  When you see
> an object that you have, say 'X' above, you traverse down from there until
> reaching some ref (in this case, A) and make sure that you have everything
> in between (not just commits but also associated trees and blobs that are
> needed).  This is quite similar to what is happening when the commit
> walker says "walk deadbeef..." in its progress output.  So it _could_ be
> done, but it would be somewhat expensive.

No. I would vote for unbreaking it in the walker instead. Rather than putting
the downloaded packs directly into the object store, it could put them in
some staging area and move them into place only once all their dependencies
have been downloaded. That still makes the solution about as complex as the
other ones.
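
Roughly what I have in mind, as a sketch only: the "store" and "staging"
dicts, the commit names and the max_objects cut-off are hypothetical
stand-ins, not anything the real walker does today. Objects are collected
in the staging area and reach the store only once the walk has connected to
something the store already has.

```python
# Toy model: the object store maps a commit name to its parent (None = root).
# All names are hypothetical; this only illustrates the staging idea.

def staged_fetch(remote, store, tip, max_objects):
    """Walk tip-first into a staging area; publish only a complete chain."""
    staging = {}
    cur = tip
    while cur is not None and cur not in store:
        if len(staging) >= max_objects:
            return False              # simulated kill: staging is thrown away
        staging[cur] = remote[cur]
        cur = remote[cur]
    store.update(staging)             # chain is complete, move it into place
    return True

remote = {"r": None, "A": "r", "x1": "A", "x2": "x1", "o1": "x2", "X": "o1"}
store = {"r": None, "A": "r"}
before = dict(store)

killed = staged_fetch(remote, store, "X", max_objects=2)
unchanged_after_kill = (store == before)

finished = staged_fetch(remote, store, "X", max_objects=10)
```

With this, stopping the walk at the first object found in the store is safe
again, because nothing incomplete ever gets there.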

-- 
						 Jan 'Bulb' Hudec <bulb@xxxxxx>