Re: 'git replace' and pushing

Cory Fields <FOSS@xxxxxxxxxxxxxxxxxxxxxxxx> · Sat, 27 Nov 2010 12:54:17 -0500

On Sat, Nov 27, 2010 at 2:52 AM, Jonathan Nieder <jrnieder@xxxxxxxxx> wrote:
> Cory Fields wrote:
>> On Fri, Nov 26, 2010 at 6:18 PM, Junio C Hamano <gitster@xxxxxxxxx> wrote:
>
>>> True, but I suspect the above picture pretty much satisfies Cory's initial
>>> wish, no? You can fetch recent 4'--5---6 history as if 4' were the root
>>> commit, and if you fetched replacement that tells us to pretend that 4'
>>> has 3 as its parent (and the history leading to 3), you will get a deeper
>>> history.
>>
>> Yes, both of these can be accomplished. I've managed to get that part
>> working, where a default clone pulls in half history, and fetching
>> refs/replace gives you the rest. The only problem is that it requires a
>> filter-branch before pushing.
>
> That's a one-time thing, not per-push, right? A filter-branch would
> indeed be needed to transform the history
>
>  1 --- 2 --- 3 --- 4 --- 5' --- 6'
>
> into
>
>  1 --- 2 --- 3 --- 4
>  4' --- 5 --- 6
>
> and that is unavoidable: the object names encode the entire list of
> ancestors, you cannot push an object without its ancestors, etc.
> But afterwards you can build on the history rooted at 4' and all
> should be well, and you can use checkout --orphan to get a new
> root when the current line of history is about to grow too long.
>
> In other words, the distinction between real history and fake history
> is very relevant. Object transport only cares about the real history
> (barring bugs); if you want to tweak what objects get transferred, you
> really need to rewrite the real history (or use --depth).
>
>> A shallow clone does not fit for us, because we want the default clone to
>> only pull half. Having a public 1gb repository that will be cloned quite
>> often is bound to make our host unhappy, so we're doing everything we can to
>> get the size down.
>
> Why not publish a "git bundle" of the first 1gb using HTTP,
> BitTorrent, or some other cache-friendly protocol and use a hook to
> reject attempts to fetch too many objects at once from the host?
>
>> Also, maybe I haven't made this clear... the "real" commit IDs need to
>> match the "fake" ones in order to prevent confusion.
>
> Not sure what this means. But commit IDs are defined based on
> content, and for simplicity and sanity the object transport machinery
> deliberately does not look beyond that.
>
> Regards,
> Jonathan
>

I think a one-time filter-branch is going to be our best bet. I had
assumed that this was the case; I just wanted reassurance that it was
necessary. I have that now. Thanks to all for the responses.
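
For anyone reading this in the archive, here is a toy sketch of why the
rewrite cannot be avoided. It is not Git's actual object format: the
commit_id and build_chain helpers are made up for illustration, and a
real commit also hashes its tree, author and committer. The one thing it
does share with Git is that a commit's name is a hash over its content
plus the names of its parents.

```python
import hashlib


def commit_id(content, parent_ids):
    """Toy stand-in for a commit name: SHA-1 over parent IDs plus content.

    Hypothetical helper, not Git's real object encoding.
    """
    lines = [f"parent {p}" for p in parent_ids] + ["", content]
    return hashlib.sha1("\n".join(lines).encode()).hexdigest()


def build_chain(contents):
    """Return the IDs of a linear history, oldest commit first."""
    ids = []
    parents = []  # the first commit is a root: it has no parents
    for content in contents:
        cid = commit_id(content, parents)
        ids.append(cid)
        parents = [cid]  # the next commit points at this one
    return ids


# Full history 1 --- 2 --- 3 --- 4 --- 5 --- 6
full = build_chain(["1", "2", "3", "4", "5", "6"])

# Same content for 4, 5 and 6, but 4 is now a root commit
cut = build_chain(["4", "5", "6"])

print("4 with parent 3:", full[3])
print("4 as a root:    ", cut[0])
print("6 on full chain:", full[5])
print("6 on cut chain: ", cut[2])
```

Dropping the parent of 4 changes its ID, and that change propagates to 5
and 6 even though their content is identical. That is why the cut
history has to be rewritten once, and why the IDs on the two sides of
the cut cannot be made to match.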

Martin: That sounds very interesting indeed. However, the docs make
shallow clones sound scary. From the docs: "A shallow repository has a
number of limitations (you cannot clone or fetch from it, nor push
from nor into it)"

I suppose these limitations would need to be addressed if/when looking
into server-side depth defaults?

Cory