Hi,

Stefan Beller wrote:
> On Wed, Sep 13, 2017 at 2:52 PM, Junio C Hamano <gitster@xxxxxxxxx> wrote:

>> This is a tangent, but it may be fine for a shallow clone to treat
>> the cut-off points in the history as if they are root commits and
>> compute generation numbers locally, just like everybody else does.
[...]
> Locally it helps for some operations such as correct walks.
> For the network case however, it doesn't really help either.
>
> If we had global generation numbers, one could imagine that they
> are used in the pack negotiation (server advertises the maximum
> generation number or even gen number per branch; client
> could binary search in there for the fork point)
>
> I wonder if locally generated generation numbers (for the shallow
> case) could be used somehow to still improve network operations.

I have a different concern about locally generated generation numbers
in a shallow clone: recomputing them when the shallow clone is
deepened is slow.  However:

 1. that only affects performance, and for some use cases it could be
    mitigated, e.g. by introducing some laziness, and, more
    convincingly,

 2. with a small protocol change, the server could communicate the
    generation numbers for commit objects at the edge of a shallow
    clone, avoiding this trouble.

So I am not too concerned.  (A rough sketch of the local computation
is in the postscript below.)

More generally, unless there is a very compelling reason to, I don't
want to couple other changes into the hash function transition.  If
they're worthwhile enough to do, they're worthwhile enough to do
whether we're transitioning to a new hash function or not: I have not
yet heard a convincing example of a "while at it" that is worth the
complexity of such coupling.

(That said, if two format changes are worth doing and happen to be
implemented at the same time, then we can save users the trouble of
experiencing two format change transitions.  That is a kind of
coupling from the end user's point of view.  But from the perspective
of someone writing the code, there is no need to count on it, and it
is not likely to happen anyway.)

> If we'd get the transition somewhat right, the next transition will
> be easier than the current transition, such that I am not that
> concerned about longevity.  I am rather concerned about the
> complexity that is added to the code base (whilst accumulating
> technical debt instead of clearer abstraction layers).

During the transition, users have to suffer reencoding overhead, so
it is not good for such transitions to need to happen very often.  If
the new hash function breaks early, then we have to cope with that,
and as you say, having the framework in place means we'd be ready.
But I still don't want the chosen hash function to break early.

In other words, a long lifetime for the hash absolutely is a design
goal.  Coping well with an unexpectedly short lifetime for the hash
is also a design goal.  If the hash function lasts 10 years, then I
am happy.

Thanks,
Jonathan
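
P.S. For concreteness, here is the kind of local computation I have
in mind, as a self-contained sketch rather than git.git code: a
commit whose parents were all cut off by the shallow boundary is
treated as a root with generation number 1, and every other commit
gets one more than the maximum generation number among its parents.
The hard-coded history and all names are made up for the example.

#include <stdio.h>

/*
 * Illustrative sketch, not git.git code.  A parent index of -1 means
 * "cut off by the shallow boundary", so a commit with no reachable
 * parents is treated as a root with generation number 1.
 */
#define NR_COMMITS 5
#define MAX_PARENTS 2

static const int parents[NR_COMMITS][MAX_PARENTS] = {
	{ -1, -1 },	/* 0: boundary commit, treated as a root */
	{  0, -1 },	/* 1 */
	{  0, -1 },	/* 2 */
	{  1,  2 },	/* 3: merge of 1 and 2 */
	{  3, -1 },	/* 4: tip */
};

static int gen[NR_COMMITS];	/* 0 = not yet computed */

static int generation(int c)
{
	int i, best = 0;

	if (gen[c])
		return gen[c];
	for (i = 0; i < MAX_PARENTS; i++) {
		int p = parents[c][i];

		if (p >= 0) {
			int pg = generation(p);

			if (pg > best)
				best = pg;
		}
	}
	return gen[c] = best + 1;
}

int main(void)
{
	int c;

	for (c = 0; c < NR_COMMITS; c++)
		printf("commit %d: generation %d\n", c, generation(c));
	return 0;
}

(A real implementation would of course walk the history iteratively
and read parents from the object store; plain recursion like this
would overflow the stack on a deep history.  The point is only that
the boundary commits behave exactly like roots, which is why
deepening the clone invalidates the numbers and forces a recompute.)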