Re: State of NewHash work, future directions, and discussion

Jonathan Nieder <jrnieder@xxxxxxxxx> · Mon, 11 Jun 2018 12:01:03 -0700

Hi,

brian m. carlson wrote:

> Since there's been a lot of questions recently about the state of the
> NewHash work, I thought I'd send out a summary.

Yay!

[...]
> I plan on introducing an array of hash algorithms into struct repository
> (and wrapper macros) which stores, in order, the output hash, and if
> used, the additional input hash.

Interesting.  In principle, the following four are separate things:

 1. Hash to be used for command output to the terminal
 2. Hash used in pack files
 3. Additional hashes (beyond (2)) that we can look up using the
    translation table
 4. Additional hashes (beyond (1)) accepted in input from the command
    line and stdin

In principle, (1) and (4) would be globals, and (2) and (3) would be
tied to the repository.  I think this is also what Duy was hinting at.
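
To make that split concrete, here is a rough sketch of how the four
roles could be modeled.  It is illustrative Python, not git's C code:
the class names, field names, and the "newhash" placeholder are all
made up for this example, and nothing here reflects how struct
repository is or will be laid out.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical algorithm names, used only for illustration.
SHA1 = "sha1"
NEWHASH = "newhash"  # placeholder: the actual NewHash is not decided


@dataclass
class Repository:
    """Per-repository settings: roles (2) and (3) from the list above."""
    storage_hash: str  # (2) hash used in pack files
    # (3) additional hashes reachable through the translation table
    translation_hashes: List[str] = field(default_factory=list)

    def can_look_up(self, algo: str) -> bool:
        # Objects can be found by the storage hash or by any hash the
        # translation table covers.
        return algo == self.storage_hash or algo in self.translation_hashes


@dataclass
class ProcessHashConfig:
    """Process-wide (global) settings: roles (1) and (4)."""
    output_hash: str  # (1) hash used for output to the terminal
    # (4) additional hashes accepted from the command line and stdin
    extra_input_hashes: List[str] = field(default_factory=list)

    def accepts_input(self, algo: str) -> bool:
        # Input is accepted in the output hash plus any extra input hashes.
        return algo == self.output_hash or algo in self.extra_input_hashes


# A stage-1-like setup: store in NewHash, speak only SHA-1.
repo = Repository(storage_hash=NEWHASH, translation_hashes=[SHA1])
session = ProcessHashConfig(output_hash=SHA1)
print(repo.can_look_up(SHA1))          # True: via the translation table
print(session.accepts_input(NEWHASH))  # False: only SHA-1 on input
```

Keeping the two groups in separate objects is only one way to express
that (1) and (4) do not depend on which repository is open.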

All that said, as long as there is some notion of (1) and (4), I'm
excited. :)  Details of how they are laid out in memory are less
important.

[...]
> The transition plan anticipates a stage 1 where we accept only SHA-1 on
> input and produce only SHA-1 on output, but store in NewHash.  As I've
> worked with our tests, I've realized such an implementation is not
> entirely possible.  We have various tools that expect to accept invalid
> object IDs, and obviously there's no way to have those continue to work.

Can you give an example?  Do you mean commands like "git mktree"?

[...]
> If you're working on new features and you'd like to implement the best
> possible compatibility with this work, here are some recommendations:

This list is great.  Thanks for it.

[...]
> == Discussion about an Actual NewHash
>
> Since I'll be writing new code, I'll be writing tests for this code.
> However, writing tests for creating and initializing repositories
> requires that I be able to test that objects are being serialized
> correctly, and therefore requires that I actually know what the hash
> algorithm is going to be.  I also can't submit code for multi-hash packs
> when we officially only support one hash algorithm.

Thanks for restarting this discussion as well.

You can always use something like "doubled SHA-1" as a proof of
concept, but I agree that it's nice to be able to avoid some churn by
using an actual hash function that we're likely to switch to.
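
For illustration, one possible reading of "doubled SHA-1" is the
SHA-1 digest concatenated with itself, which gives a 40-byte object
name without picking a real successor.  The sketch below is Python
rather than git's C, the function names are made up for this example,
and the construction adds no security; it only exercises code paths
that must cope with a longer hash.

```python
import hashlib


def doubled_sha1(data: bytes) -> bytes:
    """Stand-in 40-byte hash: the 20-byte SHA-1 digest repeated twice.

    This is an assumed interpretation of "doubled SHA-1", meant only
    as a test placeholder.  It is no stronger than SHA-1 itself.
    """
    digest = hashlib.sha1(data).digest()
    return digest + digest


def object_id(obj_type: str, payload: bytes, hash_fn=doubled_sha1) -> str:
    """Hash an object the way git does: "<type> <size>\\0" + content."""
    header = f"{obj_type} {len(payload)}".encode() + b"\0"
    return hash_fn(header + payload).hex()


# The stand-in yields 80 hex digits instead of SHA-1's 40.
oid = object_id("blob", b"hello\n")
print(len(oid))  # 80
```

Passing the hash in as a parameter means the placeholder can be
swapped for the real algorithm once one is chosen.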

Sincerely,
Jonathan