Re: State of NewHash work, future directions, and discussion

Jonathan Nieder <jrnieder@xxxxxxxxx> · Mon, 11 Jun 2018 19:42:52 -0700

brian m. carlson wrote:
> On Mon, Jun 11, 2018 at 12:01:03PM -0700, Jonathan Nieder wrote:

>>  1. Hash to be used for command output to the terminal
>>  2. Hash used in pack files
>>  3. Additional hashes (beyond (2)) that we can look up using the
>>     translation table
>>  4. Additional hashes (beyond (1)) accepted in input from the command
>>     line and stdin
>>
>> In principle, (1) and (4) would be globals, and (2) and (3) would be
>> tied to the repository.  I think this is always what Duy was hinting

Here, by 'always' I meant 'also'.  Sorry for the confusion.

>> at.
>>
>> All that said, as long as there is some notion of (1) and (4), I'm
>> excited. :)  Details of how they are laid out in memory are less
>> important.
>
> I'm happy to hear suggestions on how this should or shouldn't work.  I'm
> seeing these things in my head, but it can be helpful to have feedback
> about what people expect out of the code before I spend a bunch of time
> writing it.

So far you're doing pretty well. :)

I just noticed that I have some copy-edits for the
hash-function-transition doc from last year that I hadn't sent out yet
(oops).  I'll send them tonight or tomorrow morning.

[...]
>> brian m. carlson wrote:

>>> The transition plan anticipates a stage 1 where we accept only SHA-1 on
>>> input and produce only SHA-1 on output, but store in NewHash.  As I've
>>> worked with our tests, I've realized such an implementation is not
>>> entirely possible.  We have various tools that expect to accept invalid
>>> object IDs, and obviously there's no way to have those continue to work.
>>
>> Can you give an example?  Do you mean commands like "git mktree"?
>
> I mean situations like git update-index.  We allow the user to insert
> any old invalid value (and in fact check that the user can do this).
> t0000 does this, for example.

I think we can forbid this in the new mode (using a test prereq to
ensure the relevant tests don't get run).  Likewise for the similar
functionality in "git mktree" and "git hash-object -w".

>> You can always use something like "doubled SHA-1" as a proof of
>> concept, but I agree that it's nice to be able to avoid some churn by
>> using an actual hash function that we're likely to switch to.
>
> I have a hash that I've been using, but redoing the work would be less
> enjoyable.  I'd rather write the tests only once if I can help it.

Thanks for the test fixes so far that make most of the test suite
hash-agnostic!

For t0000, yeah, there's no way around having to hard-code the new
hash there.

Thanks,
Jonathan