Re: SHA1 collisions found

ankostis <ankostis@xxxxxxxxx> · Sat, 25 Feb 2017 01:31:32 +0100

On 24 February 2017 at 21:32, Junio C Hamano <gitster@xxxxxxxxx> wrote:
> ankostis <ankostis@xxxxxxxxx> writes:
>
>> Let's assume that git is retrofitted to always support the "default"
>> SHA-3, but support additionally more hash-funcs.
>> If in the future SHA-3 also gets defeated, it would be highly unlikely
>> that the same math would also break e.g. Blake.
>> So certain high-profile repos might choose for extra security 2 or more hashes.
>
> I think you are conflating two unrelated things.

I believe the two distinct things you refer to below are these:

  a. storing objects in filesystem and accessing them
     by name (e.g. from cmdline), and

  b. cross-referencing inside the objects (trees, tags, notes),

correct?

If not, then please ignore my answers below.

>  * How are these "2 or more hashes" actually used?  Are you going to
>    add three "parent " line to a commit with just one parent, each
>    line storing the different hashes?

Yes, in all places where references are involved (tags, notes).
Based on what the git-hackers have written so far, this might be doable.

To ensure integrity in the case of crypto-failures, all objects must
cross-reference each other with multiple hashes.
Of course this extra security would stop as soon as you reach "old"
history (unless you re-write it).
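
To make the cross-referencing idea concrete, here is a rough sketch of what
I have in mind. It is only an illustration: the set of hashes, the
"<algo>:<hex>" spelling of the parent lines and both function names are my
own assumptions, not anything git does today.

```python
import hashlib

# Hypothetical set of hash functions; git itself only uses SHA-1 today.
HASHES = ("sha1", "sha3_256", "blake2b")

def object_names(obj_type: str, body: bytes) -> dict:
    """Hash a git-style object ("<type> <size>\\0<body>") with every algorithm."""
    data = f"{obj_type} {len(body)}".encode() + b"\0" + body
    return {algo: hashlib.new(algo, data).hexdigest() for algo in HASHES}

def parent_lines(parent_names: dict) -> list:
    """Emit one "parent" line per hash, in a made-up "<algo>:<hex>" spelling."""
    return [f"parent {algo}:{name}" for algo, name in parent_names.items()]
```

A commit with a single parent would then carry three "parent" lines, and a
reader would have to verify all of them for the extra security to hold.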

>    How will such a commit object
>    be named---does it have three names and do you plan to have three
>    copies of .git/refs/heads/master somehow, each of which have
>    SHA-1, SHA-3 and Blake, and let any one hash to identify the
>    object?

Yes, based on Jason Cooper's idea, above, objects would be stored
under all names in the filesystem using hard links (although this
might not work nicely on Windows).
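
Roughly, the hard-link storage could look like the sketch below. The
directory layout (objects/<algo>/<2 hex>/<rest>) and the function name are
assumptions of mine for illustration, and the data is written uncompressed
to keep the example short.

```python
import os

def store_under_all_names(objects_dir: str, names: dict, data: bytes) -> list:
    """Write the object once, then hard-link it under every other name."""
    paths = []
    for algo, name in names.items():
        # Hypothetical layout: <objects_dir>/<algo>/<first 2 hex>/<remaining hex>
        path = os.path.join(objects_dir, algo, name[:2], name[2:])
        os.makedirs(os.path.dirname(path), exist_ok=True)
        if not paths:
            # The first name gets the real file.
            with open(path, "wb") as f:
                f.write(data)
        elif not os.path.exists(path):
            # Every further name is a hard link to that same file.
            os.link(paths[0], path)
        paths.append(path)
    return paths
```

All names then point at the same file on disk, so the object is stored only
once no matter how many hashes are in use.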

>    I suspect you are not going to do so; instead, you would use a
>    very long string that is a concatenation of these three hashes as
>    if it is an output from a single hash function that produces a
>    long result.
>
>    So I think the most natural way to do the "2 or more for extra
>    security" is to allow us to use a very long hash.  It does not
>    help to allow an object to be referred to with any of these 2 or
>    more hashes at the same time.

If hard-linking all names is doable, then most restrictions above are
gone, correct?
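
For comparison, the "very long hash" from the quoted text and the "any one
name identifies the object" variant can be put side by side in a small
sketch. The choice of algorithms and both function names are assumptions
made up for this example.

```python
import hashlib

ALGOS = ("sha1", "sha3_256", "blake2b")  # illustrative choice, not a git setting

def combined_name(data: bytes) -> str:
    """Concatenate the hex digests, treating them as one long hash."""
    return "".join(hashlib.new(a, data).hexdigest() for a in ALGOS)

def alias_index(data: bytes) -> dict:
    """Map every individual digest to the combined name of the same object."""
    full = combined_name(data)
    return {hashlib.new(a, data).hexdigest(): full for a in ALGOS}
```

With such an index (or with hard links playing the same role), a lookup by
any single hash still resolves to one object, while the long name stays
available wherever all hashes must be checked together.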

>  * If employing 2 or more hashes by combining into one may enhance
>    the security, that is wonderful.  But we want to discourage
>    people from inventing their own combinations left and right and
>    end up fragmenting the world.  If a project that begins with
>    SHA-1 only naming is forked to two (or more) and each fork uses
>    different hashes, merging them back will become harder than
>    necessary unless you support all these hashes forks used.

Agreed on discouraging people's inventions.

That is why I believe that some HASH (e.g. SHA-3) must be the blessed one.
All git >= 3.x.x must support at least this one (for naming and
cross-referencing between objects).

> Having said all that, the way to figure out the hash used in the way
> we spell the object name may not be the best place to discourage
> people from using random hashes of their choice.  But I think we
> want to avoid doing something that would actively encourage
> fragmentation.

I guess the "blessed" SHA-3 will discourage people from using the other
names, until the next crypto-crack.