Re: Starting to think about sha-256?

Johannes Schindelin <Johannes.Schindelin@xxxxxx> · Mon, 28 Aug 2006 00:02:39 +0200 (CEST)

Hi,

On Sun, 27 Aug 2006, Linus Torvalds wrote:

> On Sun, 27 Aug 2006, Krzysztof Halasa wrote:
> > 
> > > Maybe sha-256 could be considered for the next major-rev of git?
> > 
> > Not sure, but _if_ we want it we should do it sooner rather than
> > later.
> 
> Modifying git-convert-objects.c to rewrite the regular sha1 into a sha256 
> should be fairly straightforward. It's never been used since the early 
> days (and has limits like a maximum of a million objects etc that can need 
> fixing), but it shouldn't be "fundamentally hard" per se.

But what about signed tags? (This issue has come up before, but has never 
been addressed.)

I also thought about supporting hybrid hashes, i.e. that older objects 
can still be hashed with SHA-1. Alas, a simple thought experiment 
demonstrates how silly that idea is: most of the objects do not change 
between two revisions, and they would have to be rehashed with SHA-256 (or 
whatever we decide upon) anyway, so hybrids would do no good.

A better idea would be to increment the repository version, and expect 
SHA-1 for version 1, SHA-256 for version >= 2.
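
A minimal sketch of that rule, in Python for brevity. The function names 
and the idea of passing the version as a plain integer are made up for 
illustration; nothing like this exists in git as of this thread.

```python
import hashlib


def hash_name_for_repo_version(version: int) -> str:
    """Pick the hash algorithm from the repository version.

    Hypothetical rule from the text: SHA-1 for version 1,
    SHA-256 for version >= 2.
    """
    if version < 1:
        raise ValueError("repository version must be >= 1")
    return "sha1" if version == 1 else "sha256"


def object_id(version: int, data: bytes) -> str:
    """Hash raw bytes with whatever algorithm the version selects."""
    return hashlib.new(hash_name_for_repo_version(version), data).hexdigest()
```

With this, a version 1 repository yields 40 hex digits per object name 
and a version 2 repository yields 64, so every tool that assumes a fixed 
name length would have to look at the version first.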

However, I could imagine that we do not need this huge change (it would 
break _many_ setups). The breakthrough was announced last Tuesday, and it 
involved 75% payload, i.e. to fake a new -- say -- git.c, one would need 
to enlarge git.c by a factor of 4, and you would see a lot of gibberish 
inside some comment. (Note that I did not listen to the talk myself, this 
is all deduced from the scarce information which is available via the 
'net.)
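
The factor of 4 is simple arithmetic, assuming "75% payload" means that 
the attacker-controlled gibberish has to make up 75% of the resulting 
file. That reading is an assumption, as is the helper name below:

```python
def forged_size(original_size: int, payload_fraction: float) -> float:
    """Size of a forged file in which the meaningful content is
    `original_size` bytes and the rest is attacker-chosen payload.

    Assumption: payload_fraction is the share of the *final* file
    taken up by payload, so the original content fills the remainder.
    """
    if not 0 <= payload_fraction < 1:
        raise ValueError("payload_fraction must be in [0, 1)")
    return original_size / (1 - payload_fraction)
```

For example, forged_size(1000, 0.75) gives 4000.0: the real code is only 
a quarter of the file, and the other three quarters are noise.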

Even if the breakthrough really extends to full SHA-1, you would still 
have to add _at least_ 20 bytes of gibberish. That would be harder to 
spot, but it would be spotted.

This made me think about the use of hashes in git. Why do we need a hash 
here (in no particular order):

1) integrity checking,
2) fast lookup,
3) identifying objects (related to (2)),
4) trust.

Except for (4), I do not see why SHA-1 -- even if broken -- should not be 
adequate. It is not like somebody found out that all JPGs tend to have 
similar hashes so that collisions are more likely.
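
To make (1) to (3) concrete, here is a rough sketch in Python. The blob 
naming (SHA-1 over "blob <size>\0" followed by the content) is what git 
does; the dictionary store and the helper names are only for 
illustration and are not how git lays out objects on disk.

```python
import hashlib

# Toy in-memory object store: object name -> content.
store: dict[str, bytes] = {}


def blob_id(content: bytes) -> str:
    """Name a blob the way git does: SHA-1 of header plus content."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def put(content: bytes) -> str:
    """(3) identify the object by its hash, (2) index it for lookup."""
    oid = blob_id(content)
    store[oid] = content
    return oid


def verify(oid: str, content: bytes) -> bool:
    """(1) integrity check: does the content still match its name?"""
    return blob_id(content) == oid
```

None of these three uses cares whether somebody can construct a 
collision on purpose; they only need accidental collisions to be 
vanishingly unlikely.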

And thinking about trust: The hash is augmented by thinking persons. It is 
not like you blindly trust a person forever. You build up trust, and once 
somebody has failed you, the trust is lost, and very hard to build up 
again. So, you would just try to get all objects again from somebody you 
still trust, and never pull from the loser^H^H^H^H^Huntrusted person 
again. Ever.

Besides, as has been pointed out several times, a dishonest person could 
try to sneak bad code into your repository _regardless_ of a secure hash.

So: Do we really need a secure hash, or do we need an adequate hash, and 
just happen to use one which was intended as a secure hash, but no longer 
is?

Ciao,
Dscho
